Overrepresented sequences (polyG: GGGGG...) - No Hit Source persisted even with `--trim_poly_g` #589

denvercal1234GitHub · 2024-12-14T18:09:48Z

Hi there,

Thanks for the tool.

I have paired-end bulk RNA-seq data. I ran fastp as shown below with `--trim_poly_g`, but the FastQC report for one sample still flags an overrepresented sequence of GGGGGGGG... in the R2 reads, with "No Hit" listed as the Source.

Would you mind giving me some pointers to address this issue?

Thank you!

fastp \
    --adapter_fasta /ceph/project/borrowlab/qnguyen/RAW_bulkRNASeq_TMNCTFHTREGCD8_QNN2024Aug19/X204SC24072759-Z01-F001_02/01.RawData/for_Trimming.fasta \
    --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    --qualified_quality_phred 5 \
    --unqualified_percent_limit 50 \
    --n_base_limit 15 \
    --overlap_len_require 30 \
    --overlap_diff_limit 1 \
    --overlap_diff_percent_limit 10 \
    --length_required 150 \
    --length_limit 150 \
    --trim_poly_g \
    -i "$file" \
    -I "$r2_file" \
    -o "$output_r1" \
    -O "$output_r2"
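
To check how many R2 reads still begin with a long G run after trimming, a small Bash sketch like the one below could be run on the trimmed R2 file. It assumes standard 4-line FASTQ records (plain or gzip-compressed). The function name `count_polyg_reads` and the default threshold of 20 Gs are hypothetical choices for illustration, not fastp or FastQC options.

```bash
# Hypothetical helper (not part of fastp or FastQC): count reads in a FASTQ
# file, plain or gzip-compressed, whose sequence begins with at least MIN_RUN
# consecutive G bases. Assumes standard 4-line FASTQ records.
count_polyg_reads() {
  local fastq="$1"
  local min_run="${2:-20}"   # illustrative default, not a fastp setting
  # Decompress on the fly when the file name ends in .gz
  case "$fastq" in
    *.gz) gzip -dc -- "$fastq" ;;
    *)    cat -- "$fastq" ;;
  esac | awk -v n="$min_run" '
    # Build the target prefix: n copies of "G"
    BEGIN { g = ""; for (i = 0; i < n; i++) g = g "G" }
    # Line 2 of every 4-line record holds the sequence
    NR % 4 == 2 {
      total++
      if (substr(toupper($0), 1, n) == g) hits++
    }
    # Print the number of matching reads, then the total number of reads
    END { printf "%d %d\n", hits + 0, total + 0 }
  '
}
```

For example, `count_polyg_reads "$output_r2" 20` prints the number of R2 reads starting with 20 or more Gs, followed by the total read count.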

My for_Trimming.fasta is pasted below:

>ClontechSMART_1
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTT
>ClontechSMART_2
GCTAATCATTGCAAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTT
>ClontechSMART_3
AGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTT
>ClontechSMART_4
GCTAATCATTGCAAGCAGTGGTATCAACGCAGAGTACTGTTTTTTTTTTT
>ClontechSMART_5
AAGCAGTGGTATCAACGCAGAGTACTGTTTTTTTTTTTTTTTTTTTTTTT
>ClontechSMART_6
GCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTT
>Illumina_TruSeq_Adapter_Read_1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
>Illumina_TruSeq_Adapter_Read_2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
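
None of the entries above is a G homopolymer, so adapter trimming with this file would not be expected to remove a read that is mostly G; that part rests on `--trim_poly_g`. A small Bash sketch like the following can confirm this for any adapter FASTA. `fasta_records_with_g_run` is a hypothetical helper, not a fastp feature; it joins multi-line FASTA sequences and treats lowercase bases as uppercase.

```bash
# Hypothetical helper: print the header (without ">") of every FASTA record
# whose sequence contains at least MIN_RUN consecutive G bases.
fasta_records_with_g_run() {
  local fasta="$1"
  local min_run="${2:-10}"   # illustrative default
  awk -v n="$min_run" '
    # Build the target substring: n copies of "G"
    BEGIN { g = ""; for (i = 0; i < n; i++) g = g "G" }
    # Report the previous record if its joined sequence contains the G run
    function flush() { if (name != "" && index(toupper(seq), g) > 0) print name }
    /^>/ { flush(); name = substr($0, 2); seq = ""; next }
    { seq = seq $0 }
    END { flush() }
  ' "$fasta"
}
```

Running `fasta_records_with_g_run for_Trimming.fasta 5` on the file above prints nothing, since the longest G run in these entries is GGG (in Illumina_TruSeq_Adapter_Read_2).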


denvercal1234GitHub mentioned this issue on Dec 14, 2024 in s-andrews/FastQC#146 (Closed): FastQC still showed "GGGGGGG" as over-represented sequence with "No Hit" Source even after --trim_poly_g in fastp.
