Truncated single scaffold .align output #370

@mgalderman

Hey there! I am working on a project characterizing repeat elements in 9 vertebrate species across their autosomes and sex chromosomes (ZZ/ZW sex chromosomes in my species). I have created a custom TE library using RepeatModeler and have been using RepeatMasker to annotate repeats in each genome. One of my end goals is to create figures showing Kimura landscape distributions for the autosomes, the Z chromosome, and the W chromosome separately for each species.

I have run RepeatMasker on the whole genomes and parsed the .align outputs with a Python script to separate the autosome-, Z-, and W-specific alignments, which I then process with the calcDivergenceFromAlign.pl and createRepeatLandscape.pl scripts for each chromosome category.

My problem occurs when I process genomes whose sex chromosome consists of a single scaffold rather than multiple scaffolds. The output of the calcDivergenceFromAlign.pl step is missing a lot of information and contains results for only a couple of TE families. I have also tried running RepeatMasker from the start on just the single scaffold, separate from the rest of the genome, but I get the same problem. I don't have this issue when running the Perl scripts on sex chromosomes that are made up of multiple scaffolds. Any ideas why this might be happening?

Below is an example of the commands I am running on the genome of Cerastes gasperettii.

I've also attached an example of the .landscape output for the W chromosome to show what the truncated results look like.

Thanks for the help!
Megan

Autosomes

python parseAlignChrom.py ../chromosome_repeat_density/scaffold_lists/list.c_gasperettii.auto.scaffolds.txt \
./c_gasperettii/5_full_mask/Cgasperettii_ncbi.full_mask.align > ./c_gasperettii/5_full_mask/c_gasperettii.full_mask.auto.align

Chr Z

python parseAlignChrom.py ../chromosome_repeat_density/scaffold_lists/list.c_gasperettii.chrZ.scaffolds.txt \
./c_gasperettii/5_full_mask/Cgasperettii_ncbi.full_mask.align > ./c_gasperettii/5_full_mask/c_gasperettii.full_mask.chrZ.align

Chr W

python parseAlignChrom.py ../chromosome_repeat_density/scaffold_lists/list.c_gasperettii.chrW.scaffolds.txt \
./c_gasperettii/5_full_mask/Cgasperettii_ncbi.full_mask.align > ./c_gasperettii/5_full_mask/c_gasperettii.full_mask.chrW.align
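
parseAlignChrom.py itself is not included in this post. As a rough sketch only, a scaffold filter of this kind could look something like the following. The header regular expression (score, three percentages, query name, start, end, and a parenthesized remainder) and the name `filter_align` are assumptions made for illustration, not the actual script.

```python
import re

# Assumed layout of a .align block header line:
#   score  %div  %del  %ins  query  start  end  (left)  [C]  repeat#class ...
# Group 1 captures the query (scaffold) name.
HEADER = re.compile(
    r"^\s*\d+\s+\d+\.\d+\s+\d+\.\d+\s+\d+\.\d+\s+(\S+)\s+\d+\s+\d+\s+\(\d+\)"
)


def filter_align(lines, scaffolds):
    """Yield every line of the blocks whose query name is in `scaffolds`.

    A block is taken to run from one header line up to (not including)
    the next header line, so the alignment rows and the trailing
    Matrix/Kimura/Transitions lines travel with their header.
    """
    keep = False
    for line in lines:
        match = HEADER.match(line)
        if match:
            # A new block starts here: decide whether to keep it.
            keep = match.group(1) in scaffolds
        if keep:
            yield line
```

If a filter drops or truncates the per-block trailer lines (for example the Kimura line), downstream divergence calculations may have less to work with, so it can be worth checking that complete blocks are written out.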

.align and .landscape processing

cd /c_gasperettii/5_full_mask

gzip Cgasperettii_ncbi.full_mask.align

cd ..

~/tmp/repeat-annotation/RepeatMasker/util/calcDivergenceFromAlign.pl -s 5_full_mask/c_gasperettii.full_mask.auto.landscape 5_full_mask/c_gasperettii.full_mask.auto.align
~/tmp/repeat-annotation/RepeatMasker/util/createRepeatLandscape.pl -div 5_full_mask/c_gasperettii.full_mask.auto.landscape -twoBit ./Cgasperettii_ncbi.2bit > 5_full_mask/c_gasperettii.full_mask.auto.landscape.html

~/tmp/repeat-annotation/RepeatMasker/util/calcDivergenceFromAlign.pl -s 5_full_mask/c_gasperettii.full_mask.chrZ.landscape 5_full_mask/c_gasperettii.full_mask.chrZ.align
~/tmp/repeat-annotation/RepeatMasker/util/createRepeatLandscape.pl -div 5_full_mask/c_gasperettii.full_mask.chrZ.landscape -twoBit ./Cgasperettii_ncbi.2bit > 5_full_mask/c_gasperettii.full_mask.chrZ.landscape.html

~/tmp/repeat-annotation/RepeatMasker/util/calcDivergenceFromAlign.pl -s 5_full_mask/c_gasperettii.full_mask.chrW.landscape 5_full_mask/c_gasperettii.full_mask.chrW.align
~/tmp/repeat-annotation/RepeatMasker/util/createRepeatLandscape.pl -div 5_full_mask/c_gasperettii.full_mask.chrW.landscape -twoBit ./Cgasperettii_ncbi.2bit > 5_full_mask/c_gasperettii.full_mask.chrW.landscape.html
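
One way to narrow down where the information is lost might be to count the repeat families present in each split .align file and compare that with the families that appear in the corresponding .landscape output. The sketch below is a minimal, hypothetical helper (`count_families` is not part of RepeatMasker) and assumes that a block header has at least 10 whitespace-separated fields, an integer score first, a parenthesized eighth field, and an optional `C` strand flag before the `repeat#class` name.

```python
from collections import Counter


def count_families(lines):
    """Count .align block headers per repeat name (e.g. 'MIR#SINE/MIR')."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Heuristic header test (an assumption about the .align layout):
        # integer score in field 1 and "(bases left)" in field 8.
        if len(fields) >= 10 and fields[0].isdigit() and fields[7].startswith("("):
            # Reverse-strand hits carry a "C" before the repeat name.
            name = fields[9] if fields[8] == "C" else fields[8]
            counts[name] += 1
    return counts
```

For example, opening `5_full_mask/c_gasperettii.full_mask.chrW.align` with `open(...)` and passing the file handle to `count_families` would give a per-family tally to set against the truncated W chromosome landscape.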

tmp_cgasperettii.full_mask.chrW.landscape.txt
