Suggested Assembly directory structure and script to create it

The asm_dir_struct.sh script describes the directories that support an assembly. It can also create that directory structure for you.

Invoking asm_dir_struct.sh with no arguments shows the directory names and what each one is for. Running it with a directory argument, usually a dot '.' for the current directory, prints a list of commands that create the structure. You can save this output to a file and modify it before running it. What we typically do, however, is pipe it straight into bash from the directory where we want to create the assembly.

$ asm_dir_struct.sh . | bash

Here is the default structure. Feel free to add directories you need and remove any that don't suit this assembly.

.
├── dir_struct.notes
├── __create_file_named_busco.lineage_with_BUSCO_lineage_as_last_line_in_file__
├── reads
│   ├── HiFi
│   │   ├── raw_data
│   │   ├── __can_replace_raw_data_dir_with_softlink_to_dir_where_files_located__
│   │   ├── clean
│   │   └── decontam_reads
│   ├── HiC
│   │   ├── raw_data
│   │   ├── clean
│   │   └── secondary_clean
│   └── RNA_seq
│       ├── raw_data
│       └── clean
├── genome_size_est
├── asm
│   ├── run1
│   └── merge_asms
├── hic_scaffold
│   ├── yahs
│   ├── juicer
│   └── JBAT_post_review_finalization
├── decontam
├── repeatmask
│   ├── repeatmodeler
│   └── repeatmasker
├── quality_assessment
│   ├── BUSCO_links
│   ├── Quast
│   └── Flagger
├── other_genomes
├── synteny
├── mito
├── anno
│   ├── braker
│   ├── functional_anno
│   ├── trna
│   └── rrna
├── ortho_analysis
├── current_best
└── final_files
    ├── fixup
    └── ncbi
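
As a minimal sketch of tailoring the structure, the commands below add an extra assembly run directory and replace an empty raw_data directory with a soft link, as the placeholder name in the tree suggests. The scratch directories and the asm/run2 name are assumptions made for illustration; in practice you would run the mkdir, rmdir, and ln commands in your own assembly directory and point the link at wherever your read files actually live.

```shell
# Scratch locations so the sketch does not touch a real assembly (assumption).
asm_root=$(mktemp -d)    # stands in for the top assembly directory
data_src=$(mktemp -d)    # stands in for the directory holding the HiFi files

# Recreate the one piece of the default structure that this sketch changes.
mkdir -p "$asm_root/reads/HiFi/raw_data"

# Add a directory the default structure lacks (hypothetical second run).
mkdir -p "$asm_root/asm/run2"

# Replace the empty raw_data directory with a soft link to the data location.
rmdir "$asm_root/reads/HiFi/raw_data"
ln -s "$data_src" "$asm_root/reads/HiFi/raw_data"

# Show that raw_data is now a link.
ls -ld "$asm_root/reads/HiFi/raw_data"
```

rmdir only removes empty directories, so it will refuse to delete a raw_data directory that already contains files.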

As the name of the placeholder file __create_file_named_busco.lineage_with_BUSCO_lineage_as_last_line_in_file__ says, you should create a file named busco.lineage in the top assembly directory, with the BUSCO lineage as its last line. Later scripts that invoke BUSCO or its newer counterpart compleasm will find this file and run the programs with this lineage without additional user intervention.

After you create busco.lineage, you can delete the empty file that told you to create it.

If you already know the lineage you want to use, say sauropsida, you can create the file like this:

$ echo sauropsida >> busco.lineage
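
Because the lineage is taken from the last line of the file, appending with >> lets a later choice override an earlier one. The sketch below illustrates this in a scratch directory. Reading the value with tail -n 1 is an assumption about how a downstream script might pick it up, not necessarily how the scripts in this repository do it.

```shell
# Work in a scratch directory so this sketch does not touch a real assembly.
asm_dir=$(mktemp -d)
cd "$asm_dir" || exit 1

echo vertebrata >> busco.lineage   # an earlier, broader choice
echo sauropsida >> busco.lineage   # a later, more specific choice

# Assumption: a downstream script uses the last line of the file.
lineage=$(tail -n 1 busco.lineage)
echo "BUSCO lineage: ${lineage}"
```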

If you have some of the NCBI taxonomy files installed, we will see later how you can use busco_lineage_from_taxid_or_name.sh to get a list of lineages appropriate for the subject taxon.

For example,

$ busco_lineage_from_taxid_or_name.sh Scincidae

gives us

# Best lineage written to stdout, everything else to stderr, so can use with $()

# looking for full taxonomy of "Scincidae;"
1273157 | Acontinae | cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Sauropsida; Sauria; Lepidosauria; Squamata; Bifurcata; Unidentata; Scinciformata; Scincidae; |

# matching busco lineages
 eukaryota_odb10 	[255]
     - metazoa_odb10 	[954]
         - vertebrata_odb10 	[3354]
             - tetrapoda_odb10 	[5310]
                 - sauropsida_odb10 	[7480]
# best busco lineage
sauropsida
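
The first comment line in that output notes that only the best lineage goes to stdout, so the result can be captured with $(). The sketch below shows the pattern with a stand-in function, lineage_lookup, which is hypothetical and only mimics the stdout/stderr split described above so that the snippet runs without the NCBI taxonomy files installed.

```shell
# Hypothetical stand-in for busco_lineage_from_taxid_or_name.sh.
# It mimics the documented behavior: diagnostics to stderr, best lineage to stdout.
lineage_lookup() {
    echo "# matching busco lineages" >&2
    echo "sauropsida"
}

# Capture only stdout; the diagnostics are discarded here.
best=$(lineage_lookup Scincidae 2>/dev/null)
echo "best lineage: ${best}"

# With the real script, the equivalent would presumably be:
#   busco_lineage_from_taxid_or_name.sh Scincidae >> busco.lineage
```

Leaving stderr attached to the terminal instead of redirecting it to /dev/null keeps the taxonomy and the matching lineages visible while still writing only the best lineage to the file.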