carleton-spacehogs/SAGs

SAGs

Welcome!

Welcome to the SAGs repo README. This document describes the folders and files included in the SAGs repo.

As of the latest commit, SAGs contains the following folders and files:

  1. bubble_plots: a collection of bubble plots by SAG
  2. dN_dS_scatter_images: scatterplots of dN/dS by ITEP cluster presence/absence
  3. KO_SNV_misc: miscellaneous files for the Seq Object data structures, awaiting sorting or deletion.
  4. misc_images: miscellaneous/one-time images
  5. misc_trash: miscellaneous formatting files awaiting deletion.
  6. pa_files: text files related to ITEP/clusterDbAnalysis presence/absence tables.
  7. python_files: scripts, pipelines, and custom classes written in Python 3. Awaiting further organization and documentation.
  8. SAG_data_files: .gff, .ko, .fa assembly files, .tsv contig names (from anvi-script), and .names_map files for each SAG. This is all the data that needs to be integrated for non-ITEP/non-PAML analysis.
  9. variability.zip: zipped anvi'o outputs for SNV and SAAV variability profiles for all 5 SAGs.
  10. papers: the papers I've been reading and pre-existing info on the research

Folder Overviews

bubble_plots

dN_dS_scatter_images

KO_SNV_misc

misc_images

misc_trash

pa_files

python_files

Dependencies

These files are written in Python 3. The following modules are used:

  • pandas
  • numpy
  • matplotlib (pyplot)
  • sklearn (commented out)
  • scipy
  • pickle
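
A quick way to confirm that the modules listed above are available before running any script is sketched below. This is not part of the repo: the function name `missing_modules` and the split into required and optional lists are assumptions made for illustration (sklearn is treated as optional only because its use is commented out).

```python
import importlib.util

# Import names taken from the dependency list above.
# pickle ships with the Python standard library.
REQUIRED = ["pandas", "numpy", "matplotlib", "scipy", "pickle"]
# Treated as optional here because its use is commented out (assumption).
OPTIONAL = ["sklearn"]


def missing_modules(names):
    """Return the names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_modules(names)
        status = ", ".join(missing) if missing else "none"
        print(f"Missing {label} modules: {status}")
```

Any module reported as missing can usually be installed with pip; note that sklearn is installed under the package name scikit-learn.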

Note: The majority of the python modules and functions in this directory are intended for single-purpose use on specific file types.

SeqObj folder

Contains code for custom Python SNV, SAAV, Contig and ORF objects. These objects are designed to store all of the information in the SAG_data_files folder so that it can be easily sorted, parsed and analyzed. The Python object modules are:

  • SNV
  • SAAV
  • Contig
  • ORF

See the module files for the data attributes stored in each object.
Several modules read the input files and construct lists of Sequence objects:

  • make_contig_ORF_and_SNV_lists.py
  • make_SNVs_and_Contigs.py
  • makeSequenceObjects.py
  • mergeORFpaData.py
  • organizeKO.py
  • populateContigLists.py

These functions must be called in order to properly initialize the Sequence objects.
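
To give a feel for the kind of object model this section describes, here is a minimal sketch. It does not reproduce the classes in the SeqObj folder: the attribute names, `parse_gff_orfs` and `assign_snv` are all hypothetical, and the only format detail relied on is the standard nine tab-separated columns of a GFF line (sequence ID, source, type, start, end, score, strand, phase, attributes).

```python
from dataclasses import dataclass, field


@dataclass
class ORF:
    # Hypothetical attributes; the real ORF class may store different fields.
    contig_name: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int    # 1-based, inclusive
    strand: str
    snv_positions: list = field(default_factory=list)


@dataclass
class Contig:
    # Hypothetical attributes; the real Contig class may store different fields.
    name: str
    orfs: list = field(default_factory=list)


def parse_gff_orfs(lines):
    """Build {contig name: Contig} from GFF lines, keeping only CDS features."""
    contigs = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        orf = ORF(cols[0], int(cols[3]), int(cols[4]), cols[6])
        contigs.setdefault(cols[0], Contig(cols[0])).orfs.append(orf)
    return contigs


def assign_snv(contigs, contig_name, position):
    """Record an SNV position on every ORF that spans it; return the hit count."""
    contig = contigs.get(contig_name)
    if contig is None:
        return 0
    hits = 0
    for orf in contig.orfs:
        if orf.start <= position <= orf.end:
            orf.snv_positions.append(position)
            hits += 1
    return hits
```

With contigs built this way, a call such as `assign_snv(contigs, "contig_1", 50)` would attach position 50 to any ORF on `contig_1` whose coordinates include it. Linking SNVs to ORFs by coordinate is one plausible design; the repo's own scripts may populate the objects differently.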

SAG_data_files

variability.zip

papers

Contributors

Michael Hoffert - Undergrad, [email protected]
Rika Anderson - PI, Space Hogs
