ToDo.md

  • Remove camel case from all function and variable names; use lowercase with underscores instead.
  • Reorder amino acids alphabetically.
  • Add normalize to Moran and Geary autocorrelation.
  • For quasi-sequence-order, allow the distance matrix file name to be passed in with or without the .json extension.
  • For each descriptor, check that the sequence contains only valid amino acids; if not, raise a custom error.
  • Add dimensions of each descriptor to readme and docs.
  • Round SOCN to 3 d.p
  • Unit test the data type for each column in all descriptors.
  • Remove .empty tests from unit tests as validating shape of DF will test for emptiness.
  • Add a leading 0 to single-digit descriptor columns, e.g. polarizability_CTD_C_1 -> polarizability_CTD_C_01.
  • Change sec_struct to secondary_struct.
  • Reduce number of tests by iterating over list of protein seqs.
  • Test dtypes of output dataframe -> test_autocorrelation
  • Add the expected shape to the comments on the shape unit tests.
  • Mention lag is similar to gap between 2 amino acids.
  • Go through test_quasi file, double checking correct values.
  • Append distance matrix to SOCN & Quasi columns, SW or G.
  • Change quasi sequence order -> sequence_order.
  • Calculate all SOCN, for both matrices, append to single output df.
  • SOCN done, quasi done.
  • Reread descriptor comments and explanations.
  • Change SOCNUm to SOCN.
  • Pseudo AAC has to explicitly use hydrophobicity, hydrophilicity and side-chain/residue mass values. Can't find corresponding values in aaindex so just hard code them in.
  • If no properties are input to the pseudo or amp comp funcs, use hydrophobicity, hydrophilicity and residue mass by default. Accept a list of aaindex1 codes; if a str is input, cast it to a list.
  • Uppercase sequence on input, remove whitespace.
  • Move references to top of each module.
  • Input property in CTD funcs can be used with closeness function.
  • Double check functions that use aa_composition values; the aa_comp func returns a Series rather than a dict.
  • Rather than iterate over range of lags, use different lag in each sequence test.
  • Change max_lag to lag
  • Create demo on Notebook.
  • Add descriptor abbreviations to each function's comments, change abbreviation of Pseudo AAComp -> PAAComp.
  • Add references to readme text.
  • In readme, add output of each function below its usage.
  • Add reference numbers to comments in descriptor functions - double check existing ones are correct.
  • Add lag and weight param validation to sequence order module.
  • Change QSOrder to QSO.
  • Rewrite APAAComp descriptor comments to mention that its dimensions change with lambda.
  • For all functions that take a lag, raise a ValueError if an int can't be parsed from the input lag: `try: lag = int(lag)`, then `except: raise ValueError("Invalid lag value input, integer cannot be parsed from {}".format(lag))`.
  • Add logo/image to main readme.
  • Add emojis to readme.
  • Add releases.
  • Change hydrophobicity_CTD_T_13 to CTD_T_13_hydrophobicity.
  • Python unit tests using ctd with 1 property, and using all properties, check dimensions - 21 vs 147 (147/21=7). 21 dimensions per property. 3 C, 3 T, 15 D.
  • Add output dimensions to SOCN functions.
  • def sequence_order_coupling_number() - dimension (1,lag). def sequence_order_coupling_number_all() - dimension (1,lag*2)
  • def quasi_sequence_order() - dimension (1,lag). def quasi_sequence_order_all() - dimension (1,lag*2)
  • Fix CircleCI and add CircleCI badge to readme, double check workflow.
  • Add codecov, use pySAR repo as an example.
  • Add references to each descriptor comments.
  • Change all comment underlining from "------" to "=======".
  • In "Parameters" and "Returns", remove the space before the colon.
  • Read over code.
  • Create demo
  • Add equations of descriptors to markdown file. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0554-8#Sec10 - https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-300
  • Remove python 3.7 references, 3.8 minimum.
  • In descriptor comments, dimensions of output should be 1 x N rather than N x 1 (N=# of features).
  • Remove biopython from requirements & setup.py - required for testing.
  • https://github.com/gadsbyfly/PyBioMed/blob/master/PyBioMed/doc/Descriptor/PyBioMed%20Protein.pdf
  • Sequence order can accept just Schneider-Wrede or Grantham.
  • Add link to medium article.
  • readthedocs (https://github.com/MartinThoma/propy3/tree/master).
  • https://www.google.com/url?sa=i&url=https%3A%2F%2Fchem.libretexts.org%2FBookshelves%2FOrganic_Chemistry%2FOrganic_Chemistry_%2528OpenStax%2529%2F26%253A_Biomolecules-_Amino_Acids_Peptides_and_Proteins%2F26.10%253A_Protein_Structure&psig=AOvVaw0Qo-k6BzbFLhPNHLlzBkIL&ust=1700267570233000&source=images&cd=vfe&opi=89978449&ved=0CBIQjRxqFwoTCNiBtLbkyYIDFQAAAAAdAAAAABAE
  • Change physiochemical to physicochemical.
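
The lag validation item in the list can be sketched as a small shared helper. This is only an illustration: `parse_lag` is a hypothetical name, not a function from the package, and it catches `TypeError` and `ValueError` explicitly instead of using the bare `except` written in the list item.

```python
def parse_lag(lag):
    """Return lag as an int, or raise ValueError if no integer can be parsed.

    Hypothetical helper illustrating the to-do item; not part of the package API.
    """
    try:
        return int(lag)
    except (TypeError, ValueError):
        # int() raises TypeError for inputs like None and ValueError for "abc"
        raise ValueError(
            "Invalid lag value input, integer cannot be parsed from {}".format(lag)
        )
```

Each descriptor function that takes a lag could then call `lag = parse_lag(lag)` once at the top, so the error message stays identical across modules.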
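
The two input-handling items (uppercase the sequence and remove whitespace, then raise a custom error for invalid amino acids) might look roughly like the sketch below. The names `VALID_AMINO_ACIDS`, `InvalidSequenceError` and `clean_sequence` are assumptions made for this example, and it assumes that only the 20 canonical one-letter codes count as valid.

```python
# The 20 canonical amino acids, ordered alphabetically by one-letter code.
VALID_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class InvalidSequenceError(ValueError):
    """Hypothetical custom error for sequences with non-canonical residues."""


def clean_sequence(sequence):
    """Uppercase the sequence, strip all whitespace, and validate its residues."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs and newlines.
    cleaned = "".join(str(sequence).split()).upper()
    invalid = sorted(set(cleaned) - set(VALID_AMINO_ACIDS))
    if invalid:
        raise InvalidSequenceError(
            "Invalid amino acids in sequence: {}".format(", ".join(invalid))
        )
    return cleaned
```

Subclassing `ValueError` is one possible design choice: existing callers that already catch `ValueError` would keep working while new code can catch the more specific error.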
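
The two CTD column naming items (zero-padding single-digit columns and moving the property name to the end) can be combined in one renaming step. The sketch below is an assumption about how that could be done: `rename_ctd_column` is a hypothetical helper, and it only handles names of the exact shape shown in the list, `<property>_CTD_<C|T|D>_<number>`, leaving anything else unchanged.

```python
import re

# Matches names such as "hydrophobicity_CTD_T_13" or "polarizability_CTD_C_1".
_CTD_COLUMN = re.compile(r"^(?P<prop>.+)_CTD_(?P<kind>[CTD])_(?P<index>\d+)$")


def rename_ctd_column(name):
    """Return the column name as CTD_<kind>_<zero-padded index>_<property>.

    Hypothetical helper; names that do not match the pattern are returned as-is.
    """
    match = _CTD_COLUMN.match(name)
    if match is None:
        return name
    return "CTD_{}_{:02d}_{}".format(
        match["kind"], int(match["index"]), match["prop"]
    )
```

With a pandas DataFrame, the same function could be applied to every column via `df.rename(columns=rename_ctd_column)`.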