Tm/synthetics #22

Draft
wants to merge 42 commits into
base: main

Conversation

TimOliverMaier
Collaborator

This pull request implements a platform for synthetic data generation in proteolizard.

+ This adds a new class `ProteomicsExperimentSample`
to represent a sample being pushed through a proteomics setup
1. `ProteomicsExperimentSampleSlice` is the working chunk
    of data, which is loaded as a dataframe for processing
2. `ProteomicsExperimentDatabaseHandle` is a wrapper class
    for SQL database management
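
A rough sketch of how a database handle and slice-wise loading could fit together. This is not the PR's actual API: `DatabaseHandleSketch`, `insert`, `iter_slices`, and the `sequence` column are hypothetical names, and plain tuples stand in for the dataframe that a `ProteomicsExperimentSampleSlice` holds.

```python
import sqlite3


class DatabaseHandleSketch:
    """Hypothetical stand-in for a SQL wrapper class (not the PR's API)."""

    def __init__(self, path=":memory:"):
        self.con = sqlite3.connect(path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS peptides "
            "(pep_id INTEGER PRIMARY KEY, sequence TEXT)"
        )

    def insert(self, rows):
        # One transaction for the whole batch of (pep_id, sequence) rows.
        with self.con:
            self.con.executemany("INSERT INTO peptides VALUES (?, ?)", rows)

    def iter_slices(self, slice_size):
        # Yield the table in chunks so only one slice is in memory at a time.
        cur = self.con.execute(
            "SELECT pep_id, sequence FROM peptides ORDER BY pep_id"
        )
        while True:
            rows = cur.fetchmany(slice_size)
            if not rows:
                return
            yield rows


handle = DatabaseHandleSketch()
sequences = ["PEPTIDEK", "ELVISR", "SAMPLEK", "AAAGGK", "TTTVVR"]
handle.insert(list(enumerate(sequences)))
slice_sizes = [len(rows) for rows in handle.iter_slices(slice_size=2)]
```

Each yielded chunk plays the role of one slice; in the real classes it would presumably be wrapped in a dataframe before processing.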
1. python/proteolizardalgo/feature.py:
    + Class for charge distribution `ChargeProfile`
2. python/proteolizardalgo/hardware_models.py:
    + `LiquidChromatography`
        * support for `irt_to_rt` method
        * methods returning the time interval (start, end)
          and the center of a frame

    + implemented `EMGChromatographyProfileModel`
    + implemented `NormalIonMobilityProfileModel`
3. python/proteolizardalgo/proteome.py
    + method to make columns with `Profile` data types
      SQL compatible
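
To illustrate the chromatography pieces listed above, here is a minimal sketch, assuming frames of constant length numbered from 1 and a purely linear iRT to RT mapping (a placeholder only, since the realistic mapping is still a TODO). The function names, default values, and the EMG parametrization (`mu`, `sigma`, `lam`) are assumptions and not taken from `hardware_models.py`.

```python
import math


def irt_to_rt(irt, gradient_length_s=3600.0):
    """Placeholder: map iRT in [0, 100] linearly onto the gradient (seconds)."""
    return irt / 100.0 * gradient_length_s


def frame_time_interval(frame_id, frame_length_s=0.1):
    """Return (start, end) in seconds of a frame; frames are numbered from 1."""
    start = (frame_id - 1) * frame_length_s
    return start, start + frame_length_s


def frame_time_middle(frame_id, frame_length_s=0.1):
    """Return the center time of a frame in seconds."""
    start, end = frame_time_interval(frame_id, frame_length_s)
    return 0.5 * (start + end)


def emg_pdf(t, mu, sigma, lam):
    """Exponentially modified Gaussian density.

    mu, sigma: mean and standard deviation of the Gaussian part,
    lam: rate of the exponential tail.
    Note: the exponential factor can overflow for t far below mu.
    """
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * t)
    tail = math.erfc((mu + lam * sigma**2 - t) / (math.sqrt(2.0) * sigma))
    return 0.5 * lam * math.exp(exponent) * tail


# Approximate share of a peptide's signal in frame 100: density at the
# frame center times the frame width.
start, end = frame_time_interval(100)
share = emg_pdf(frame_time_middle(100), mu=10.0, sigma=1.0, lam=0.5) * (end - start)
```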

TODO:
    + IonMobilityModel must support multiple charge states.
    + realistic parameter sampling
    + realistic `irt_to_rt` method -> must be provided by user
1. model params were null in the SQL table
    + This was due to NumPy data types (not serializable)
    + now stored as Python built-ins

2. charge profile is stored in peptides table
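
The serialization fix can be sketched roughly as follows. This is an illustration, not the code from this PR: `to_builtin` is a hypothetical helper that relies on the `.tolist()` method, which both NumPy arrays and NumPy scalars provide, to obtain plain Python values before the parameters are serialized.

```python
import json


def to_builtin(value):
    """Recursively convert NumPy-like values to Python built-ins."""
    if isinstance(value, dict):
        return {key: to_builtin(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(val) for val in value]
    if hasattr(value, "tolist"):
        # np.ndarray -> list, np.float32 / np.int64 -> float / int
        return value.tolist()
    return value


def serialize_params(params):
    """Serialize model parameters to a JSON string for a SQL text column."""
    return json.dumps(to_builtin(params))
```

With NumPy installed, `serialize_params({"sigma": np.float32(1.5)})` yields a JSON string, whereas calling `json.dumps` on the raw dictionary raises a `TypeError`.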
1. changed hyphen `-` to underscore `_` in SQL column names
2. In experiment.py, added structure for TOF spectra assembly
3. replaced the assertion in `averagine_generator` concerning
    proper masses for the averagine model
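
Background on the column rename in item 1: an unquoted hyphen in a SQL identifier is parsed as a minus operator, so hyphenated column names need quoting everywhere. A small sketch of the renaming, where `sql_safe` and the column name `charge-profile` are hypothetical examples:

```python
import sqlite3


def sql_safe(name):
    """Replace hyphens so the name is a valid unquoted SQL identifier."""
    return name.replace("-", "_")


columns = ["pep_id", "charge-profile"]  # hypothetical column names
safe_columns = [sql_safe(col) for col in columns]

con = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{col} TEXT" for col in safe_columns)
con.execute(f"CREATE TABLE peptides ({column_defs})")
con.execute("INSERT INTO peptides VALUES (?, ?)", ("1", "[0.1, 0.7, 0.2]"))

# Works without quoting because the column name contains no hyphen.
row = con.execute("SELECT charge_profile FROM peptides").fetchone()
```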
This adds a prototype end-to-end synthetics generator that returns
a dictionary (frames) of dictionaries (scans) of `MzSpectrum`
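
The nested return structure can be pictured with the following sketch. `MzSpectrumStub` is a stand-in dataclass and not the real `MzSpectrum`, and `assemble_frames` with its `(frame_id, scan_id, mz, intensity)` input tuples is a hypothetical helper for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class MzSpectrumStub:
    """Stand-in for `MzSpectrum`: parallel lists of m/z and intensity."""
    mz: list = field(default_factory=list)
    intensity: list = field(default_factory=list)


def assemble_frames(signals):
    """Group (frame_id, scan_id, mz, intensity) tuples into frames -> scans."""
    frames = {}
    for frame_id, scan_id, mz, intensity in signals:
        scans = frames.setdefault(frame_id, {})
        spectrum = scans.setdefault(scan_id, MzSpectrumStub())
        spectrum.mz.append(mz)
        spectrum.intensity.append(intensity)
    return frames


signals = [
    (1, 10, 500.25, 100.0),
    (1, 10, 501.25, 40.0),
    (1, 11, 500.25, 60.0),
    (2, 10, 622.5, 80.0),
]
frames = assemble_frames(signals)
# frames[1][10] is the spectrum of scan 10 in frame 1
```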
orienting synthetics workflow on experiment
renamed `NeuralMobilityApex` to `NeuralIonMobilityApex`
`pep_id` as unique key for every peptide
1.  In chemistry.py, added new class `ChemicalCompound`
    with subclass `BufferGas` for handling
    e.g. ion mobility gas properties.
    + `ChemicalCompound` gets elemental properties
      from the new dependency
      [`mendeleev`](https://github.com/lmmentel/mendeleev)
2.  CCS to ion mobility / reduced ion mobility is now handled
    within device class `IonMobilitySeparation`

3.  Scan to ion mobility conversion and its reverse now rely
    on converters defined by the user
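
For context on item 2, CCS and reduced ion mobility are commonly related through the Mason-Schamp equation. The sketch below assumes that relation; the PR description does not state which formula `IonMobilitySeparation` uses. The function names, the default temperature, and the hard-coded N2 mass (a real `BufferGas` would supply it) are assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
DALTON = 1.66053906660e-27           # kg
LOSCHMIDT = 2.686780111e25           # 1/m^3 at 273.15 K and 101325 Pa
N2_MASS_DA = 28.0134                 # assumed buffer gas: molecular nitrogen


def _mason_schamp_factor(ion_mass_da, charge, temperature_k, gas_mass_da):
    """Return K0 [cm^2/Vs] * CCS [A^2] for the given ion and gas."""
    reduced_mass_kg = (
        ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * DALTON
    )
    factor_si = (
        3.0 / 16.0
        * charge * ELEMENTARY_CHARGE / LOSCHMIDT
        * math.sqrt(2.0 * math.pi / (reduced_mass_kg * BOLTZMANN * temperature_k))
    )
    # m^2/Vs -> cm^2/Vs (1e4) and m^2 -> A^2 (1e20)
    return factor_si * 1e24


def ccs_to_k0(ccs_a2, ion_mass_da, charge, temperature_k=305.0,
              gas_mass_da=N2_MASS_DA):
    """Reduced ion mobility K0 in cm^2/Vs from CCS in square angstrom."""
    return _mason_schamp_factor(ion_mass_da, charge, temperature_k, gas_mass_da) / ccs_a2


def k0_to_ccs(k0, ion_mass_da, charge, temperature_k=305.0,
              gas_mass_da=N2_MASS_DA):
    """CCS in square angstrom from reduced ion mobility K0 in cm^2/Vs."""
    return _mason_schamp_factor(ion_mass_da, charge, temperature_k, gas_mass_da) / k0


k0 = ccs_to_k0(ccs_a2=400.0, ion_mass_da=1000.0, charge=2)
```

Because both directions share one prefactor, the two conversions are exact inverses of each other for fixed ion mass, charge, temperature, and gas.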
1. check for empty spectra
2. use `__repr__` for file output instead of JSON
1. instead of repeatedly adding spectra,
    `.push` just copies the data and lets `to_resolution`
    do the sorting and adding at the end.
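
The idea behind this change, sketched under assumptions: `SpectrumAccumulator` is a hypothetical class, and binning by rounding m/z to a number of decimals is assumed for `to_resolution`. Only the method names `push` and `to_resolution` come from the commit message.

```python
from collections import defaultdict


class SpectrumAccumulator:
    """Collects peaks cheaply and merges them only once at the end."""

    def __init__(self):
        self._mz = []
        self._intensity = []

    def push(self, mz, intensity):
        # No sorting or merging here: just copy the incoming peaks.
        self._mz.extend(mz)
        self._intensity.extend(intensity)

    def to_resolution(self, decimals):
        # Sum intensities of peaks that fall into the same rounded m/z bin,
        # then return the bins sorted by m/z.
        binned = defaultdict(float)
        for mz, intensity in zip(self._mz, self._intensity):
            binned[round(mz, decimals)] += intensity
        keys = sorted(binned)
        return keys, [binned[key] for key in keys]


acc = SpectrumAccumulator()
acc.push([500.251, 400.1], [10.0, 5.0])
acc.push([500.249], [2.5])
mz, intensity = acc.to_resolution(2)
```

Pushing is then a plain list extension, and the cost of sorting and merging is paid once per spectrum instead of once per added signal.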
1. tf models use a lot of RAM, and a variety of
    approaches to free the RAM did not work.
    Running the model inference inside a child process
    worked.
1. fixed bug in which sequence tokens were read as
    string