Tm/synthetics #22

TimOliverMaier · 2023-03-21T11:18:23Z

This pull request implements a platform for synthetic data generation in proteolizard.

+ This adds a new class `ProteomicsExperimentSample` to represent a sample being pushed through the proteomics setup.

1. `ProteomicsExperimentSampleSlice` is considered the working bulk of data; it is loaded as a dataframe for processing.
2. `ProteomicsExperimentDatabaseHandle` is a wrapper class for SQL database management.
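A minimal sketch of the chunk-iteration idea behind a database handle like `ProteomicsExperimentDatabaseHandle`, assuming a plain SQLite backend accessed through the standard-library `sqlite3` module. The class name `ChunkedSQLiteHandle`, the `peptides` table layout and the `chunk_size` parameter are hypothetical. The real handle loads slices as dataframes; this sketch yields lists of row dicts to stay dependency-free.

```python
import sqlite3
from typing import Dict, Iterator, List, Sequence


class ChunkedSQLiteHandle:
    """Hypothetical stand-in for a database handle that streams table rows in chunks."""

    def __init__(self, path: str = ":memory:") -> None:
        self.con = sqlite3.connect(path)
        self.con.row_factory = sqlite3.Row  # rows can be turned into dicts by column name

    def create_peptides(self, rows: Sequence[tuple]) -> None:
        # Underscored column names need no quoting in SQL.
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS peptides ("
            "pep_id INTEGER PRIMARY KEY, sequence TEXT, mass_theoretical REAL)"
        )
        self.con.executemany("INSERT INTO peptides VALUES (?, ?, ?)", rows)
        self.con.commit()

    def iter_chunks(self, table: str, chunk_size: int) -> Iterator[List[Dict]]:
        # Table names cannot be bound as SQL parameters, so accept plain identifiers only.
        if not table.isidentifier():
            raise ValueError(f"invalid table name: {table!r}")
        cur = self.con.execute(f"SELECT * FROM {table} ORDER BY rowid")
        while True:
            batch = cur.fetchmany(chunk_size)  # only chunk_size rows are held at a time
            if not batch:
                break
            yield [dict(row) for row in batch]

    def close(self) -> None:
        self.con.close()


if __name__ == "__main__":
    db = ChunkedSQLiteHandle()
    db.create_peptides([(i, "PEPTIDE", 799.36) for i in range(5)])
    for chunk in db.iter_chunks("peptides", chunk_size=2):
        print(len(chunk), chunk[0]["pep_id"])
    db.close()
```

Iterating with `fetchmany` keeps memory bounded by the chunk size rather than by the table size, which is the point of processing the sample slice by slice.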

1. python/proteolizardalgo/feature.py:
   + Class for charge distribution: `ChargeProfile`
2. python/proteolizardalgo/hardware_models.py:
   + `LiquidChromatography`
     * support for an `irt_to_rt` method
     * methods returning the time interval (start, end) and the center of frames
   + implemented `EMGChromatographyProfileModel`
   + implemented `NormalIonMobilityProfileModel`
3. python/proteolizardalgo/proteome.py:
   + method to make columns with `Profile` data types SQL compatible

TODO:
+ IonMobilityModel must support multiple charge states.
+ realistic parameter sampling
+ realistic `irt_to_rt` method (must be provided by the user)
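To illustrate how an exponentially modified Gaussian (EMG) elution profile can be combined with frame time intervals, the sketch below integrates an EMG over each frame's (start, end) interval using its closed-form CDF. The parametrization (`mu`, `sigma`, rate `lam`), the fixed `frame_length`, and frames starting at t = 0 are assumptions for illustration; this is not necessarily how `EMGChromatographyProfileModel` or `LiquidChromatography` are implemented.

```python
import math
from typing import List, Tuple


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def emg_cdf(t: float, mu: float, sigma: float, lam: float) -> float:
    """CDF of an EMG, i.e. of N(mu, sigma^2) + Exp(lam) with rate lam > 0."""
    z = (t - mu) / sigma
    # The exp() term can overflow for t far before mu with a large lam;
    # this naive form is adequate for moderate parameters only.
    tail = math.exp(-lam * (t - mu) + 0.5 * (lam * sigma) ** 2) * normal_cdf(z - lam * sigma)
    return normal_cdf(z) - tail


def frame_interval(frame_id: int, frame_length: float) -> Tuple[float, float]:
    """(start, end) of a frame, assuming contiguous frames starting at t = 0."""
    start = frame_id * frame_length
    return start, start + frame_length


def frame_center(frame_id: int, frame_length: float) -> float:
    start, end = frame_interval(frame_id, frame_length)
    return 0.5 * (start + end)


def elution_per_frame(
    n_frames: int, frame_length: float, mu: float, sigma: float, lam: float
) -> List[float]:
    """Fraction of the eluting signal that falls into each frame."""
    fractions = []
    for i in range(n_frames):
        start, end = frame_interval(i, frame_length)
        fractions.append(emg_cdf(end, mu, sigma, lam) - emg_cdf(start, mu, sigma, lam))
    return fractions


if __name__ == "__main__":
    fr = elution_per_frame(200, 0.5, mu=30.0, sigma=1.0, lam=0.5)
    peak = max(range(len(fr)), key=fr.__getitem__)
    print(peak, frame_center(peak, 0.5), sum(fr))
```

Integrating the CDF over each interval, rather than evaluating the density at the frame center, keeps the per-frame fractions summing to one when the frames cover the whole peak.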

1. Model params were null in the SQL table.
   + This was due to NumPy data types, which are not serializable.
   + They are now stored as Python built-ins.
2. The charge profile is stored in the peptides table.
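The fix above can be illustrated with a small helper that recursively converts NumPy scalars and arrays into Python built-ins before serialization. The name `to_builtin` and the example parameter dict are hypothetical, and JSON is used here only as an example of a serializer that rejects NumPy types such as `np.float32` or `np.int64`.

```python
import json
from typing import Any

import numpy as np


def to_builtin(obj: Any) -> Any:
    """Recursively replace NumPy scalars and arrays with Python built-ins."""
    if isinstance(obj, dict):
        return {str(k): to_builtin(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()  # nested lists of Python scalars
    if isinstance(obj, np.generic):
        return obj.item()  # e.g. np.int64 -> int, np.float32 -> float
    return obj


model_params = {
    "mu": np.float32(12.5),
    "sigma": np.float64(0.8),
    "charges": np.array([1, 2, 3]),
}

try:
    json.dumps(model_params)
except TypeError as err:
    print("raw params fail:", err)

print(json.dumps(to_builtin(model_params)))
```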

1. Changed hyphens (-) to underscores (_) in SQL column names.
2. In experiment.py, added structure for TOF spectra assembly.
3. Replaced the assertion in averagine_generator concerning proper masses for the averagine model with a warning.

This adds a prototype end-to-end synthetics generator that returns a dictionary (frames) of dictionaries (scans) of `MzSpectrum`.
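The returned structure is a dictionary keyed by frame, holding dictionaries keyed by scan. It can be sketched as follows. `SimpleSpectrum` is a hypothetical stand-in for `MzSpectrum`, whose real interface is not shown here, and the per-ion `(frame, scan, mz, intensity)` records are made-up inputs.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class SimpleSpectrum:
    """Hypothetical stand-in for MzSpectrum: parallel m/z and intensity lists."""

    mz: List[float] = field(default_factory=list)
    intensity: List[float] = field(default_factory=list)

    def push(self, mz: Iterable[float], intensity: Iterable[float]) -> None:
        self.mz.extend(mz)
        self.intensity.extend(intensity)


# (frame id, scan id, m/z values, intensities) for one simulated ion signal
Signal = Tuple[int, int, List[float], List[float]]


def assemble(signals: Iterable[Signal]) -> Dict[int, Dict[int, SimpleSpectrum]]:
    """Group simulated signals into {frame: {scan: spectrum}}."""
    frames: Dict[int, Dict[int, SimpleSpectrum]] = defaultdict(
        lambda: defaultdict(SimpleSpectrum)
    )
    for frame_id, scan_id, mz, intensity in signals:
        frames[frame_id][scan_id].push(mz, intensity)
    # Return plain dicts so that later lookups do not silently create empty entries.
    return {f: dict(scans) for f, scans in frames.items()}
```

Only frames and scans that actually receive signal get an entry, which keeps the structure sparse.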

orienting synthetics workflow on experiment

renamed `NeuralMobilityApex` to `NeuralIonMobilityApex`

`pep_id` as unique key for every peptide

1. In chemistry.py, new class `ChemicalCompound` with subclass `BufferGas` for handling, e.g., ion mobility gas properties.
   + `ChemicalCompound` gets elemental properties from the new dependency ['mendeleev'](https://github.com/lmmentel/mendeleev).
2. Conversion from CCS to ion mobility / reduced ion mobility is now handled within the device class `IonMobilitySeparation`.
3. Conversion from scan to ion mobility and the reverse now rely on user-defined converters.
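For reference, CCS and reduced ion mobility are commonly related by the Mason-Schamp equation, `CCS = (3 z e / (16 N0)) * sqrt(2 pi / (mu k_B T)) / K0`, where `mu` is the reduced mass of ion and buffer gas and `N0` is the Loschmidt number density. Folding all physical constants into one summary constant gives the sketch below. The function names, the N2 buffer gas mass of 28.0134 Da, and the 305 K default temperature are assumptions; in this PR the conversion is routed through `IonMobilitySeparation` and `BufferGas` instead.

```python
import math

# CODATA values in SI units
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23  # J/K
LOSCHMIDT = 2.686780111e25  # m^-3, gas number density at 273.15 K and 101.325 kPa
DALTON = 1.66053906660e-27  # kg

# Summary constant for CCS in Angstrom^2, mu in Da, T in K, 1/K0 in V s/cm^2.
# 1e4 converts 1/K0 from V s/cm^2 to V s/m^2; 1e20 converts m^2 to Angstrom^2.
SUMMARY_CONSTANT = (
    3 * ELEMENTARY_CHARGE / (16 * LOSCHMIDT)
    * math.sqrt(2 * math.pi / (BOLTZMANN * DALTON))
    * 1e4 * 1e20
)


def reduced_mass(mz: float, charge: int, gas_mass: float = 28.0134) -> float:
    """Reduced mass in Da of the ion (mass approximated as mz * charge) and a gas molecule."""
    ion_mass = mz * charge
    return ion_mass * gas_mass / (ion_mass + gas_mass)


def ccs_to_one_over_k0(ccs: float, mz: float, charge: int, temperature: float = 305.0) -> float:
    """Inverse reduced mobility 1/K0 (V s/cm^2) from CCS (Angstrom^2)."""
    mu = reduced_mass(mz, charge)
    return ccs * math.sqrt(mu * temperature) / (SUMMARY_CONSTANT * charge)


def one_over_k0_to_ccs(one_over_k0: float, mz: float, charge: int, temperature: float = 305.0) -> float:
    """CCS (Angstrom^2) from inverse reduced mobility 1/K0 (V s/cm^2)."""
    mu = reduced_mass(mz, charge)
    return SUMMARY_CONSTANT * charge * one_over_k0 / math.sqrt(mu * temperature)


if __name__ == "__main__":
    print(SUMMARY_CONSTANT)
    print(ccs_to_one_over_k0(400.0, mz=500.0, charge=2))
```

Deriving the summary constant from the physical constants, rather than hard-coding it, makes the unit conversions explicit.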

1. Check for empty spectra.
2. Use `__repr__` for file output instead of JSON.

1. Instead of repeatedly adding spectra, `.push` now only copies the data and leaves the sorting and adding to a single `to_resolution` call at the end.
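The deferred merge can be sketched as follows: `push` only appends raw (m/z, intensity) pairs, and one final pass rounds m/z to a fixed number of decimals, sums the intensities in each bin, and sorts. `LazySpectrum` and the rounding-based binning in its `to_resolution` are assumptions for illustration; the actual `MzSpectrum.to_resolution` may bin differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class LazySpectrum:
    """Hypothetical spectrum that defers sorting and merging until to_resolution()."""

    def __init__(self) -> None:
        self.mz: List[float] = []
        self.intensity: List[float] = []

    def push(self, mz: List[float], intensity: List[float]) -> None:
        # Cheap append per call; no sorting or peak merging here.
        self.mz.extend(mz)
        self.intensity.extend(intensity)

    def to_resolution(self, decimals: int) -> Tuple[List[float], List[float]]:
        # Single pass at the end: bin by rounded m/z, sum intensities, sort by m/z.
        bins: Dict[float, float] = defaultdict(float)
        for m, i in zip(self.mz, self.intensity):
            bins[round(m, decimals)] += i
        mz_sorted = sorted(bins)
        return mz_sorted, [bins[m] for m in mz_sorted]
```

Merging on every add would re-sort the growing spectrum each time. Deferring the merge replaces that with one sort over all pushed points.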

1. TensorFlow (tf) models use a lot of RAM, and several approaches to freeing that RAM did not work. Running the model inference inside a child process did.
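Below is a minimal standard-library sketch of the child-process workaround. The memory-heavy step runs in a separate process that exits afterwards, so the operating system reclaims everything it allocated. `predict_in_child` and `fake_inference` are hypothetical; in the real code the target would load the TensorFlow model and run inference. The sketch assumes the `fork` start method (POSIX only), so the target function does not need to be importable by the child.

```python
import multiprocessing as mp
from typing import Callable, List, Sequence


def _worker(fn: Callable, inputs: Sequence[float], queue) -> None:
    try:
        queue.put(("ok", fn(inputs)))
    except Exception as exc:  # report failures to the parent instead of dying silently
        queue.put(("error", repr(exc)))


def predict_in_child(
    fn: Callable[[Sequence[float]], List[float]], inputs: Sequence[float]
) -> List[float]:
    """Run fn(inputs) in a forked child; memory it allocates is freed when the child exits."""
    ctx = mp.get_context("fork")  # POSIX only
    queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(fn, inputs, queue))
    proc.start()
    # Read before join() so a large result cannot block the child on a full pipe.
    # Note: if the child is killed (e.g. by the OOM killer) this get() blocks forever.
    status, payload = queue.get()
    proc.join()
    if status != "ok":
        raise RuntimeError(f"child inference failed: {payload}")
    return payload


def fake_inference(values: Sequence[float]) -> List[float]:
    # Stand-in for model loading plus predict(); allocates a large temporary buffer.
    buffer = [0.0] * 1_000_000
    return [2.0 * v + buffer[0] for v in values]


if __name__ == "__main__":
    print(predict_in_child(fake_inference, [1.0, 2.0, 3.0]))
```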

1. Fixed a bug in which sequence tokens were read as a string.
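Below is a minimal sketch of handling a sequence as tokens rather than as a plain string. A simplified ProForma-style tokenizer keeps each residue together with its bracketed modification, and the token list is stored as JSON text so it can be read back as a list instead of a string. The regular expression covers only a small subset of ProForma: residue letters with an optional `[...]` modification, without terminal, labile or nested modifications. The JSON storage scheme and the `sequence_tokenized` column name are hypothetical.

```python
import json
import re
import sqlite3
from typing import List

# One residue letter, optionally followed by one bracketed modification, e.g. "M[Oxidation]".
TOKEN_RE = re.compile(r"[A-Z](?:\[[^\]]+\])?")


def tokenize(sequence: str) -> List[str]:
    tokens = TOKEN_RE.findall(sequence)
    # Reject input that the simplified grammar silently skipped over.
    if "".join(tokens) != sequence:
        raise ValueError(f"unsupported characters in sequence: {sequence!r}")
    return tokens


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE peptides (pep_id INTEGER PRIMARY KEY, sequence_tokenized TEXT)")
tokens = tokenize("PEM[Oxidation]PTIDEK")
con.execute("INSERT INTO peptides VALUES (?, ?)", (0, json.dumps(tokens)))

raw = con.execute("SELECT sequence_tokenized FROM peptides").fetchone()[0]
# raw is a str: iterating over it would yield single characters such as '[' and '"'.
restored = json.loads(raw)  # back to a list of tokens
print(restored)
```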

TimOliverMaier added 30 commits February 27, 2023 14:05

orienting synthetics workflow on experiment

83c6136

use properties instead of abstractmethods

7b8263c

ProteomicsExperimentSample class

47e1f2f

+ This adds a new class `ProteomicsExperimentSample` to represent a sample being pushed through the proteomics setup.

skeleton of experiment rt apex step working

c92acde

skeleton for sqlite usage

278525d

1. `ProteomicsExperimentSampleSlice` is considered the working bulk of data; it is loaded as a dataframe for processing.
2. `ProteomicsExperimentDatabaseHandle` is a wrapper class for SQL database management.

database handler with chunk iteration

0f24f3a

merge hardware files to prevent circular imports

80eb6a0

restructured hardware model file

b3ae6b6

simulation name depending data insertion

e87a8dc

peptides and ion table in SQL backend

f84192f

fixing model_params being null

bc913bd

1. Model params were null in the SQL table.
   + This was due to NumPy data types, which are not serializable.
   + They are now stored as Python built-ins.
2. The charge profile is stored in the peptides table.

assembly of mz spectra structure

1634d5c

1. Changed hyphens (-) to underscores (_) in SQL column names.
2. In experiment.py, added structure for TOF spectra assembly.
3. Replaced the assertion in averagine_generator concerning proper masses for the averagine model with a warning.

prototype end-to-end

ede08e2

This adds a prototype end-to-end synthetics generator that returns a dictionary (frames) of dictionaries (scans) of `MzSpectrum`.

Merge branch 'isotope_sampler' into experiment_class

bdfbd79

Merge pull request #15 from TimOliverMaier/experiment_class

3b7930a

orienting synthetics workflow on experiment

fix issue #18

d359d40

renamed `NeuralMobilityApex` to `NeuralIonMobilityApex`

fix issue #16

50156fa

`pep_id` as unique key for every peptide

CCS to reduced mobility via summary constant

b988111

renamed TimsTOFExperiment

3d4d83f

Implementation of MzSeparation device class

277f2cb

abund. and resol. vars of Mz device and model

dda44d7

performance optimization of assembly

4758ecc

write to output file

c3b8355

centroid spectra

bb0ecb5

performance optimizations

daba1f9

1. Check for empty spectra.
2. Use `__repr__` for file output instead of JSON.

parallelize assembly

19f3ac4

prevent pool overhead for num_process = 1

b135a16

removed averagine mz warning

eef1ce4

TimOliverMaier added 12 commits March 28, 2023 00:23

json format output

b7d7551

parquet output

802376c

concurrent database access

1d24fd6

push spectra into scan_spectrum instead of add

e0acc31

1. Instead of repeatedly adding spectra, `.push` now only copies the data and leaves the sorting and adding to a single `to_resolution` call at the end.

RAM usage optimization

18471ea

1. TensorFlow (tf) models use a lot of RAM, and several approaches to freeing that RAM did not work. Running the model inference inside a child process did.

support proForma aa sequences

07a6b14

sequence tokens read by tokens

9952e7a

1. Fixed a bug in which sequence tokens were read as a string.

maxquant to proforma sequence translator

e735f0e

script to extract raw data of identified features

528feb7

raw file column in extraction dataframe

1348177

Binomial Ion source and data based normal profiles

ea970aa

updated constants

b0a9c70
