Releases: wfondrie/depthcharge
Depthcharge v0.4.8
Depthcharge v0.4.7
[v0.4.7]
Fixed
- Add stop and start tokens for `AnnotatedSpectrumDataset`, when available.
- When `reverse` is used for the `PeptideTokenizer`, automatically reverse the decoded peptide.
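The second fix can be pictured with a small plain-Python sketch. It does not use the depthcharge API: `tokenize` and `detokenize` are hypothetical helpers, and single-character residues are an assumption made to keep the example short.

```python
def tokenize(peptide: str, reverse: bool = False) -> list[str]:
    """Split a peptide into single-residue tokens, optionally reversed."""
    tokens = list(peptide)
    return tokens[::-1] if reverse else tokens


def detokenize(tokens: list[str], reverse: bool = False) -> str:
    """Join tokens into a peptide, undoing the reversal if one was applied."""
    if reverse:
        tokens = tokens[::-1]
    return "".join(tokens)


tokens = tokenize("PEPTIDE", reverse=True)
peptide = detokenize(tokens, reverse=True)  # round-trips to the original order
```

Without the second reversal, a tokenizer configured with `reverse` would hand back the peptide in reversed order after decoding.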
Depthcharge v0.4.6
[v0.4.6]
Added
- Added support for unsigned modification masses that don't quite conform to the ProForma standard.
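ProForma notation writes mass modifications with an explicit sign, as in `M[+15.995]`. The sketch below shows one way unsigned masses such as `M[15.995]` could be normalized. It is not depthcharge's implementation: `add_explicit_sign` is a hypothetical helper, and treating an unsigned mass as positive is an assumption.

```python
import re

# Matches a bracketed number with no leading sign, e.g. "[15.995]".
UNSIGNED_MASS = re.compile(r"\[(\d+(?:\.\d+)?)\]")


def add_explicit_sign(sequence: str) -> str:
    """Rewrite unsigned bracketed masses as positive: [15.995] -> [+15.995]."""
    return UNSIGNED_MASS.sub(r"[+\1]", sequence)


normalized = add_explicit_sign("M[15.995]PEPTIDE")
```

Masses that already carry a sign, and named modifications such as `[Oxidation]`, do not match the pattern and pass through unchanged.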
Depthcharge v0.4.5
Changed
- The `scan_id` column for parsed spectra is now a string instead of an integer. This is less space efficient, but we ran into issues with Sciex indexing when trying to use only an integer.
Depthcharge v0.4.4
Changed
- Partially revert length changes to `SpectrumDataset` and `AnnotatedSpectrumDataset`. We removed `__len__` from both due to problems with PyTorch Lightning compatibility.
- Simplify dataset code by removing redundancy with `lance.pytorch.LanceDataset`.
- Improved warning message for skipped spectra.
Depthcharge v0.4.3
Changed
- The length of `SpectrumDataset` and `AnnotatedSpectrumDataset` now reflects the `samples` parameter of the `lance.pytorch.LanceDataset` parent class.
Depthcharge v0.4.2
Changed
- The length of `SpectrumDataset` and `AnnotatedSpectrumDataset` is now the number of batches, not the number of spectra. This lets tools like PyTorch Lightning create their progress bars properly.
- Parsing a dataset no longer requires reading essentially the whole first file. Now the schema is inferred from the first 128 spectra.
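The idea behind the second change is to peek at a bounded prefix of the data rather than the whole file. A minimal sketch, assuming spectra arrive as dictionaries from an iterator; `infer_schema` is a hypothetical helper, not a depthcharge function, and mapping fields to Python types stands in for a real Arrow schema.

```python
from itertools import islice


def infer_schema(spectra, n_peek=128):
    """Infer field names and Python types from the first n_peek records."""
    schema = {}
    for record in islice(spectra, n_peek):
        for name, value in record.items():
            # Keep the first type seen for each field.
            schema.setdefault(name, type(value))
    return schema


records = ({"scan_id": str(i), "precursor_mz": 500.0 + i} for i in range(10_000))
schema = infer_schema(records)
```

Because `islice` stops after `n_peek` items, the remaining records are never read during inference.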
Depthcharge v0.4.1
Added
- Significant updates to the documentation, including how to model mass spectra.
- Reading and writing from cloud storage on everything!
Changed
- Migrated to Mike for mkdocs to manage multiple versions.
- Moved test GitHub Action from pip to uv.
Depthcharge v0.4.0
We have completely reworked the data module.
Depthcharge now uses Apache Arrow-based formats instead of HDF5; spectra are either converted to Parquet or streamed with PyArrow, optionally into Lance datasets.
We now also have full support for small molecules, with the `MoleculeTokenizer`, `AnalyteTransformerEncoder`, and `AnalyteTransformerDecoder` classes.
Breaking Changes
- `PeptideTransformer*` are now `AnalyteTransformer*`, providing full support for small molecule analytes. Additionally the interface has been completely reworked.
- Mass spectrometry data parsers now function as iterators, yielding batches of spectra as `pyarrow.RecordBatch` objects.
- Parsers can now be told to read arbitrary fields from their respective file formats with the `custom_fields` parameter.
- The parsing functionality of `SpectrumDataset` and its subclasses has been moved to the `spectra_to_*` functions in the data module.
- `SpectrumDataset` and its subclasses now return dictionaries of data rather than a tuple of data. This allows us to incorporate arbitrary additional data.
- `SpectrumDataset` and its subclasses are now `lance.torch.data.LanceDataset` subclasses, providing native PyTorch integration.
- Dataset classes no longer have a `loader()` method.
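The iterator-style parsers described above follow a general batching pattern, sketched here in plain Python. This is not the depthcharge parser: `iter_batches` is a hypothetical helper, and plain lists stand in for `pyarrow.RecordBatch` objects.

```python
from itertools import islice


def iter_batches(spectra, batch_size):
    """Yield lists of at most batch_size spectra until the input is exhausted."""
    iterator = iter(spectra)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


batches = list(iter_batches(range(5), batch_size=2))
```

Yielding batches lazily means only one batch needs to be held in memory at a time, which is the usual motivation for this design.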
Added
- Support for small molecules.
- Added the `StreamingSpectrumDataset` for fast inference.
- Added `spectra_to_df`, `spectra_to_parquet`, and `spectra_to_stream` to the `depthcharge.data` module.
Changed
- Determining the mass spectrometry data file format is now less fragile. It now looks for known line contents, rather than relying on the extension.
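Content-based detection of this kind can be sketched as follows. This is an illustration and not depthcharge's code: `sniff_format` is a hypothetical helper, and the markers it checks (`<mzML`, `<indexedmzML`, `<mzXML`, `BEGIN IONS`) and the 50-line limit are assumptions about what counts as "known line contents".

```python
from pathlib import Path


def sniff_format(path, n_lines=50):
    """Guess the file format from its first lines rather than its extension."""
    with Path(path).open("r", errors="ignore") as handle:
        for _, line in zip(range(n_lines), handle):
            lowered = line.strip().lower()
            if "<mzml" in lowered or "<indexedmzml" in lowered:
                return "mzML"
            if "<mzxml" in lowered:
                return "mzXML"
            if lowered.startswith("begin ions"):
                return "MGF"
    return None  # No known marker found in the lines inspected.
```

A file named `spectra.txt` that contains MGF records would still be recognized as MGF, which is the kind of fragility that extension-based detection cannot avoid.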
depthcharge v0.3.1
[v0.3.1] - 2023-08-18
Added
- Support for fine-tuning the wavelengths used for encoding floating point numbers like m/z and intensity to the `FloatEncoder` and `PeakEncoder`.
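To show what the wavelengths control, here is a generic sinusoidal encoding of a float in plain Python. It is a sketch under stated assumptions and not the `FloatEncoder` itself: the function name, the parameter names `min_wavelength` and `max_wavelength`, their default values, and the geometric spacing are all illustrative choices.

```python
import math


def encode_float(x, d_model=8, min_wavelength=0.001, max_wavelength=10000.0):
    """Encode a float as sines and cosines at geometrically spaced wavelengths."""
    n_sin = d_model // 2
    n_cos = d_model - n_sin

    def wavelengths(n):
        # Spread n wavelengths geometrically from min_wavelength to max_wavelength.
        if n == 1:
            return [min_wavelength]
        ratio = max_wavelength / min_wavelength
        return [min_wavelength * ratio ** (i / (n - 1)) for i in range(n)]

    sines = [math.sin(2 * math.pi * x / w) for w in wavelengths(n_sin)]
    cosines = [math.cos(2 * math.pi * x / w) for w in wavelengths(n_cos)]
    return sines + cosines


encoding = encode_float(500.25)
```

Short wavelengths resolve small differences between values, while long wavelengths distinguish values that are far apart, so the useful range depends on whether the quantity is an m/z or an intensity.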
Fixed
- The `tgt_mask` in the `PeptideTransformerDecoder` was the incorrect type. Now it is `bool` as it should be.
Thanks @justin-a-sanders!
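For readers unfamiliar with boolean target masks, the sketch below builds a causal mask with nested Python lists. It follows PyTorch's convention that `True` in a boolean attention mask marks a position that may not be attended to; the release note does not describe the mask layout, so that convention and the helper name `causal_mask` are assumptions here. In practice the mask would be a `torch.bool` tensor.

```python
def causal_mask(size):
    """Boolean mask where True marks future positions that must not be attended to."""
    return [[col > row for col in range(size)] for row in range(size)]


mask = causal_mask(3)
# Row i is the query position: it may attend to columns 0..i only.
```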