Contributor

@zhuoxinshi zhuoxinshi commented Jul 29, 2025

This is the first part of incorporating DIA analysis into MetaMorpheus. The DIA workflow implemented here reconstructs DIA MS2 scans into pseudo Ms2WithSpecificMass objects that can be searched directly by our DDA search engines. It consists of the following steps:

  1. Reading in raw scans and dividing MS2 scans with different isolation windows into different DIA window groups.
  2. Constructing XICs for all peaks in the MS1 scans and MS2 scans in the same window.
  3. Grouping the precursor XICs with fragment XICs in the same window into precursor-fragment groups based on their correlations.
  4. Converting the precursor-fragment groups into pseudo Ms2WithSpecificMass objects.
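
As a rough illustration of step 1, the sketch below buckets MS2 scans by their isolation window bounds. It does not use the mzLib or MetaMorpheus scan types; the tuple fields (ScanNumber, MsOrder, IsoLow, IsoHigh) and the GroupByWindow helper are hypothetical names chosen for this example, and rounding the bounds to two decimals is an assumption about how nearly identical windows might be merged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal scan records: scan number, MS order, isolation window bounds (m/z).
var scans = new List<(int ScanNumber, int MsOrder, double IsoLow, double IsoHigh)>
{
    (1, 1, 0, 0),
    (2, 2, 400, 425),
    (3, 2, 425, 450),
    (4, 1, 0, 0),
    (5, 2, 400, 425),
    (6, 2, 425, 450),
};

// Keep only MS2 scans and bucket them by (rounded) isolation window bounds.
static Dictionary<(double Low, double High), List<int>> GroupByWindow(
    IEnumerable<(int ScanNumber, int MsOrder, double IsoLow, double IsoHigh)> input)
{
    return input
        .Where(s => s.MsOrder == 2)
        .GroupBy(s => (Low: Math.Round(s.IsoLow, 2), High: Math.Round(s.IsoHigh, 2)))
        .ToDictionary(g => g.Key, g => g.Select(s => s.ScanNumber).ToList());
}

var windows = GroupByWindow(scans);
foreach (var (window, scanNumbers) in windows)
    Console.WriteLine($"{window.Low}-{window.High}: scans {string.Join(", ", scanNumbers)}");
```

Each bucket then plays the role of one DIA window group, within which XICs would be built and correlated.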

The DIAEngine class contains this complete workflow and outputs the final list of pseudo Ms2WithSpecificMass objects. Inside DIAEngine, XIC construction and precursor-fragment grouping are represented by the abstract classes XicConstructor and PfGroupingEngine, so that different strategies can be plugged in.
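
To make the pluggable-strategy idea concrete, here is a minimal sketch of the pattern. The class names (XicBuilderSketch, ThresholdXicBuilder, SketchEngine) and their members are hypothetical and are not the signatures used in this PR; the "XIC" here is simply a contiguous run of intensities at or above a threshold.

```csharp
using System;
using System.Collections.Generic;

// The engine only sees the abstract type, so the strategy can be swapped freely.
var intensities = new double[] { 1, 5, 9, 4, 0, 0, 3, 7, 2 };
var engine = new SketchEngine(new ThresholdXicBuilder(minIntensity: 2.0));
Console.WriteLine($"Traces found: {engine.Run(intensities)}");

// Hypothetical stand-in for an abstract XIC-construction strategy.
abstract class XicBuilderSketch
{
    // Splits an intensity trace into contiguous runs.
    public abstract List<List<double>> Build(IReadOnlyList<double> trace);
}

// One concrete strategy: a run continues while intensity stays at or above a minimum.
sealed class ThresholdXicBuilder : XicBuilderSketch
{
    private readonly double _min;
    public ThresholdXicBuilder(double minIntensity) => _min = minIntensity;

    public override List<List<double>> Build(IReadOnlyList<double> trace)
    {
        var result = new List<List<double>>();
        var current = new List<double>();
        foreach (var x in trace)
        {
            if (x >= _min)
            {
                current.Add(x);
            }
            else if (current.Count > 0)
            {
                result.Add(current);          // close the finished run
                current = new List<double>();
            }
        }
        if (current.Count > 0) result.Add(current);
        return result;
    }
}

// Hypothetical engine that delegates to whichever strategy it was given.
sealed class SketchEngine
{
    private readonly XicBuilderSketch _builder;
    public SketchEngine(XicBuilderSketch builder) => _builder = builder;
    public int Run(IReadOnlyList<double> trace) => _builder.Build(trace).Count;
}
```

A second strategy would only need to derive from the abstract class; the engine code stays unchanged.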

codecov bot commented Jul 29, 2025

Codecov Report

❌ Patch coverage is 99.05363% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.86%. Comparing base (2e579dc) to head (a67c466).
⚠️ Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...Morpheus/EngineLayer/DIA/PrecursorFragmentGroup.cs | 97.32% | 3 Missing ⚠️ |

@@            Coverage Diff             @@
##           master    #2546      +/-   ##
==========================================
+ Coverage   96.75%   96.86%   +0.11%     
==========================================
  Files         157      167      +10     
  Lines       23410    23886     +476     
  Branches     3252     3315      +63     
==========================================
+ Hits        22650    23138     +488     
+ Misses        755      743      -12     
  Partials        5        5              
| Flag | Coverage Δ |
|---|---|
| unittests | 94.95% <97.81%> (+0.11%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| MetaMorpheus/EngineLayer/DIA/DIAEngine.cs | 100.00% <100.00%> (ø) |
| MetaMorpheus/EngineLayer/DIA/DIAparameters.cs | 100.00% <100.00%> (ø) |
| .../DIA/PrecursorFragmentGrouping/PfGroupingEngine.cs | 100.00% <100.00%> (ø) |
| ...DIA/PrecursorFragmentGrouping/XicGroupingEngine.cs | 100.00% <100.00%> (ø) |
| ...aMorpheus/EngineLayer/DIA/PrecursorFragmentPair.cs | 100.00% <100.00%> (ø) |
| ...eLayer/DIA/XicConstruction/MzPeakXicConstructor.cs | 100.00% <100.00%> (ø) |
| ...r/DIA/XicConstruction/NeutralMassXicConstructor.cs | 100.00% <100.00%> (ø) |
| .../EngineLayer/DIA/XicConstruction/XicConstructor.cs | 100.00% <100.00%> (ø) |
| ...Morpheus/EngineLayer/DIA/PrecursorFragmentGroup.cs | 97.32% <97.32%> (ø) |

... and 14 files with indirect coverage changes

@trishorts trishorts requested a review from Copilot September 3, 2025 19:59
@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces DIA (Data-Independent Acquisition) analysis capabilities to MetaMorpheus by implementing a workflow that reconstructs DIA MS2 scans into pseudo Ms2WithSpecificMass objects that can be searched by existing DDA search engines.

Key changes:

  • Implements a complete DIA workflow including MS2 scan grouping by isolation windows, XIC construction, precursor-fragment grouping, and pseudo MS2 scan generation
  • Introduces abstract base classes (XicConstructor, PfGroupingEngine) with concrete implementations for different XIC construction strategies
  • Adds comprehensive test coverage for all major DIA workflow components

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| Test.csproj | Adds test data file for DIA testing |
| TestXicGrouping.cs | Comprehensive tests for XIC correlation, overlap calculations, and grouping logic |
| DIAEngineTest.cs | End-to-end tests for the complete DIA workflow including scan window mapping and pseudo scan conversion |
| XicConstructor.cs | Abstract base class defining XIC construction interface |
| NeutralMassXicConstructor.cs | Concrete implementation for neutral mass-based XIC construction |
| MzPeakXicConstructor.cs | Concrete implementation for m/z peak-based XIC construction |
| PrecursorFragmentPair.cs | Data structure representing precursor-fragment XIC pairs |
| XicGroupingEngine.cs | Implementation of precursor-fragment grouping based on correlation and overlap thresholds |
| PfGroupingEngine.cs | Abstract base class for precursor-fragment grouping strategies |
| PrecursorFragmentGroup.cs | Core logic for XIC correlation/overlap calculations and pseudo MS2 scan generation |
| PseudoMs2ConstructionType.cs | Enum defining different pseudo MS2 construction strategies |
| DIAparameters.cs | Configuration class for DIA workflow parameters |
| DIAEngine.cs | Main engine orchestrating the complete DIA workflow |
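
For readers unfamiliar with the correlation and overlap checks mentioned for XicGroupingEngine.cs and PrecursorFragmentGroup.cs, the following is a minimal sketch of what such calculations could look like. It assumes a Pearson correlation over two traces sampled at the same scans and an overlap defined as shared retention-time span divided by the combined span; the PR may define either quantity differently, and the helper names Pearson and OverlapRatio are hypothetical.

```csharp
using System;
using System.Linq;

// Pearson correlation between two intensity traces sampled at the same scan indices.
static double Pearson(double[] x, double[] y)
{
    if (x.Length != y.Length || x.Length < 2)
        throw new ArgumentException("Traces must have the same length (at least 2 points).");

    double meanX = x.Average();
    double meanY = y.Average();
    double cov = 0, varX = 0, varY = 0;
    for (int i = 0; i < x.Length; i++)
    {
        double dx = x[i] - meanX;
        double dy = y[i] - meanY;
        cov += dx * dy;
        varX += dx * dx;
        varY += dy * dy;
    }
    // A perfectly flat trace has no defined correlation.
    if (varX == 0 || varY == 0) return double.NaN;
    return cov / Math.Sqrt(varX * varY);
}

// Shared retention-time span divided by the combined span (0 when the ranges do not overlap).
static double OverlapRatio(double start1, double end1, double start2, double end2)
{
    double overlap = Math.Min(end1, end2) - Math.Max(start1, start2);
    double combined = Math.Max(end1, end2) - Math.Min(start1, start2);
    return overlap <= 0 || combined <= 0 ? 0 : overlap / combined;
}

var precursor = new double[] { 1, 4, 9, 4, 1 };
var fragment = new double[] { 2, 8, 18, 8, 2 };   // same elution shape, doubled intensity

Console.WriteLine($"Correlation: {Pearson(precursor, fragment):F3}");
Console.WriteLine($"Overlap:     {OverlapRatio(10.0, 12.0, 11.0, 13.0):F3}");
```

A grouping step could then keep a precursor-fragment pair only when both values exceed configured thresholds.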

pfGroup.PFpairs.Sort((a, b) => a.FragmentXic.AveragedM.CompareTo(b.FragmentXic.AveragedM));
// This is currently taking the mz value of the first peak as the representative mz for the fragment XIC; it is only for testing purposes
// It will be changed to the mz value of the highest peak or the averaged mz value of the XIC when there is a release for the updated mzLib.
var mzs = pfGroup.PFpairs.Select(pf => (double)pf.FragmentXic.Peaks.First().M).ToArray();
Member

This should be updated after the new mzLib is merged and before we merge this PR.
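
To illustrate the three candidate representative m/z values discussed in the code comment (first peak, highest peak, averaged), here is a small sketch on plain tuples. It does not use the mzLib XIC or peak types; the (Mz, Intensity) tuple is a hypothetical stand-in, and the intensity-weighted mean is only one possible reading of "averaged".

```csharp
using System;
using System.Linq;

// Hypothetical peaks of one fragment XIC, in scan order.
var peaks = new (double Mz, double Intensity)[]
{
    (500.2510, 1.0e4),
    (500.2500, 8.0e4),
    (500.2490, 1.0e4),
};

// Option 1: m/z of the first peak (what the quoted code currently does).
double firstMz = peaks.First().Mz;

// Option 2: m/z of the most intense peak.
double apexMz = peaks.OrderByDescending(p => p.Intensity).First().Mz;

// Option 3: intensity-weighted mean m/z (assumed meaning of "averaged").
double weightedMz = peaks.Sum(p => p.Mz * p.Intensity) / peaks.Sum(p => p.Intensity);

Console.WriteLine($"first: {firstMz:F4}, apex: {apexMz:F4}, weighted: {weightedMz:F4}");
```

The first-peak value is the most sensitive to noise at the edge of the elution profile, which is presumably why it is flagged as temporary.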

Member

There are many places in which this class can be optimized. The main slowdowns here will be collection handling: many new lists are created throughout this process and immediately discarded. We can optimize this by using object pools and returning IEnumerable instead of materialized lists.

I am happy to work with you on this if you would like, either before or after this PR is merged.
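
As a generic sketch of the two techniques suggested here (not code from this PR), the example below shows a lazy iterator that avoids allocating an intermediate list and a scratch buffer rented from ArrayPool<T>.Shared instead of a freshly allocated array. The helper names AboveThreshold and SumOfSquares are hypothetical.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;

// Lazy filter: yields matching values one at a time, so no intermediate list is allocated.
static IEnumerable<double> AboveThreshold(IEnumerable<double> values, double threshold)
{
    foreach (var v in values)
        if (v >= threshold)
            yield return v;
}

// Uses a rented scratch buffer for temporary data and returns it to the pool afterwards.
static double SumOfSquares(ReadOnlySpan<double> values)
{
    // Rent may hand back a larger array than requested; only the first values.Length slots are used.
    double[] buffer = ArrayPool<double>.Shared.Rent(values.Length);
    try
    {
        for (int i = 0; i < values.Length; i++)
            buffer[i] = values[i] * values[i];

        double sum = 0;
        for (int i = 0; i < values.Length; i++)
            sum += buffer[i];
        return sum;
    }
    finally
    {
        ArrayPool<double>.Shared.Return(buffer);
    }
}

var intensities = new double[] { 0.5, 2.0, 3.5 };
Console.WriteLine(string.Join(", ", AboveThreshold(intensities, 1.0)));
Console.WriteLine(SumOfSquares(new double[] { 1, 2, 3 }));
```

Whether either change pays off in this class would need profiling; lazy sequences that are enumerated more than once can end up slower than a single materialized list.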
