Skip to content

Conversation

isaf27
Copy link

@isaf27 isaf27 commented Jun 2, 2025

TL;DR

This PR implements Fast-SNP algorithm for adding loopless constraints. It consists of two key optimizations:

  • Consider for loopless constraints only the reactions that can be part of a cycle (loop).
  • Build sparse $N_{\text{int}}$ matrix using Fast-SNP algorithm (https://doi.org/10.1093/bioinformatics/btw555), which takes reaction flux sign (positive/negative) into account.

As a result a 10-100x (or more) speed-up for loopless FVA is achieved.

This PR is somewhat similar to #841, but differs in implementation.

Code modifications

One of the key optimizations is that add_loopless and flux_variability_analysis functions use only reactions that can be a part of a cycle (we call them cyclic). This greatly improves performance due to the number of cyclic reactions is usually much smaller than the number of internal reactions.

Function find_cyclic_reactions

We add a new function find_cyclic_reactions which finds all reactions in the model that can be a part of a cycle.
Also for each such reaction we identify if it can have positive and negative flux in a cycle. This is tested by restricting boundary reaction fluxes to zero and optimizing flux through each of the internal reactions.

There are two methods (method parameter) that can be called:

  • basic: a straightforward procedure that iterates over all of the reactions;
  • optimized: an optimized version inspired by Fast-SNP random weights idea, which discovers many cyclic reactions in one go.

The second method uses less LP programs and usually works at least 2x faster. It is a custom procedure, so we don't have a citation for the algorithm.

Function add_loopless

We introduced new method parameter. If method="fastSNP" we:

  1. Find all cyclic reactions using find_cyclic_reactions.
  2. Calculate $N_{\text{int}}$ for the cyclic reactions using Fast-SNP nullspace basis construction algorithm, taking model flux bounds-based signs into account. As the result, the constraints are constructed using much smaller and sparse $N_{\text{int}}$ matrix which greatly reduces number of binary variables in the corresponding MILP prolem and improves the speed of FVA.

If method="original" then the current SVD-based algorithm is ran on all of the internal reactions.

Function flux_variability_analysis

We made some optimizations for this function if loopless=True.

  1. We always find all cyclic reactions and their possible directions using find_cyclic_reactions function. Reactions which are not cyclic are optimized using the basic FVA mode—their extreme values are guaranteed to be achievable without using cycle fluxes. This greatly improves the performance of the function even when using the CycleFreeFlux method.
  2. We added a new add_loopless regime to this function, which adds loopless constraints (re-using the calculated cyclic reactions and add_loopless function) to solve loopless FVA for the cyclic reactions. In this regime the exact FVA bounds are calculated (unlike for the CycleFreeFlux method).

Performance

FVA quick example

For the quick demonstration of the performance improvement we assembled the following script, which solves FVA for a single reaction with the original and new Fast-SNP loopless constraints.

import time

from cobra.io import load_model
from cobra.flux_analysis.loopless import add_loopless


for model_name, reaction in [
    ('iIT341', 'ACKr'),
    ('iAF692', 'MDHy'),
    ('iNJ661', 'NADTRHD_copy1'),
]:
    print(f'Working with model {model_name}...')

    for method in ['original', 'fastSNP']:
        model = load_model(model_name)

        print(f'Finding loopless FVA bounds for reaction {reaction} using {method} method...')

        start_time = time.time()
        add_loopless(model, method=method)
        print(f'Loopless constraints added in {time.time() - start_time:.2f} seconds.')
        
        bounds = {}
        start_time = time.time()
        for direction in ['min', 'max']:
            rxn = model.reactions.get_by_id(reaction)
            model.solver.objective.set_linear_coefficients(
                {rxn.forward_variable: 1, rxn.reverse_variable: -1}
            )
            model.solver.objective.direction = direction
            
            bounds[direction] = model.slim_optimize()
        
        print(f'FVA bounds calculated in {time.time() - start_time:.2f} seconds.')
        print(f'Loopless FVA bounds for {reaction} ({method}): {bounds}')

Here are the example run times for the tested model/reaction pairs (CPLEX solver was used for MILP):

  • Model iIT341, reaction ACKr: $0.86 \to 0.09$, $9.6\times$ speed-up.
  • Model iAF692, reaction MDHy: $1.35 \to 0.11$, $12.3\times$ speed-up.
  • Model iNJ661, reaction NADTRHD_copy1: $120.61 \to 0.93$, $129\times$ speed-up.

The fastest complete loopless FVA

The most efficient way to calculate loopless FVA for all of the reactions is to directly call flux_variability_analysis function, without using explicit add_loopless:

model = load_model(model_name)
flux_variability_analysis(model, loopless=True, params_add_loopless={})

Discussion points

There are several points regarding the API we wanted to highlight for discussion:

  1. As was mentioned flux_variability_analysis(model, loopless=True, params_add_loopless={}) is the most efficient way to calculate loopless FVA. The function flux_variability_analysis identifies which method (add_loopless-based or CycleFreeFlux) will be used based on the params_add_loopless presence. This form enables backward-compatibility, but is a bit clunky. A better way could be to make loopless parameter deprecated and replace it with a new string-based method parameter to control the behavior, with the corresponding backward compatibility-handling procedure that sets method="CycleFreeFlux" if loopless=True is present.
  2. In add_loopless function the original $N_{\text{int}}$ construction algorithm can also benefit from being run only for the cyclic reactions. We could create another option use_cyclic=True to enable that, however we decided not to implement it. Our thinking is that method=original is provided for the reproducibility of the previous behavior, but ultimately shouldn't be used: the Fast-SNP algorithm should always provide a better performance.
  3. Conversely, flux_variability_analysis(loopless=True) can still be ran when the full loopless FVA is not feasible. Thus we decided to implement running CycleFreeFlux only for the cyclic reactions, which significantly improves the performance. This changes the behavior, but should produce the same results as before.

We would be happy to hear your feedback and are ready to implement any additional changes to API/docs/tests/etc that are needed for this PR to be merged.

@cdiener
Copy link
Member

cdiener commented Jun 3, 2025

Thanks so much. This looks great! Sorry for the issues with the CI. I will try to fix them this week, but you might have to rebase on the devel branch after that.

In general I think it will be a great contribution. I will try to review as soon as possible.

@isaf27
Copy link
Author

isaf27 commented Jul 3, 2025

@cdiener Hello. I've merged devel into this PR (I've seen you did some python versions adjustment). Please approve the CI workflow to check. Is it true, that now CI should work successfully?

@cdiener
Copy link
Member

cdiener commented Jul 21, 2025

Sorry for the delay, currently on paternity leave. Looks like a great addition. Hopefully I can get to it in the next weeks.

I am actually not that opposed to changing the loopless argument to a string. It seems the most explicit and it would be an easy fix in existing code.

@isaf27
Copy link
Author

isaf27 commented Sep 8, 2025

@cdiener Hello. I've added a small fix to fastSNP algorithm implementation to make it stable. Now tests should pass, please approve the CI workflow to check.

Copy link
Member

@cdiener cdiener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi thanks, realy great addition.

Are the FastSNP variables and constraints incompatible with the original LP? If not it would probably be much more efficient to just modify the existing LP instance with a context which will also make sure all changes are reverted after.

directions_int: np.ndarray,
zero_cutoff: float,
bound: float,
) -> Tuple["optlang.interface.Model", List["optlang.interface.Variable"]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just add a short docstring what this does.

model = solver.Model()

q_list = []
for i in range(s_int.shape[1]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be faster to add those to the original model and use a context? For instance the code below re-adds all the constraints from S that are already in the original model

model.add(q_list)

for idx, row in enumerate(s_int):
nnz_list = np.flatnonzero(np.abs(row) > zero_cutoff)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will ignore additional constraints from model.constraints that the user may have defined.

loopless : bool, optional
Whether to return only loopless solutions. This is significantly
slower. Please also refer to the notes (default False).
add_loopless_params : dict, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used to set the loopless method maybe this should rather be a loopless_method argument. Or even more concide make the loopless arg Tuple[bool, str] and accept a True value or a method string.

modulo_list = []
v_list = []

for i in range(n):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might miss user-defined custom variables

{x: 1.0, v: 1.0}
)

for idx, row in enumerate(S):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might miss user-defined constraints

q_list.append(q)
model.add(q_list)

for idx, row in enumerate(s_int):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be more efficient to iterate over model.constrains directly and clone or modify

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Nullspace calculation similar to COBRA (MATLAB)
2 participants