-
Notifications
You must be signed in to change notification settings - Fork 229
feat: implement Fast-SNP, speed up loopless FVA #1444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
Thanks so much. This looks great! Sorry for the issues with the CI. I will try to fix them this week, but you might have to rebase on the devel branch after that. In general I think it will be a great contribution. I will try to review as soon as possible. |
@cdiener Hello. I've merged devel into this PR (I've seen you did some python versions adjustment). Please approve the CI workflow to check. Is it true, that now CI should work successfully? |
Sorry for the delay, currently on paternity leave. Looks like a great addition. Hopefully I can get to it in the next weeks. I am actually not that opposed to changing the loopless argument to a string. It seems the most explicit and it would be an easy fix in existing code. |
@cdiener Hello. I've added a small fix to fastSNP algorithm implementation to make it stable. Now tests should pass, please approve the CI workflow to check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi thanks, realy great addition.
Are the FastSNP variables and constraints incompatible with the original LP? If not it would probably be much more efficient to just modify the existing LP instance with a context which will also make sure all changes are reverted after.
directions_int: np.ndarray, | ||
zero_cutoff: float, | ||
bound: float, | ||
) -> Tuple["optlang.interface.Model", List["optlang.interface.Variable"]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just add a short docstring what this does.
model = solver.Model() | ||
|
||
q_list = [] | ||
for i in range(s_int.shape[1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be faster to add those to the original model and use a context? For instance the code below re-adds all the constraints from S that are already in the original model
model.add(q_list) | ||
|
||
for idx, row in enumerate(s_int): | ||
nnz_list = np.flatnonzero(np.abs(row) > zero_cutoff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will ignore additional constraints from model.constraints that the user may have defined.
loopless : bool, optional | ||
Whether to return only loopless solutions. This is significantly | ||
slower. Please also refer to the notes (default False). | ||
add_loopless_params : dict, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is only used to set the loopless method maybe this should rather be a loopless_method argument. Or even more concide make the loopless arg Tuple[bool, str] and accept a True value or a method string.
modulo_list = [] | ||
v_list = [] | ||
|
||
for i in range(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might miss user-defined custom variables
{x: 1.0, v: 1.0} | ||
) | ||
|
||
for idx, row in enumerate(S): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might miss user-defined constraints
q_list.append(q) | ||
model.add(q_list) | ||
|
||
for idx, row in enumerate(s_int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be more efficient to iterate over model.constrains directly and clone or modify
TL;DR
This PR implements Fast-SNP algorithm for adding loopless constraints. It consists of two key optimizations:
As a result a 10-100x (or more) speed-up for loopless FVA is achieved.
This PR is somewhat similar to #841, but differs in implementation.
Code modifications
One of the key optimizations is that
add_loopless
andflux_variability_analysis
functions use only reactions that can be a part of a cycle (we call them cyclic). This greatly improves performance due to the number of cyclic reactions is usually much smaller than the number of internal reactions.Function
find_cyclic_reactions
We add a new function
find_cyclic_reactions
which finds all reactions in the model that can be a part of a cycle.Also for each such reaction we identify if it can have positive and negative flux in a cycle. This is tested by restricting boundary reaction fluxes to zero and optimizing flux through each of the internal reactions.
There are two methods (
method
parameter) that can be called:basic
: a straightforward procedure that iterates over all of the reactions;optimized
: an optimized version inspired by Fast-SNP random weights idea, which discovers many cyclic reactions in one go.The second method uses less LP programs and usually works at least 2x faster. It is a custom procedure, so we don't have a citation for the algorithm.
Function
add_loopless
We introduced new
method
parameter. Ifmethod="fastSNP"
we:find_cyclic_reactions
.If
method="original"
then the current SVD-based algorithm is ran on all of the internal reactions.Function
flux_variability_analysis
We made some optimizations for this function if
loopless=True
.find_cyclic_reactions
function. Reactions which are not cyclic are optimized using the basic FVA mode—their extreme values are guaranteed to be achievable without using cycle fluxes. This greatly improves the performance of the function even when using the CycleFreeFlux method.add_loopless
regime to this function, which adds loopless constraints (re-using the calculated cyclic reactions andadd_loopless
function) to solve loopless FVA for the cyclic reactions. In this regime the exact FVA bounds are calculated (unlike for the CycleFreeFlux method).Performance
FVA quick example
For the quick demonstration of the performance improvement we assembled the following script, which solves FVA for a single reaction with the original and new Fast-SNP loopless constraints.
Here are the example run times for the tested model/reaction pairs (CPLEX solver was used for MILP):
iIT341
, reactionACKr
:iAF692
, reactionMDHy
:iNJ661
, reactionNADTRHD_copy1
:The fastest complete loopless FVA
The most efficient way to calculate loopless FVA for all of the reactions is to directly call
flux_variability_analysis
function, without using explicitadd_loopless
:Discussion points
There are several points regarding the API we wanted to highlight for discussion:
flux_variability_analysis(model, loopless=True, params_add_loopless={})
is the most efficient way to calculate loopless FVA. The functionflux_variability_analysis
identifies which method (add_loopless
-based or CycleFreeFlux) will be used based on theparams_add_loopless
presence. This form enables backward-compatibility, but is a bit clunky. A better way could be to makeloopless
parameter deprecated and replace it with a new string-basedmethod
parameter to control the behavior, with the corresponding backward compatibility-handling procedure that setsmethod="CycleFreeFlux"
ifloopless=True
is present.add_loopless
function the originaluse_cyclic=True
to enable that, however we decided not to implement it. Our thinking is thatmethod=original
is provided for the reproducibility of the previous behavior, but ultimately shouldn't be used: the Fast-SNP algorithm should always provide a better performance.flux_variability_analysis(loopless=True)
can still be ran when the full loopless FVA is not feasible. Thus we decided to implement running CycleFreeFlux only for the cyclic reactions, which significantly improves the performance. This changes the behavior, but should produce the same results as before.We would be happy to hear your feedback and are ready to implement any additional changes to API/docs/tests/etc that are needed for this PR to be merged.