# Audit workflow arguments #162
Just glancing at the code, it looks like `--filecsdata` and `--strict` were post-meica-v2.5 additions, so whatever they did may have been removed when the code was revised back towards v2.5. I don't have time right now to dig into what information was saved with `--filecsdata`, but if we're rethinking how to record component selection information, I concur that it's better to focus on doing this right than on keeping a specific option label. I haven't used `--sourceTEs`, but depending on how it's implemented, I could see it being a useful option. For something like BIDS-compliant data, where you don't need to give it every echo's file name, it would be good to have an option that lets you run the algorithm on just a subset of the acquired echoes. I don't know how this option is implemented in practice, but I think it's a reasonable bit of functionality to keep. Given that `denoiseTEs` and `stabilize` both have opaque functionality based on their names, and seem to be partially subsumed by other options, I don't have any particular attachment to them.
@handwerkerd Thanks for your input. I've opened a new PR with relevant changes (#163), but will keep this issue open for now. Also, given that we want to make MLE the default component selection method for TEDPCA, we should revisit `kdaw` and `rdaw`.
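(For context, here is a minimal sketch of MLE-based dimensionality selection using scikit-learn, assuming tedana's MLE TEDPCA option wraps something like scikit-learn's Minka MLE estimator. The toy data, shapes, and noise level are illustrative, not tedana's actual inputs.)

```python
# Sketch of MLE-based dimensionality selection with scikit-learn.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
# Toy data: 500 samples (e.g., timepoints) x 50 features,
# built from ~5 underlying sources plus a little noise.
sources = rng.standard_normal((500, 5))
mixing = rng.standard_normal((5, 50))
data = sources @ mixing + 0.1 * rng.standard_normal((500, 50))

# n_components='mle' uses Minka's MLE to pick the dimensionality,
# removing the need for manual thresholds like kdaw/rdaw.
pca = PCA(n_components='mle', svd_solver='full')
reduced = pca.fit_transform(data)
print(f"MLE-selected number of components: {pca.n_components_}")
```

On toy data like this, the MLE criterion should recover roughly the true number of sources without any user-tuned threshold, which is the appeal over hand-set weights.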
It actually seems like [...]. It would be nice to know what running [...] would do. On another note, in terms of revisiting `kdaw` and `rdaw`: if removing the arguments is no good, what about combining them into a single, comma-separated argument? At a minimum, we can place those arguments in their own section of the CLI argument documentation, to make it clear that they're only used if the decision tree is used.
In my experience, `kdaw` has a particularly dramatic impact on convergence. I primarily experimented with `kdaw`, given that it was easily modifiable when calling the function. In cases where the default `kdaw` resulted in far too many components (my data consist of >450 timepoints), I could reduce the value to reduce the number of components. Sometimes the problem would be the inverse. I did not like this fiddliness. The big reason is that I don't have a good understanding of what the parameters do, exactly, and I also don't like subject-specific, arbitrary choices in data processing. Is it correct that they determine the cutoff for thresholding on the PCA maps, i.e., for determining the number of PCA components to keep for ICA? I base that on commit comments from BitBucket here. I am very much looking forward to not using them. That said, I think a single comma-separated parameter would be reasonable, if one is going to all of the trouble of using TEDPCA instead of the new defaults.
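(As a rough sketch of the comma-separated idea above: a hypothetical `--daw` flag packing both weights into one argument. The flag name, parser, and defaults are assumptions for illustration, not tedana's actual CLI.)

```python
# Hypothetical sketch of a single comma-separated --daw argument.
import argparse

def parse_daw(value):
    """Parse 'KDAW,RDAW' into a (kdaw, rdaw) tuple of floats."""
    try:
        kdaw, rdaw = (float(v) for v in value.split(','))
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected two comma-separated numbers, got {value!r}")
    return kdaw, rdaw

parser = argparse.ArgumentParser()
parser.add_argument('--daw', type=parse_daw, default=(10.0, 1.0),
                    help='Comma-separated kdaw,rdaw dimensionality-'
                         'augmentation weights (decision tree only).')

args = parser.parse_args(['--daw', '10,1'])
print(args.daw)  # (10.0, 1.0)
```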
Sorry for being so long-delayed in joining this thread, and thanks for starting it, @tsalo!! A few thoughts: [...]
@dowdlelt Yes, I think that they operate like that. I believe that increasing [...] (see `tedana/decomposition/eigendecomp.py`, lines 251 to 252, at commit 6e21c5d).
First, the three threshold sources ([...]). Of course, -1 is a special value for both `kdaw` and `rdaw`. Perhaps just collecting [...]?
I think that everything discussed here so far has been handled by #163 and #164. The only other thing I was thinking we could work on is simplifying how manually accepted components are handled. For starters, I propose that we could move [...]. A more aggressive change (that I'm somewhat in favor of) would be to drop [...].
Hi all, I'm trying to track some of the changes made between the BitBucket 3.2.2 version and this one. Is the current version robust to `kdaw` and `rdaw` values? If not, see ME-ICA/me-ica#10; at some point the default `kdaw` and `rdaw` were changed from there to here. (This seemed like the best place to put this comment; I will move it if wrong.)
Thanks for confirming! We've removed the `kdaw` and `rdaw` arguments.
The default values of 10 and 1 for `kdaw` and `rdaw`, respectively, are now hardcoded. Our rationale for hardcoding these values is, as @emdupre said, that they're difficult to understand for users and that we've changed the default TEDPCA method to MLE.
Ah, I see; the values I have are from a different branch than master, v3.2, seen here. Has the project decided to retain only behavior from v3? I realize that the project has decided not to retain backwards compatibility with ME-ICA, but I thought it would be good to make a record of why this divergence exists.
We don't support v3, although we may merge that back in in the future (as one option, not as the only method). We have shifted to only supporting v2.5 of the component selection algorithm. We have that information, and why we shifted, here in the FAQ section of the documentation site, although it's a fairly new addition.
Okay, I see; sorry about all that.
Thanks for checking on it! It's good to have more people thinking through these choices, too :)
I think that we can close this issue now. We may ultimately want to discuss removing or reorganizing [...].
In addition to the arguments we're considering modifying or adding in #161, I think we should audit several of the other arguments in the workflow.
### FastICA parameters
Now that we're using `sklearn` for the ICA, we get a warning in cases of non-convergence instead of an error. Overall, I think this is a good thing, but it also obscures potential problems. The two relevant arguments to `sklearn.decomposition.FastICA` are `max_iter` (maximum number of iterations) and `tol` (convergence tolerance). The default for `tol` is 0.0001, while our default is 0.000025, set by the `conv` argument. My questions here are: is `conv` something anyone ever plays with (i.e., do we actually need an argument), and is there a reason to use the much stricter default threshold we have over the default from `sklearn`? If not, then I propose that we drop the `conv` argument from `tedana` and let `sklearn` use its default in `tedica`.

On a related note, the default for the maximum number of iterations is 200, while `mdp`'s was 5000, IIRC. The problem is that `FastICA` is failing to converge for both of our test datasets (Cornell 3-echo rest and NIH 5-echo task). I think we should probably increase `max_iter` to at least 1000 when we call `FastICA`, which will probably increase the amount of time `tedica` takes, but will probably also still be faster than `mdp`.
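(To make this concrete, a minimal sketch of a `FastICA` call with `max_iter` raised to 1000 and the stricter `tol` of 2.5e-5, plus one way to surface sklearn's non-convergence warning instead of letting it pass silently. The data and dimensions are toy values, not tedana's actual call site.)

```python
# Sketch of the FastICA call under discussion.
import warnings

import numpy as np
from sklearn.decomposition import FastICA
from sklearn.exceptions import ConvergenceWarning

rng = np.random.RandomState(42)
data = rng.standard_normal((500, 30))  # e.g., timepoints x voxels (toy)

ica = FastICA(n_components=10, max_iter=1000, tol=2.5e-5,
              random_state=42)

# sklearn warns (rather than errors) on non-convergence, which is the
# behavior that can obscure problems; capture the warning explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always', ConvergenceWarning)
    components = ica.fit_transform(data)

if any(issubclass(w.category, ConvergenceWarning) for w in caught):
    print('FastICA did not converge; consider raising max_iter.')
```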
### Other workflow parameters
I think that `--filecsdata` and `--strict` are currently unused. Unless we plan to re-implement them, I think we can drop them.

Does anyone use `--sourceTEs`, `--denoiseTEs`, or `--stabilize`? I've never used `sourceTEs` or `denoiseTEs`, but neither seems very useful, unless I'm misunderstanding them.

- `sourceTEs`: While I maybe understand switching between using `catd` vs. `optcom` for fitting models (though I've never tried the former), if we wanted to fit to just some of the echoes, why not just feed those echoes into the workflow?
- `denoiseTEs`: This just determines whether additional files are written out. Unless those files are regularly used, I propose that we incorporate this switch into a general `verbose` argument, as we've been discussing in Reduce intermediate outputs (#17).
- `stabilize` won't have an impact on the default pipeline, and I don't know if it has any meaningful impact on the results in most cases when the decision tree is used. I propose that we simply treat it as a variant of the decision tree. So, when we do split the decision tree from MLEPCA, we will have some argument like `--tedpca` that can take values of `mle`, `kundu`, and `kundu-stabilize`, with `mle` being the default (see the sketch after this list). Does that sound good?