Could SearchGUI write somewhere else than in SearchGUI-4.2.8/resources/(conf | temp) ? #349
Comments
You can change the temp directories via PathSettingsCLI: https://github.com/compomics/searchgui/wiki/SearchCLI#pathsettingscli If I remember correctly, you can also use the PathSettingsCLI parameters directly as part of a standard SearchCLI command line. You can also change the path settings using the graphical user interface via the Edit > Resource Settings menu option. |
If I set paths with PathSettingsCLI or SearchCLI, where will they get saved? I suppose they get saved in ./SearchGUI-4.2.8/resources/conf/paths.txt. Can I specify the directory where paths.txt is saved? If I can't, I don't see how I could run multiple instances of SearchGUI from the same jars, because they will all save to the same ./SearchGUI-4.2.8/resources/conf/paths.txt |
Correct.
No, I'm afraid this is not supported at the moment (as it is assumed that you have write access to the directory where SearchGUI is installed).
Indeed you are right. Note the following from the SearchCLI wiki: Not sure how easy it would be to change this. But I'm also not sure how well it would work, given that the included search engines generally assume that they are in control of the distribution of the workload (i.e. with regard to parallel processing etc.). Hence, I'm unsure what could be gained by running more than one instance of the jar file at the same time instead of running them one after the other? May I ask why you want to run multiple instances of SearchGUI from the same jar (at the same time)? |
I have a pipeline that runs thousands of instances of SearchGUI, and also PeptideShaker. In my code I have a directory COMPOMICS_HOME where I store installations of both programs. Every task needs to copy COMPOMICS_HOME into its bucket (__task_output_dir) before it can run: copy_compomics_home_cmd = "cp -R $COMPOMICS_HOME $__task_output_dir" Then I can run them: java $searchgui_jar_path ... It's not the end of the world, but there is this other use case where I have a containerized setup (https://apptainer.org/). Of course the container will need a temp directory anyway, so what's the big problem with throwing the whole COMPOMICS_HOME in there? Again, it's not the end of the world, just a bit of space wasted and a few extra steps required in the code. I just think it would be nicer if there was a separate "config directory" whose location you could choose, separate from both the working directory and (especially) the code directory, where nothing changes unless you change the version of the software. |
Implementation-wise, it would be quite easy: the path of resources/conf/paths.txt could be taken from an environment variable if it is set, and otherwise derived from the jar path, as is currently done. Even better would be to do the same for all the paths inside resources/conf/paths.txt, but that's probably a bit more work. |
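The fallback described above could look roughly like the sketch below. It is written in Python purely for illustration (SearchGUI itself is Java), and both the variable name `COMPOMICS_CONFIG_HOME` and the function `resolve_paths_file` are hypothetical; nothing here is part of SearchGUI.

```python
import os
from pathlib import Path

# Hypothetical variable name; SearchGUI does not read it today.
CONFIG_ENV_VAR = "COMPOMICS_CONFIG_HOME"


def resolve_paths_file(install_dir, environ=None):
    """Return the location paths.txt would be read from and written to.

    If the environment variable is set and non-empty, use that directory.
    Otherwise fall back to <install_dir>/resources/conf, which is the
    behaviour described in this thread.
    """
    env = os.environ if environ is None else environ
    override = env.get(CONFIG_ENV_VAR)
    if override:
        return Path(override) / "paths.txt"
    return Path(install_dir) / "resources" / "conf" / "paths.txt"
```

With the variable unset, nothing changes for existing users; each pipeline task would export a different value to get its own paths.txt.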
To further sell the case that a configurable "config dir" would be nice: after I run my 10k instances of SearchGUI+PeptideShaker, by deleting the per-task COMPOMICS_HOME copy I am perhaps losing useful debugging info. If I discover junk in the data in a month or two, I might want to investigate whether something was wrong in the configs. Perhaps there is nothing in there worth investigating, but given that config data is usually very small, the cost of keeping it "just in case" is tiny. BTW, the -use_log_folder 0 option in recent versions is really useful, for the same reason: I get to decide where the logs go, because I get to decide where I send stdout, and as a bonus, whenever I need to investigate something, there is only one place to look. My programs that run before and after SearchGUI also log to that same place, which makes debugging much simpler. I hope this doesn't come across as pedantic. We really like SearchGUI and PeptideShaker, and we really like the way they are evolving. I think the ability to decide where the config files go would greatly improve the user experience (or developer experience!). |
No worries. We're always happy to get input on how we can further improve our software.
If you use the temp_folder option as part of your SearchCLI command lines this should set all of the paths to the same folder, hence there should be no need to set the specific paths in addition? BTW, what happens if you simply provide different temp folder paths via the temp_folder option for each instance? As far as I can tell SearchGUI should then use the provided folder and not the one in the resources folder. I haven't actually tested this though. |
The -temp_folder option helps, and I use it, but there are still files that are written to subdirectories of: CompomicsWrapper.getJarFilePath(this.getClass().getResource("SearchGUI.class").getPath(), "SearchGUI") This is why I need to copy the whole $COMPOMICS_HOME (a directory where I have a SearchGUI and a PeptideShaker installation) into the working folder of each of the 15k jobs I run. The other config directories I am able to override are those that are derived from System.getProperty("user.home"), e.g.:
java -Duser.home=
That's very helpful.
What would really be nice is an environment variable, e.g. COMPOMICS_CONFIG_HOME. Any code that handles a config-related file would check whether this env variable exists before looking elsewhere. |
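For reference, the two overrides that already work according to this comment (-temp_folder and the user.home system property) can be combined per task roughly as follows. This is only a sketch: the function name and the temp/home layout under the task directory are assumptions, and the caller still has to append the usual SearchCLI input, output and search parameters. As noted above, it does not redirect the files written next to the jar.

```python
from pathlib import Path


def build_search_cli_prefix(jar_path, task_dir):
    """Build the start of a SearchCLI command with per-task writable folders.

    Only options mentioned in this thread are set here. The directory layout
    (temp/ and home/ under task_dir) is a hypothetical choice.
    """
    task_dir = Path(task_dir)
    temp_dir = task_dir / "temp"
    home_dir = task_dir / "home"
    return [
        "java",
        f"-Duser.home={home_dir}",          # redirects user.home-based config
        "-cp", str(jar_path),
        "eu.isas.searchgui.cmd.SearchCLI",
        "-temp_folder", str(temp_dir),      # per-task temp folder
        "-use_log_folder", "0",             # log to stdout instead of a folder
    ]
```

The returned list can be extended with the remaining parameters and passed to `subprocess.run`.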
Would it be a viable alternative to add a new specific path option called, for example, -config_folder that allows you to set the config folder via the SearchCLI or PathSettingsCLI command lines? I'm still not sure how the search engines will react though, as some of them still write to their local temp folders, which is something we cannot override for all of them. Hence we may fix the SearchGUI-specific issues (which in any case is a plus), but it may just lead to other issues down the line. |
I actually just ran into a problem with the "save compomics configs because they might help debugging later" idea. I ran jobs in a datacenter where I have a limit on the number of files (1 million), and I exceeded it. So the only place I can copy COMPOMICS_HOME to is $SLURM_TMPDIR, a directory that disappears once the job ends. So keeping configs around is not even an option if they are mixed in with the whole compomics installation. |
-config_folder would be great! I just thought that an env variable would be easier on your part, because you don't need to pass it from the CLI programs all the way down to every bit of code that uses it; you can just sprinkle a few "if var exists" checks in strategic places. A pattern that I like for CLI toolkits (with multiple CLI tools) is that when an argument is common to all tools, you have the choice of setting it either as an env var or as a command-line argument, and the argument overrides the env var if both are set. That being said, -config_folder will definitely help. |
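The "argument overrides env var" pattern mentioned above can be sketched in a few lines. This is a generic Python illustration, not SearchGUI code: the variable name `COMPOMICS_CONFIG_HOME` is only the one proposed in this thread, and `parse_config_folder` is a hypothetical helper.

```python
import argparse
import os

# Name proposed in this thread; hypothetical, not read by SearchGUI.
CONFIG_ENV_VAR = "COMPOMICS_CONFIG_HOME"


def parse_config_folder(argv, environ=None):
    """Return the config folder: CLI argument first, env var second, else None."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="example-cli")
    # The env var only supplies the default, so an explicit argument wins.
    parser.add_argument("-config_folder", default=env.get(CONFIG_ENV_VAR))
    args, _unknown = parser.parse_known_args(argv)
    return args.config_folder
```

Because the environment variable is consulted in a single place (the argument default), every tool in a toolkit can share the same helper.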
I was thinking the other way around. That it would be easier to add one more variable into the same setup that we already have. As then I can do something similar to what we already do for the log folder:
But won't that only work if you have the same settings for all of them? For example, I think you may end up in trouble if you have different species for the different runs (as the gene mappings will be different). And there may also be issues with more than one instance accessing the same files? So perhaps safer to have one folder per run in order to avoid such potential issues? Anyway, I'll see what I can do. Probably won't be until after Easter though. |
My thoughts on this are perhaps "philosophical", but I'll share them anyway ;-) I like functional programming very much, in particular the idea that a function is an order of magnitude simpler (to understand, to use, to debug, etc.) when its output is determined only by its inputs. That can't be the case if the function has a "memory", because every time you call it, it can "remember" things from the previous call. In my pipelines, things are much simpler when the programs behave like functions. Containers (e.g. Apptainer) are one way to achieve this: they have a read-only file system, and you can only write to externally mapped folders. You get a strong guarantee that the "universe is reset" on every call. I containerized SearchGUI and PeptideShaker for exactly this purpose. The first problem I had was that the code (jars, etc.) could not be "inside" the container (on the read-only file system), because it has to write in the same place as its code. So in order to "reset the universe on every call", I copy a fresh install of compomics to a folder outside the container, and when it's done, I "delete the universe" so I don't exceed my file count limit. In a desktop environment, it's actually a feature (not a bug) when a software installation "remembers" its configuration; for pipelines, it's another matter. In my usage of SearchGUI+PeptideShaker, if I could have everything that isn't config-related on a read-only drive (like the code), and the configs either as command-line args, env vars or on an external drive, then it would behave like a memoryless function! |
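The "copy a fresh install, run, then delete the universe" workaround described above can be wrapped in a small context manager so that cleanup also happens when a task fails. This is a sketch of the workaround, not of anything SearchGUI provides; `fresh_install` and its parameters are hypothetical names.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def fresh_install(pristine_home, scratch_root=None):
    """Yield a throwaway copy of a software installation.

    The copy lives in a new temporary directory under scratch_root (for
    example a node-local scratch area) and is removed on exit, even if the
    body raises. The pristine installation is never written to.
    """
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        copy = Path(tmp) / Path(pristine_home).name
        shutil.copytree(pristine_home, copy)
        yield copy
```

A task would run its java command inside `with fresh_install(compomics_home, scratch_root=...) as home:`, pointing at the jar inside `home`. A dedicated config folder would make the copy step unnecessary.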
I've deployed a beta version of SearchGUI that supports the config_folder option here: https://genesis.ugent.be/archiva/repository/maven2/eu/isas/searchgui/SearchGUI/4.2.10-beta/. However, as far as I can tell the conf folder is not used when using the -temp_folder option (at least not in this new version). But perhaps you can try this beta version and see which files, if any, are still written to the config folder? |
SearchGUI writes temp files and configurations to subdirectories of:
https://github.com/compomics/searchgui/blob/master/src/main/java/eu/isas/searchgui/cmd/SearchCLI.java#L359
It poses a few problems:
It would be nice to be able to override both the ./resources/conf and ./resources/temp directories, for example with environment variables.