Replies: 1 comment
-
Hey @sgbaird, first off many thanks for taking such a thorough look, we really appreciate it! Upfront, one general comment to explain the context: For our open-source release, we
In principle we have all these examples which are already ready to use and very similar, but I guess it would
We want to keep the README lightweight re example code. We don't think it's much of a jump to understand how
We also think we can do a better job here, thanks for flagging. Again the README should be lightweight,
Sure, let's discuss the details in #95
Makes sense to us, also part of #95
Good suggestion, we indeed have one nice learning curve comparison on a public data set comparing chemical
Re visualization we don't plan to provide any builtins, we might consider granting access too much for
We have all agreed that this is a good suggestion, but the suggested implementation does not work for
Absolutely agree. You have encountered our doc in a not fully completed state, there are
We will provide more context and a usage idea on the more detailed userguide upcoming
Agreed, we will remove the subsections from it and have a small summary sentence on top of each section.
Let's put this in a separate discussion, as we are also very interested in your thoughts on how to make said connection better; it is pretty crucial indeed.
We've had similar thoughts, but overall the issue is that you can not only use our simulation utilities for
Two things. Indeed here we currently have the problem that we have no example and user guide for the
This can be abstracted easily to other scenarios such as changing substrates
The terminology here is tricky. The act of combining the data of several contexts can be
Indeed this is already on our roadmap as we fully agree with the downsides and limitations of scalarizing.
Indeed, we are using joint batch optimization in the main and default optimizer
It's very tricky to maintain such a list, imagine we do all the work of working out exactly all
As described above we have a plot showing nicely the tremendous impact these can have, it is probably a nice plot to show already on the README
I'm not sure why you consider them over restrictive? I guess they are even less restrictive than |
-
I had a nice time reviewing this repository! Overall I think it's a really comprehensive, clean, and well-documented project. Thank you for open-sourcing it! Find below some questions and suggestions:
Docs improvements
General
README
A Colab notebook or similar would be really good I think. See e.g., https://colab.research.google.com/drive/1VEHXBLVkn5NZ7N-Oj6-dc_hkIfwFcUE-?usp=sharing. I needed to
%pip install 'baybe[chem,simulation]' numpy==1.24.4
on Colab, otherwise it seems to work OK (see BUG: AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64' numpy/numpy#25150 (comment)). Consider moving "Quick Start" into a tutorial notebook and providing a Colab link. It looks like you already have it set up to convert Jupyter notebooks into HTML pages (e.g., https://emdgroup.github.io/baybe/examples/Constraints_Discrete/mixture_constraints.html).
It appears that the README example is doing only a single iteration. I would have expected to see an optimization loop and some information about best parameters, though I get that this is geared more towards wetlab scientists.
Maybe clarify in the README that people can choose from different scaling methods with a link to the docs? I eventually happened upon that part of the docs.
I think the detailed installation information could go into an "Advanced Installation" section (either a separate README that gets incorporated into the docs, or near the end of the README). Within the main README in a "quick installation" section, then include a link to the advanced installation instructions. See Reorder README sections and print dataframe #95
The README example doesn't have much by way of outputs (e.g., print statements and expected output). See Reorder README sections and print dataframe #95
Same for visual representation, such as an optimization trace using BayBE on a task. Are there any built-in visualization methods? If not, consider including at least some examples of visualizing performance
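For reference, the kind of loop and output meant in the points above could look like the sketch below. It is not BayBE code: recommend is a hypothetical placeholder that picks random untested candidates, and objective is a made-up stand-in for a lab measurement. In a real campaign, the optimizer's own recommendation and data-ingestion calls would take their place.

```python
import random


def objective(x: float) -> float:
    """Hypothetical stand-in for a lab measurement (maximum at x = 0.3)."""
    return -((x - 0.3) ** 2)


def recommend(candidates, measured, batch_size, rng):
    """Placeholder recommender: random picks among untested candidates.

    A real campaign would ask the optimizer for recommendations here.
    """
    remaining = [c for c in candidates if c not in measured]
    return rng.sample(remaining, min(batch_size, len(remaining)))


def run_campaign(n_iterations=5, batch_size=3, seed=0):
    rng = random.Random(seed)
    candidates = [i / 20 for i in range(21)]  # discrete grid on [0, 1]
    measured = {}  # parameter value -> measured target
    trace = []  # best target seen after each iteration
    for _ in range(n_iterations):
        batch = recommend(candidates, measured, batch_size, rng)
        for x in batch:
            measured[x] = objective(x)  # "run the experiment"
        trace.append(max(measured.values()))
    best_x = max(measured, key=measured.get)
    return best_x, measured[best_x], trace


best_x, best_y, trace = run_campaign()
print(f"best parameter: {best_x}, best target: {best_y:.4f}")
print("best-so-far per iteration:", [round(v, 4) for v in trace])
```

The trace list is exactly what an optimization-trace plot would show (best target so far versus iteration), so printing or plotting it would cover both the missing output and the missing visual.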
Webpage
It would be nice to have an "Edit on GitHub" link on your documentation pages -- it makes it a lot easier for others to contribute I think. See enable edit on github button #94
It would be nice if the user guide linked to a corresponding tutorial or section of tutorials. For example, linking https://emdgroup.github.io/baybe/userguide/strategy.html to https://emdgroup.github.io/baybe/examples/Basics/strategies.html#
At a glance, this was difficult to parse:
Similar to the SequentialStrategy, the StreamingSequentialStrategy enables the utilization of arbitrary iterables to select recommender. Note that this strategy is however not serializable.
(https://emdgroup.github.io/baybe/userguide/strategy.html#the-streamingsequentialstrategy). I think I kind of get it, but not necessarily when or how I would want to use it.
I think there is too much granularity on some of your docs pages, such as https://emdgroup.github.io/baybe/examples/Constraints_Discrete/Constraints_Discrete.html (i.e., lots of repetition, not a whole lot of valuable information gained from the bottom-most headings). No worries if this would be difficult to change.
It would be nice to get some more details about each of the "Examples" sections rather than needing to click into each one to better understand what it's about. I.e., https://emdgroup.github.io/baybe/examples/examples.html could have some text at the top.
As I'm going into more of the tutorials, I'm seeing that it's really comprehensive. For example, a demonstration of adding existing data https://emdgroup.github.io/baybe/examples/Backtesting/full_initial_data.html. I think there needs to be a better way to highlight/organize/point people to the tutorials they care about most. Happy to discuss more.
Terminology
Backtesting
I don't think "Backtesting" is common terminology for chem/materials informatics communities, at least in North America. It seems to be more common in finance, for example: https://en.wikipedia.org/wiki/Backtesting. When I wandered into https://emdgroup.github.io/baybe/_autosummary/baybe.simulation.html#module-baybe.simulation, I finally realized that what you refer to as simulation and backtesting is what I would typically refer to as benchmarking. I was thinking that maybe you implemented multi-task BO, where you could leverage physics-based simulations to help inform wetlab/experimental search campaigns. It took a while before this became clear to me.
Transfer learning
Right now, "Transfer learning: Mix data from multiple campaigns and accelerate optimization" is mentioned on https://emdgroup.github.io/baybe/misc/readme_link.html#, but it doesn't seem like this is really implemented yet, other than https://emdgroup.github.io/baybe/_autosummary/baybe.simulation.simulate_transfer_learning.html#baybe.simulation.simulate_transfer_learning. However, it doesn't appear to me that transfer learning is being used here. Even going through the function (https://emdgroup.github.io/baybe/_modules/baybe/simulation.html#simulate_transfer_learning), it was a bit tough to realize what was happening until I looked up TaskParameter. Suddenly, it made sense to me that what you're referring to as a task parameter is what I refer to as a contextual variable. This is also really good for me to see that contextual variable optimization is supported. However, I don't really consider this as transfer learning. In my mind, transfer learning means using one model to inform another. In contextual Bayesian optimization, certain variables are being fixed at each prediction. Perhaps I misunderstood something though. I imagine this will become clearer once https://emdgroup.github.io/baybe/userguide/transfer_learning.html has been developed.
Functionality
Multi-objective
It seems that Expected Hypervolume Improvement (EHVI) isn't one of the supported options for multi-objective optimization. Could you comment on this? With the DESIRABILITY mode, is each of the targets modeled independently prior to scalarization? If not, I tend to have a hard time referring to something like this as multi-objective optimization. In my mind, it's single-objective optimization of a fixed scalarization of several objectives. As alluded to in https://emdgroup.github.io/baybe/userguide/objective.html#desirability, it's good that a clarification is made about the scales being combined.
Batch conditioning
Do you perform conditioning on your batches (i.e., compute a joint acquisition function value)? For example, using fantasy point modeling. This is one of the easiest "gotchas" of batch optimization. See facebook/Ax#778 (comment) and https://youtu.be/JzgkSR6FFyM?si=dzv3RVvjKrZlkjlH
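A minimal sketch of the difference, under stated assumptions: a toy one-dimensional Gaussian process written in NumPy, an upper-confidence-bound acquisition, and the "kriging believer" heuristic (the posterior mean serves as a fantasy observation after each pick). All function names and the toy data are hypothetical. This is a simple stand-in for the idea, not how BayBE, Ax, or BoTorch implement joint batch acquisition.

```python
import numpy as np


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def ucb(mean, std, beta=2.0):
    """Upper confidence bound acquisition."""
    return mean + beta * std


def naive_batch(x_train, y_train, candidates, batch_size):
    """Top-k by acquisition value, ignoring interactions within the batch."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    order = np.argsort(-ucb(mean, std), kind="stable")
    return candidates[order[:batch_size]]


def conditioned_batch(x_train, y_train, candidates, batch_size):
    """Greedy batch: after each pick, condition the model on a fantasy point."""
    x_fant, y_fant = x_train.copy(), y_train.copy()
    available = np.ones(len(candidates), dtype=bool)
    picked = []
    for _ in range(batch_size):
        mean, std = gp_posterior(x_fant, y_fant, candidates)
        acq = np.where(available, ucb(mean, std), -np.inf)
        i = int(np.argmax(acq))
        picked.append(i)
        available[i] = False
        # Fantasy observation: pretend the posterior mean was measured here.
        x_fant = np.append(x_fant, candidates[i])
        y_fant = np.append(y_fant, mean[i])
    return candidates[picked]


x_train = np.array([0.1, 0.5, 0.9])  # toy data, not from any real campaign
y_train = np.array([0.2, 1.0, 0.4])
candidates = np.linspace(0.0, 1.0, 101)

naive = naive_batch(x_train, y_train, candidates, batch_size=3)
conditioned = conditioned_batch(x_train, y_train, candidates, batch_size=3)
print("naive top-k batch:      ", naive)
print("fantasy-conditioned batch:", conditioned)
```

Both strategies agree on the first point. The naive batch then tends to crowd around that same acquisition peak, whereas the conditioned batch sees the uncertainty near each pick collapse and is pushed towards other regions.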
Comparison to other packages
What needs does BayBE fulfill that other packages don't? I think the README should clarify what makes BayBE stand apart from others and reference these other packages, too. For example, there's Ax (https://ax.dev), Gauche (https://github.com/leojklarner/gauche), Atlas (https://github.com/aspuru-guzik-group/atlas), Olympus, and https://github.com/experimental-design/bofire.
For example:
I keep what is probably an overly inclusive list of GitHub repos at https://github.com/stars/sgbaird/lists/optimization-and-tuning and a shortlist at https://github.com/AccelerationConsortium/awesome-self-driving-labs/blob/main/readme.md#optimization. I added BayBE to these lists recently.
I'm also interested to see an optimization comparison/benchmark of using the Mordred encoding with the solvent vs. treating it as a purely categorical variable.
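To make the contrast concrete, here is a small sketch of why the two encodings could behave differently in a surrogate model. It assumes three example solvents and uses only two simple descriptors (molecular weight and heavy-atom count) as stand-ins for the much larger Mordred descriptor set. It says nothing about how BayBE encodes substances internally.

```python
import numpy as np

solvents = ["water", "methanol", "ethanol"]

# Purely categorical (one-hot) encoding: no notion of chemical similarity.
one_hot = np.eye(len(solvents))

# Descriptor encoding: molecular weight (g/mol) and heavy-atom count.
# A Mordred-style encoding would supply many more computed descriptors.
descriptors = np.array(
    [
        [18.015, 1],  # water
        [32.042, 2],  # methanol
        [46.069, 3],  # ethanol
    ]
)
# Standardize each column so that no single descriptor dominates distances.
scaled = (descriptors - descriptors.mean(axis=0)) / descriptors.std(axis=0)


def pairwise_distances(x):
    """Euclidean distance between every pair of rows."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


d_onehot = pairwise_distances(one_hot)
d_desc = pairwise_distances(scaled)
print("one-hot distances:\n", d_onehot.round(3))
print("descriptor distances:\n", d_desc.round(3))
```

With one-hot encoding every pair of solvents is equally far apart, so a measurement on methanol tells a distance-based model nothing specific about ethanol. With descriptors, methanol sits between water and ethanol, which is the structure a benchmark of the two encodings would be probing.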
Software development
I can appreciate that BayBE seems well-maintained from a software developer perspective! This is welcome in the fields of chemistry and materials science, which understandably often lack this.
There are a lot of dependencies. I'm glad you split them up into groups!
I notice you have a lot of >= dependencies in https://github.com/emdgroup/baybe/blob/main/pyproject.toml. Is this overly restrictive? It's OK if you don't think so.
The docstrings look really nice, and it's nice to have the function cross-linking across the API docs.
I look forward to seeing how you use hypothesis testing here!
Feel free to convert to a discussion if desired, and happy to refactor into multiple items if that would be better.