Basic transfer learning example #257
-
Some related literature based on the Meta AE workshop 2024: (1) Fan, Z.; Han, X.; Wang, Z. HyperBO+: Pre-Training a Universal Hierarchical Gaussian Process Prior for Bayesian Optimization.
-
Let me share some of my thoughts on our examples (@Scienfitz @AdrianSosic, please feel free to add to this, in particular if you interpret things differently). The examples on the webpage are meant to demonstrate to users how to use BayBE as easily as possible. That is, they are intended to be minimal and should not contain lengthy explanations, involved computations, or similar. For deeper explanations we have the user guides; we do not aim to host extensive examples showing everything you can do with BayBE on our webpage. In particular, our examples are not necessarily meant to show the best possible performance you can get with BayBE. They should say "Hey, this is something that you can do with BayBE, and look how easy it is!" and give users a first template for implementing their own use cases. The numbers/dimensions in the examples were mainly chosen as a compromise between "reasonable computation time" and "results look sufficiently promising". I think the points you make here (which are good!) are important, but they are not what we aim for with our examples. But let's hear @AdrianSosic's and @Scienfitz's thoughts on this :)
-
I took the basic transfer learning example and brought it into a Colab notebook: https://colab.research.google.com/drive/1YOVW7hxdBlRrmnrYirubU5Yj2Z7GEqup?usp=sharing. It seemed to run OK (thanks for providing a `SMOKE_TEST` by the way). I think I get the desire to show the proportion of candidates. For example, the candidate pool has 125 points; we start off with 20% = 25 candidates evaluated on the training function, then we see how the optimization progresses as we evaluate any of the 125 candidates on the test function. So you're able to report that 20% of the candidates were evaluated on a related task.
Some thoughts:

- `POINTS_PER_DIM = 5` seems a bit low in the example. `DIMENSION = 3` and `POINTS_PER_DIM = 5`, so 125 points in the candidate pool. It would probably be best to mention this explicitly in the text, and perhaps even in the figure itself.
- I think a quasi-random method, or at least a grid search with some jitter added to the training datapoints for each of the repeat campaigns, would make sense (see the sketch after this list). See 1.0-traditional-doe-vs-bayesian.ipynb and search for "jitter". In this case, jitter is added to the function parameter space itself (e.g., add `0.24` to all $x_1$ values and subtract `0.68` from all $x_2$ values for campaign 1).

(Aside: you can see the plot outputs directly in the HTML version; excuse the overly verbose outputs.)
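To illustrate the jitter idea, here is a minimal sketch under my own assumptions: the function name `jittered_grid` and the offset `scale` are hypothetical, and the `0.24`/`0.68` values quoted above would correspond to one particular random draw, not to anything hard-coded here.

```python
import numpy as np

def jittered_grid(pool: np.ndarray, campaign_seed: int, scale: float = 1.0) -> np.ndarray:
    """Shift an entire grid by one random constant offset per dimension.

    Each repeat campaign gets its own seed, so e.g. one campaign might add
    0.24 to all x1 values and subtract 0.68 from all x2 values, in the
    spirit of the jitter used in 1.0-traditional-doe-vs-bayesian.ipynb.
    """
    rng = np.random.default_rng(campaign_seed)
    offsets = rng.uniform(-scale, scale, size=pool.shape[1])
    return pool + offsets  # broadcasting: one offset per column

# Usage: one decorrelated set of training points per repeat campaign.
base_pool = np.stack(np.meshgrid(*[np.linspace(0, 1, 5)] * 3), axis=-1).reshape(-1, 3)
campaign_pools = [jittered_grid(base_pool, seed) for seed in range(10)]
```

This keeps the regular grid structure within each campaign while decorrelating the training data across campaigns, which is the point of the jitter suggestion.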