-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New design for DataSet
and Estimator
#69
Comments
Very nice overview! Two quick comments without having gone through all the code:
|
Mh not sure if I understand correctly but we could do essentially what we currently do with
Or we define the enum without implementing the trait and just write the match statements within the methods of the |
That is part of the problem IMO. The other problem is that the number of return values for a single |
The prediction of a target property for a given model and a set of thermodynamic states is a problem that can easily be parallelized.
Our current
DataSet
trait'spredict
method is very flexible in that a user can freely decide how to iterate over data and possibly calculate properties that are needed during that iteration. The downside of the current design is that we cannot implement the parallelization in a generic way.This issue discusses ways to change the
DataSet
trait so that a generic parallel evaluation of data points can be provided.In- and ouput of the prediction of a single data point
We could add a method into the trait that returns the prediction for a single data point.
The prediction method for all data can then loop over this method. The same method could be reused for a parallel and a sequential prediction method.
However, different properties require different inputs both in terms of the number of inputs as well as the data type(s). E.g.
SINumber
SINumber
f64
, t, p asSINumber
SINumber
)The above
predict_datapoint
method would work for the vapor pressure but not for the chemical potential.Option 1: Unify in- and output of the
prediction
method for a data pointWe could get rid of the unit in the in- and output and be generic in the length of in- and output data.
Implementation (click)
relative_difference
(and thus the cost function) can be generically implemented since the result of predict is just aVec<f64>
.DataSet
s can be stored in aVec<Arc<dyn DataSet<E>>>
in theEstimator
which then can be used as is.predict
are always the same which is nice to check/plot how a model's parameters perform.Vec
s have to be allocated during the iteration (most of the time with a single value)Option 2: Add associated types to the
DataSet
traitThe interface can be written in a nicer way using associated types and constants. However, the trait is no longer object safe. We'd have to build an enum with all possible implementors of
DataSet
so that those can be used within an Estimator inside aVec
.Implementation (click)
Estimator
.An implementation for the vapor pressure could look like this:
Implementation (click)
For more complex predictions, e.g.
BinaryVleChemicalPotential
this could look like so:click
Option 3: Add parallel version of
predict
leaving the implementation to the userDataSet
traitThe text was updated successfully, but these errors were encountered: