
Consider submitting the nearest neighbor bulk modulus predictions to Matbench #4

Open
sgbaird opened this issue Jul 29, 2022 · 0 comments


sgbaird commented Jul 29, 2022

Matbench

Specifically, I mean this leaderboard: https://matbench.materialsproject.org/Leaderboards%20Per-Task/matbench_v0.1_matbench_log_kvrh/. Submitting would require running GRID on the corresponding Matbench dataset:

from matbench.bench import MatbenchBenchmark

mb = MatbenchBenchmark(autoload=False, subset=["matbench_log_kvrh"])

for task in mb.tasks:
    task.load()
    for fold in task.folds:

        # Inputs are either chemical compositions as strings
        # or crystal structures as pymatgen.Structure objects.
        # Outputs are either floats (regression tasks) or bools (classification tasks)
        train_inputs, train_outputs = task.get_train_and_val_data(fold)

        # train and validate your model
        my_model.train_and_validate(train_inputs, train_outputs)

        # Get testing data
        test_inputs = task.get_test_data(fold, include_target=False)

        # Predict on the testing data
        # Your output should be a pandas series, numpy array, or python iterable
        # where the array elements are floats or bools
        predictions = my_model.predict(test_inputs)

        # Record your data!
        task.record(fold, predictions)

# Save your results
mb.to_file("my_models_benchmark.json.gz")

(source)

and then following the submission instructions.
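
For the nearest-neighbor model itself, here is a minimal sketch of how it could be wired into the loop above. I'm using matminer's Magpie composition features and scikit-learn's KNeighborsRegressor purely as stand-ins; GRID's actual representation and distance metric would replace them, so treat every choice below (the featurizer, n_neighbors=1, the output filename) as a hypothetical placeholder rather than the paper's method:

from matbench.bench import MatbenchBenchmark
from matminer.featurizers.composition import ElementProperty
from sklearn.neighbors import KNeighborsRegressor
import numpy as np

# Hypothetical stand-in for GRID: Magpie composition features + 1-NN regression.
featurizer = ElementProperty.from_preset("magpie")

def featurize(structures):
    # matbench_log_kvrh inputs are pymatgen Structure objects;
    # reduce each to its composition and compute Magpie statistics.
    return np.array([featurizer.featurize(s.composition) for s in structures])

mb = MatbenchBenchmark(autoload=False, subset=["matbench_log_kvrh"])

for task in mb.tasks:
    task.load()
    for fold in task.folds:
        train_inputs, train_outputs = task.get_train_and_val_data(fold)
        test_inputs = task.get_test_data(fold, include_target=False)

        # Placeholder for GRID's neighbor search over its own representation
        model = KNeighborsRegressor(n_neighbors=1)
        model.fit(featurize(train_inputs), train_outputs)

        predictions = model.predict(featurize(test_inputs))
        task.record(fold, predictions)

mb.to_file("grid_nn_benchmark.json.gz")

The only GRID-specific work would be swapping the featurizer and regressor for GRID's descriptor and neighbor lookup; the Matbench loop, fold handling, and results file stay as shown.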

I think this would really help put GRID into context with other algorithms and raise awareness of it. In my experience, Matbench has also been an easy and thorough way to compare my own work against prior studies.

Related commentary from the manuscript (I didn't notice any references to Matbench):

There are several existing literature reports of ML-based prediction of K (Table S2) using different
datasets, input features and machine learning algorithms. Comparison between these studies
is challenging due to the different data sets and metrics used and whether the model is fitted
on a linear or logarithmic scale, but the current state of the art methods using data from the
Materials Project achieve a MAE of around 10 GPa (or 0.05 using log10(K / GPa) as input).
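
As a quick sanity check of that log/linear equivalence (my own back-of-the-envelope arithmetic, not from the manuscript): an MAE of 0.05 in log10(K / GPa) corresponds to a multiplicative error of about 10**0.05 ≈ 1.12, i.e. roughly 12%, which is on the order of 10 GPa for bulk moduli around 100 GPa. The representative K values below are my own illustrative picks:

import numpy as np

# Back-of-the-envelope: what does a log-scale MAE of 0.05 mean in GPa?
log_mae = 0.05
factor = 10 ** log_mae  # ~1.122, i.e. ~12% multiplicative error
print(f"multiplicative error: x{factor:.3f} (~{(factor - 1) * 100:.0f}%)")

# Implied absolute error at a few representative bulk moduli (GPa):
for k in (50.0, 100.0, 200.0):
    print(f"K = {k:5.0f} GPa -> ~{k * (factor - 1):.1f} GPa")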
