
Consider submitting the nearest neighbor bulk modulus predictions to Matbench #4

Open
sgbaird opened this issue Jul 29, 2022 · 0 comments


sgbaird commented Jul 29, 2022

Matbench

Specifically, I mean this leaderboard: https://matbench.materialsproject.org/Leaderboards%20Per-Task/matbench_v0.1_matbench_log_kvrh/. Submitting would require running GRID on the corresponding Matbench dataset:

from matbench.bench import MatbenchBenchmark

mb = MatbenchBenchmark(autoload=False, subset=["matbench_log_kvrh"])

for task in mb.tasks:
    task.load()
    for fold in task.folds:

        # Inputs are either chemical compositions as strings
        # or crystal structures as pymatgen.Structure objects.
        # Outputs are either floats (regression tasks) or bools (classification tasks)
        train_inputs, train_outputs = task.get_train_and_val_data(fold)

        # train and validate your model
        my_model.train_and_validate(train_inputs, train_outputs)

        # Get testing data
        test_inputs = task.get_test_data(fold, include_target=False)

        # Predict on the testing data
        # Your output should be a pandas series, numpy array, or python iterable
        # where the array elements are floats or bools
        predictions = my_model.predict(test_inputs)

        # Record your data!
        task.record(fold, predictions)

# Save your results
mb.to_file("my_models_benchmark.json.gz")

(source)

and then following the submission instructions.
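
For the nearest-neighbor model itself, here is a minimal sketch of how it could be wired into the loop above. I'm using matminer's Magpie composition features and scikit-learn's KNeighborsRegressor purely as stand-ins; GRID's actual representation and distance metric would replace them, so treat every choice below (the featurizer, n_neighbors=1, the output filename) as a hypothetical placeholder rather than the paper's method:

from matbench.bench import MatbenchBenchmark
from matminer.featurizers.composition import ElementProperty
from sklearn.neighbors import KNeighborsRegressor
import numpy as np

# Hypothetical stand-in for GRID: Magpie composition features + 1-NN regression.
featurizer = ElementProperty.from_preset("magpie")

def featurize(structures):
    # matbench_log_kvrh inputs are pymatgen Structure objects;
    # reduce each to its composition and compute Magpie statistics.
    return np.array([featurizer.featurize(s.composition) for s in structures])

mb = MatbenchBenchmark(autoload=False, subset=["matbench_log_kvrh"])

for task in mb.tasks:
    task.load()
    for fold in task.folds:
        train_inputs, train_outputs = task.get_train_and_val_data(fold)
        test_inputs = task.get_test_data(fold, include_target=False)

        # Placeholder for GRID's neighbor search over its own representation
        model = KNeighborsRegressor(n_neighbors=1)
        model.fit(featurize(train_inputs), train_outputs)

        predictions = model.predict(featurize(test_inputs))
        task.record(fold, predictions)

mb.to_file("grid_nn_benchmark.json.gz")

The only GRID-specific work would be swapping the featurizer and regressor for GRID's descriptor and neighbor lookup; the Matbench loop, fold handling, and results file stay as shown.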

I think this would really help put GRID into context with other algorithms and raise awareness of it. In my experience, Matbench has also been an easy and thorough way to compare my own work against prior studies.

Related commentary from the manuscript (I didn't notice any references to Matbench):

There are several existing literature reports of ML-based prediction of K (Table S2) using different
datasets, input features and machine learning algorithms. Comparison between these studies
is challenging due to the different data sets and metrics used and whether the model is fitted
on a linear or logarithmic scale, but the current state of the art methods using data from the
Materials Project achieve a MAE of around 10 GPa (or 0.05 using log10(K / GPa) as input).
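
As a quick sanity check of that log/linear equivalence (my own back-of-the-envelope arithmetic, not from the manuscript): an MAE of 0.05 in log10(K / GPa) corresponds to a multiplicative error of about 10**0.05 ≈ 1.12, i.e. roughly 12%, which is on the order of 10 GPa for bulk moduli around 100 GPa. The representative K values below are my own illustrative picks:

import numpy as np

# Back-of-the-envelope: what does a log-scale MAE of 0.05 mean in GPa?
log_mae = 0.05
factor = 10 ** log_mae  # ~1.122, i.e. ~12% multiplicative error
print(f"multiplicative error: x{factor:.3f} (~{(factor - 1) * 100:.0f}%)")

# Implied absolute error at a few representative bulk moduli (GPa):
for k in (50.0, 100.0, 200.0):
    print(f"K = {k:5.0f} GPa -> ~{k * (factor - 1):.1f} GPa")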
