Commit

deploy: d0fe975

lbluque committed Apr 23, 2024
1 parent b90126e commit bda4922
Showing 31 changed files with 2,459 additions and 1,515 deletions.
16 changes: 8 additions & 8 deletions _downloads/5fdddbed2260616231dbf7b0d94bb665/train.txt
@@ -1,16 +1,16 @@
-2024-04-23 22:36:00 (INFO): Project root: /home/runner/work/ocp/ocp
+2024-04-23 22:52:53 (INFO): Project root: /home/runner/work/ocp/ocp
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
-2024-04-23 22:36:01 (INFO): amp: true
+2024-04-23 22:52:55 (INFO): amp: true
cmd:
-checkpoint_dir: fine-tuning/checkpoints/2024-04-23-22-36-48-ft-oxides
-commit: 51c8869
+checkpoint_dir: fine-tuning/checkpoints/2024-04-23-22-53-52-ft-oxides
+commit: d0fe975
identifier: ft-oxides
-logs_dir: fine-tuning/logs/tensorboard/2024-04-23-22-36-48-ft-oxides
+logs_dir: fine-tuning/logs/tensorboard/2024-04-23-22-53-52-ft-oxides
print_every: 10
-results_dir: fine-tuning/results/2024-04-23-22-36-48-ft-oxides
+results_dir: fine-tuning/results/2024-04-23-22-53-52-ft-oxides
seed: 0
-timestamp_id: 2024-04-23-22-36-48-ft-oxides
+timestamp_id: 2024-04-23-22-53-52-ft-oxides
dataset:
a2g_args:
r_energy: true
@@ -138,7 +138,7 @@ val_dataset:
r_forces: true
src: val.db

-2024-04-23 22:36:01 (INFO): Loading dataset: lmdb
+2024-04-23 22:52:55 (INFO): Loading dataset: lmdb
Traceback (most recent call last):
File "/home/runner/work/ocp/ocp/main.py", line 89, in <module>
Runner()(config)
16 changes: 8 additions & 8 deletions _downloads/819e10305ddd6839cd7da05935b17060/mass-inference.txt
@@ -1,16 +1,16 @@
-2024-04-23 22:36:52 (INFO): Project root: /home/runner/work/ocp/ocp
+2024-04-23 22:54:40 (INFO): Project root: /home/runner/work/ocp/ocp
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
-2024-04-23 22:36:54 (INFO): amp: true
+2024-04-23 22:54:42 (INFO): amp: true
cmd:
-checkpoint_dir: ./checkpoints/2024-04-23-22-36-48
-commit: 51c8869
+checkpoint_dir: ./checkpoints/2024-04-23-22-53-52
+commit: d0fe975
identifier: ''
-logs_dir: ./logs/tensorboard/2024-04-23-22-36-48
+logs_dir: ./logs/tensorboard/2024-04-23-22-53-52
print_every: 10
-results_dir: ./results/2024-04-23-22-36-48
+results_dir: ./results/2024-04-23-22-53-52
seed: 0
-timestamp_id: 2024-04-23-22-36-48
+timestamp_id: 2024-04-23-22-53-52
dataset:
a2g_args:
r_energy: false
@@ -117,7 +117,7 @@ test_dataset:
trainer: ocp
val_dataset: null

-2024-04-23 22:36:54 (INFO): Loading dataset: lmdb
+2024-04-23 22:54:42 (INFO): Loading dataset: lmdb
Traceback (most recent call last):
File "/home/runner/work/ocp/ocp/main.py", line 89, in <module>
Runner()(config)
20 changes: 10 additions & 10 deletions _sources/core/inference.md
@@ -14,7 +14,7 @@ kernelspec:
Fast batched inference
------------------

The ASE calculator is not necessarily the most efficient way to run a lot of computations. It is better to do a "mass inference" using a command line utility. We illustrate how to do that here.

In this paper we computed about 10K different gold structures:

@@ -23,12 +23,12 @@ Boes, J. R., Groenenboom, M. C., Keith, J. A., & Kitchin, J. R. (2016). Neural n
You can retrieve the dataset below. In this notebook we learn how to do "mass inference" without an ASE calculator. You do this by creating a config.yml file, and running the `main.py` command line utility.

```{code-cell} ipython3
! wget https://figshare.com/ndownloader/files/11948267 -O data.db
```
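If `wget` is not available in your environment, the same file can be fetched directly from Python; this is just a fallback sketch using the figshare URL above.

```{code-cell} ipython3
# Fallback download without wget, using the same figshare URL as above.
import urllib.request

urllib.request.urlretrieve('https://figshare.com/ndownloader/files/11948267', 'data.db')
```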



Inference on this file will be fast if we have a GPU, but without one it could take a while. To keep things fast for the automated builds, we select only the first 10 structures so it is still approachable with just a CPU.
Comment or skip this block to use the whole dataset!

```{code-cell} ipython3
@@ -46,15 +46,15 @@ with ase.db.connect('full_data.db') as full_db:
if 'tag' in atoms.info['key_value_pairs']:
atoms.info['key_value_pairs']['tag'] = int(atoms.info['key_value_pairs']['tag'])
subset_db.write(atoms, **atoms.info['key_value_pairs'])
```
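For reference, a self-contained version of that subsetting step might look like the sketch below. It assumes the downloaded file has been renamed to `full_data.db` and that only the first 10 rows are copied; the (truncated) cell above remains the authoritative version.

```{code-cell} ipython3
# Sketch: copy the first 10 rows of the full database into a small data.db.
import ase.db

with ase.db.connect('full_data.db') as full_db:
    with ase.db.connect('data.db', append=False) as subset_db:
        for i, row in enumerate(full_db.select()):
            if i >= 10:
                break
            atoms = row.toatoms(add_additional_information=True)
            kvp = atoms.info['key_value_pairs']
            # Tags must be integers downstream, so coerce them here.
            if 'tag' in kvp:
                kvp['tag'] = int(kvp['tag'])
            subset_db.write(atoms, **kvp)
```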

```{code-cell} ipython3
! ase db data.db
```

You have to choose a checkpoint to start with. The newer checkpoints may require too much memory for this environment.

```{code-cell} ipython3
from ocpmodels.models.model_registry import available_pretrained_models
@@ -69,7 +69,7 @@ checkpoint_path
```

We have to update our configuration yml file with the dataset. It is necessary to specify the train and test set for some reason.

```{code-cell} ipython3
from ocpmodels.common.tutorial_utils import generate_yml_config
@@ -110,7 +110,7 @@ print(f'Elapsed time = {time.time() - t0:1.1f} seconds')

```{code-cell} ipython3
with open('mass-inference.txt', 'wb') as f:
    f.write(inference.stdout.encode('utf-8'))
```

```{code-cell} ipython3
@@ -148,7 +148,7 @@ energies = np.array([row.energy for row in db.select('natoms>5,xc=PBE')])
natoms = np.array([row.natoms for row in db.select('natoms>5,xc=PBE')])
```

-Now, we can see the predictions. The are only ok here; that is not surprising, the data set has lots of Au configurations that have never been seen by this model. Fine-tuning would certainly help improve this.
+Now, we can see the predictions. They are only ok here; that is not surprising, the data set has lots of Au configurations that have never been seen by this model. Fine-tuning would certainly help improve this.

```{code-cell} ipython3
import matplotlib.pyplot as plt
@@ -193,11 +193,11 @@ plt.ylabel('OCP (eV/atom)');

# Comparing ASE calculator and main.py

The results should be the same.

It is worth noting that the default precision of predictions is float16 with main.py, but with the ASE calculator the default precision is float32. Supposedly you can specify `--task.prediction_dtype=float32` at the command line, or specify it in the config.yml as we do above, but as of this tutorial that does not resolve the issue.

As noted above (see also [Issue 542](https://github.com/Open-Catalyst-Project/ocp/issues/542)), the ASE calculator and main.py use different precisions by default, which can lead to small differences.
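Since the two code paths default to different precisions, it can be worth checking the stored dtypes before reading too much into small discrepancies. This is only a quick sketch; `results` and `OCP` are the arrays computed earlier in this notebook.

```{code-cell} ipython3
# Inspect the dtypes of the two sets of predictions (sketch).
import numpy as np

print(results['energy'].dtype)  # main.py predictions, expected to be float16
print(np.asarray(OCP).dtype)    # ASE-calculator predictions, expected to be float32 or float64
```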

```{code-cell} ipython3
np.mean(np.abs(results['energy'][sind] - OCP * natoms)) # MAE
10 changes: 5 additions & 5 deletions _sources/core/lmdb_dataset_creation.md
@@ -24,7 +24,7 @@ about these steps as they've been automated as part of this

```{code-cell} ipython3
from ocpmodels.preprocessing import AtomsToGraphs
-from ocpmodels.datasets import SinglePointLmdbDataset, TrajectoryLmdbDataset
+from ocpmodels.datasets import LmdbDataset
import ase.io
from ase.build import bulk
from ase.build import fcc100, add_adsorbate, molecule
@@ -149,7 +149,7 @@ db.close()
```

```{code-cell} ipython3
-dataset = SinglePointLmdbDataset({"src": "sample_CuCO.lmdb"})
+dataset = LmdbDataset({"src": "sample_CuCO.lmdb"})
len(dataset)
```
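Because each item is a `torch_geometric` `Data` object, the dataset can also be wrapped in a standard PyTorch Geometric `DataLoader` for batched iteration; a small sketch (with an arbitrary batch size) follows.

```{code-cell} ipython3
# Sketch: batched iteration over the LMDB-backed dataset.
from torch_geometric.loader import DataLoader

loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch = next(iter(loader))
batch
```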

@@ -217,7 +217,7 @@ db.close()
```

```{code-cell} ipython3
-dataset = TrajectoryLmdbDataset({"src": "s2ef/"})
+dataset = LmdbDataset({"src": "s2ef/"})
len(dataset)
```

@@ -227,7 +227,7 @@ dataset[0]

### Advanced usage

-TrajectoryLmdbDataset supports multiple LMDB files because the need to highly parallelize the dataset construction process. With OCP's largest split containing 135M+ frames, the need to parallelize the LMDB generation process for these was necessary. If you find yourself needing to deal with very large datasets we recommend parallelizing this process.
+LmdbDataset supports multiple LMDB files because dataset construction is often heavily parallelized. With OCP's largest split containing 135M+ frames, parallelizing the LMDB generation process was a necessity. If you find yourself needing to deal with very large datasets, we recommend parallelizing this process.
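In practice that parallelization usually means sharding frames across worker processes, with each worker writing its own `data.*.lmdb` file into the directory that `LmdbDataset` later reads. The sketch below shows one such worker; it assumes the frames have already been converted to `Data` objects with the same `AtomsToGraphs` pattern used earlier, and it omits the bookkeeping a production script would add.

```{code-cell} ipython3
import lmdb
import pickle

def write_shard(shard_id, data_objects, outdir="s2ef"):
    """Write one chunk of converted Data objects into its own LMDB shard (sketch)."""
    db = lmdb.open(
        f"{outdir}/data.{shard_id:04d}.lmdb",
        map_size=1099511627776 * 2,
        subdir=False,
        meminit=False,
        map_async=True,
    )
    for i, data in enumerate(data_objects):
        txn = db.begin(write=True)
        txn.put(f"{i}".encode("ascii"), pickle.dumps(data, protocol=-1))
        txn.commit()
    db.sync()
    db.close()

# Each worker process (e.g. one per multiprocessing.Pool worker) calls write_shard
# on its own slice of the frames; LmdbDataset({"src": "s2ef/"}) then reads all shards.
```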

+++

@@ -236,7 +236,7 @@ TrajectoryLmdbDataset supports multiple LMDB files because the need to highly pa
Below we demonstrate how to interact with an LMDB to extract particular information.

```{code-cell} ipython3
-dataset = TrajectoryLmdbDataset({"src": "s2ef/"})
+dataset = LmdbDataset({"src": "s2ef/"})
```

```{code-cell} ipython3
36 changes: 18 additions & 18 deletions _sources/core/model_training.md
@@ -25,10 +25,10 @@ python main.py --mode train --config-yml configs/TASK/SIZE/MODEL/MODEL.yml
If you have multiple
GPUs, you can use distributed data parallel training by running:
```
-python -u -m torch.distributed.launch --nproc_per_node=8 main.py --distributed --num-gpus 8 [...]
+torchrun --standalone --nproc_per_node=8 main.py --distributed --num-gpus 8 [...]
```
-`torch.distributed.launch` launches multiple processes for distributed training. For more details, refer to
-https://pytorch.org/docs/stable/distributed.html#launch-utility
+`torchrun` launches multiple processes for distributed training. For more details, refer to the
+[official documentation](https://pytorch.org/docs/stable/elastic/run.html)

If training with multiple GPUs, GPU load balancing may be used to evenly distribute a batch of variable system sizes across GPUs. Load balancing may either balance by number of atoms or number of neighbors. A `metadata.npz` file must be available in the dataset directory to take advantage of this feature. The following command will generate a `metadata.npz` file and place it in the corresponding directory.
```
@@ -39,7 +39,7 @@ Load balancing is activated by default (in atoms mode). To change modes you can
optim:
load_balancing: neighbors
```
-For more details, refer to https://github.com/Open-Catalyst-Project/ocp/pull/267.
+For more details, refer to [PR 267](https://github.com/Open-Catalyst-Project/ocp/pull/267).
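As a mental model (a toy illustration, not the library's actual implementation), balancing by atoms amounts to greedily handing the largest remaining systems to whichever GPU currently has the smallest running total:

```python
# Toy sketch of atom-count load balancing across GPUs.
def balance_by_atoms(natoms_per_system, n_gpus):
    loads = [0] * n_gpus
    assignment = [[] for _ in range(n_gpus)]
    # Largest systems first, each assigned to the least-loaded GPU so far.
    for i in sorted(range(len(natoms_per_system)),
                    key=lambda k: natoms_per_system[k], reverse=True):
        g = loads.index(min(loads))
        assignment[g].append(i)
        loads[g] += natoms_per_system[i]
    return assignment

print(balance_by_atoms([120, 30, 45, 200, 64, 18], n_gpus=2))
```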

If you have access to a slurm cluster, we use the [submitit](https://github.com/facebookincubator/submitit) package to simplify multi-node distributed training:
```
@@ -53,11 +53,10 @@ In the rest of this tutorial, we explain how to train models for each task.
## Initial Structure to Relaxed Energy prediction (IS2RE)

In the IS2RE tasks, the model takes the initial structure as an input and predicts the structure’s adsorption energy
-in the relaxed state. To train a model for the IS2RE task, you can use the `EnergyTrainer`
-Trainer and `SinglePointLmdb` dataset by specifying the following in your configuration file:
+in the relaxed state. To train a model for the IS2RE task, you can use the following in your configuration file:

```yaml
-trainer: energy # Use the EnergyTrainer
+trainer: ocp

dataset:
# Train data
@@ -130,11 +129,11 @@ Alternatively, the IS2RE task may be approached by 2 methods as described in our
## Structure to Energy and Forces (S2EF)

In the S2EF task, the model takes the positions of the atoms as input and predicts the adsorption energy and per-atom
-forces as calculated by DFT. To train a model for the S2EF task, you can use the `ForcesTrainer` Trainer
+forces as calculated by DFT. To train a model for the S2EF task, you can use the `OCPTrainer`
and `TrajectoryLmdb` dataset by specifying the following in your configuration file:

```yaml
-trainer: forces # Use the ForcesTrainer
+trainer: ocp
dataset:
# Training data
@@ -159,7 +158,7 @@ You can find examples configuration files in [`configs/s2ef`](https://github.com
To train a SchNet model for the S2EF task on the 2M split using 2 GPUs, run:

```bash
-python -u -m torch.distributed.launch --nproc_per_node=2 main.py \
+torchrun --standalone --nproc_per_node=2 main.py \
--mode train --config-yml configs/s2ef/2M/schnet/schnet.yml --num-gpus 2 --distributed
```
Similar to the IS2RE task, tensorboard logs are stored in `logs/tensorboard/[TIMESTAMP]` and the
@@ -175,7 +174,7 @@ The predictions are stored in `[RESULTS_DIR]/ocp_predictions.npz` and later used

## Training OC20 models with total energies (IS2RE/S2EF)

-To train and validate an OC20 IS2RE/S2EF model on total energies instead of adsorption energies there are a number of required changes to the config. They include setting: `dataset: oc22_lmdb`, `prediction_dtype: float32`, `train_on_oc20_total_energies: True`, and `oc20_ref: path/to/oc20_ref.pkl` (see example below). Also, please note that our evaluation server does not currently support OC20 total energy models.
+To train and validate an OC20 IS2RE/S2EF model on total energies instead of adsorption energies there are a number of
+required changes to the config. They include setting: `dataset: oc22_lmdb`, `prediction_dtype: float32`,
+`train_on_oc20_total_energies: True`, and `oc20_ref: path/to/oc20_ref.pkl` (see example below).
+Also, please note that our evaluation server does not currently support OC20 total energy models.

```yaml
task:
@@ -278,11 +280,10 @@ EvalAI expects results to be structured in a specific format for a submission to

## Initial Structure to Total Relaxed Energy (IS2RE-Total)

-For the IS2RE-Total task, the model takes the initial structure as input and predicts the total DFT energy of the relaxed structure. This task is more general and more challenging than the original OC20 IS2RE task that predicts adsorption energy. To train an OC22 IS2RE-Total model use the `EnergyTrainer` with the `OC22LmdbDataset` by including these lines in your configuration file:
+For the IS2RE-Total task, the model takes the initial structure as input and predicts the total DFT energy of the relaxed structure. This task is more general and more challenging than the original OC20 IS2RE task that predicts adsorption energy.
+To train an OC22 IS2RE-Total model use the `OC22LmdbDataset` by including these lines in your configuration file:

```yaml
-trainer: energy # Use the EnergyTrainer
dataset:
format: oc22_lmdb # Use the OC22LmdbDataset
...
@@ -291,11 +292,11 @@ You can find examples configuration files in [`configs/oc22/is2re`](https://gith

## Structure to Total Energy and Forces (S2EF-Total)

-The S2EF-Total task takes a structure and predicts the total DFT energy and per-atom forces. This differs from the original OC20 S2EF task because it predicts total energy instead of adsorption energy. To train an OC22 S2EF-Total model use the ForcesTrainer with the OC22LmdbDataset by including these lines in your configuration file:
+The S2EF-Total task takes a structure and predicts the total DFT energy and per-atom forces. This differs from the
+original OC20 S2EF task because it predicts total energy instead of adsorption energy.
+To train an OC22 S2EF-Total model, use the OC22LmdbDataset by including these lines in your configuration file:

```yaml
-trainer: forces # Use the ForcesTrainer
dataset:
format: oc22_lmdb # Use the OC22LmdbDataset
...
@@ -338,4 +339,3 @@ EvalAI expects results to be structured in a specific format for a submission to
```
Where `file.npz` corresponds to the respective `[s2ef/is2re]_predictions.npz` files generated for the corresponding task. The final submission file will be written to `submission_file.npz` (rename accordingly). The `dataset` argument specifies which dataset is being considered — this only needs to be set for OC22 predictions because OC20 is the default.
3. Upload `submission_file.npz` to EvalAI.

8 changes: 4 additions & 4 deletions core/fine-tuning/fine-tuning-oxides.html
@@ -773,7 +773,7 @@ <h1>Fine tuning a model<a class="headerlink" href="#fine-tuning-a-model" title="
warnings.warn(
</pre></div>
</div>
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Elapsed time 67.3 seconds.
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Elapsed time 68.1 seconds.
</pre></div>
</div>
<img alt="../../_images/92bd7f94dd548c8cfc2744eb5890cd23fada1ff98e8dc907657e2eb109af0402.png" src="../../_images/92bd7f94dd548c8cfc2744eb5890cd23fada1ff98e8dc907657e2eb109af0402.png" />
@@ -1138,7 +1138,7 @@ <h2>Running the training job<a class="headerlink" href="#running-the-training-jo
<span class="expanded">Hide code cell output</span>
</summary>
<div class="cell_output docutils container">
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Elapsed time = 3.9 seconds
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>Elapsed time = 4.1 seconds
</pre></div>
</div>
</div>
@@ -1154,7 +1154,7 @@ <h2>Running the training job<a class="headerlink" href="#running-the-training-jo
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&#39;fine-tuning/checkpoints/2024-04-23-22-36-48-ft-oxides&#39;
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&#39;fine-tuning/checkpoints/2024-04-23-22-53-52-ft-oxides&#39;
</pre></div>
</div>
</div>
@@ -1204,7 +1204,7 @@ <h2>Running the training job<a class="headerlink" href="#running-the-training-jo
<span class="g g-Whitespace"> </span><span class="mi">425</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">mode</span><span class="p">):</span>
<span class="ne">--&gt; </span><span class="mi">426</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">mode</span><span class="p">))</span>

<span class="ne">FileNotFoundError</span>: [Errno 2] No such file or directory: &#39;fine-tuning/checkpoints/2024-04-23-22-36-48-ft-oxides/checkpoint.pt&#39;
<span class="ne">FileNotFoundError</span>: [Errno 2] No such file or directory: &#39;fine-tuning/checkpoints/2024-04-23-22-53-52-ft-oxides/checkpoint.pt&#39;
</pre></div>
</div>
</div>