Commit

deploy: bf2b26e

zulissimeta committed Jul 9, 2024
1 parent 0ef6752 commit f168547
Showing 292 changed files with 7,242 additions and 1,716 deletions.
223 changes: 223 additions & 0 deletions _downloads/5fdddbed2260616231dbf7b0d94bb665/train.txt

Large diffs are not rendered by default.

131 changes: 131 additions & 0 deletions _downloads/819e10305ddd6839cd7da05935b17060/mass-inference.txt
@@ -0,0 +1,131 @@
2024-07-09 14:17:18 (INFO): Running in non-distributed local mode
2024-07-09 14:17:19 (INFO): Project root: /home/runner/work/fairchem/fairchem/src/fairchem
2024-07-09 14:17:20 (INFO): amp: true
cmd:
checkpoint_dir: ./checkpoints/2024-07-09-14-17-36
commit: bf2b26e
identifier: ''
logs_dir: ./logs/tensorboard/2024-07-09-14-17-36
print_every: 10
results_dir: ./results/2024-07-09-14-17-36
seed: 0
timestamp_id: 2024-07-09-14-17-36
version: 0.1.dev1+gbf2b26e
dataset: null
evaluation_metrics:
metrics:
energy:
- mae
forces:
- forcesx_mae
- forcesy_mae
- forcesz_mae
- mae
- cosine_similarity
- magnitude_error
misc:
- energy_forces_within_threshold
primary_metric: forces_mae
gp_gpus: null
gpus: 0
logger: tensorboard
loss_functions:
- energy:
coefficient: 1
fn: mae
- forces:
coefficient: 1
fn: l2mae
model: gemnet_t
model_attributes:
activation: silu
cbf:
name: spherical_harmonics
cutoff: 6.0
direct_forces: true
emb_size_atom: 512
emb_size_bil_trip: 64
emb_size_cbf: 16
emb_size_edge: 512
emb_size_rbf: 16
emb_size_trip: 64
envelope:
exponent: 5
name: polynomial
extensive: true
max_neighbors: 50
num_after_skip: 2
num_atom: 3
num_before_skip: 1
num_blocks: 3
num_concat: 1
num_radial: 128
num_spherical: 7
otf_graph: true
output_init: HeOrthogonal
rbf:
name: gaussian
regress_forces: true
noddp: false
optim:
batch_size: 16
clip_grad_norm: 10
ema_decay: 0.999
energy_coefficient: 1
eval_batch_size: 16
eval_every: 5000
force_coefficient: 1
loss_energy: mae
loss_force: atomwisel2
lr_gamma: 0.8
lr_initial: 0.0005
lr_milestones:
- 64000
- 96000
- 128000
- 160000
- 192000
max_epochs: 80
num_workers: 2
optimizer: AdamW
optimizer_params:
amsgrad: true
warmup_steps: -1
outputs:
energy:
level: system
forces:
eval_on_free_atoms: true
level: atom
train_on_free_atoms: true
slurm: {}
task:
prediction_dtype: float32
test_dataset:
a2g_args:
r_energy: false
r_forces: false
format: ase_db
select_args:
selection: natoms>5,xc=PBE
src: data.db
trainer: ocp
val_dataset: null

2024-07-09 14:17:20 (INFO): rank: 0: Sampler created...
2024-07-09 14:17:20 (INFO): Batch balancing is disabled for single GPU training.
2024-07-09 14:17:20 (INFO): Loading model: gemnet_t
2024-07-09 14:17:22 (INFO): Loaded GemNetT with 31671825 parameters.
2024-07-09 14:17:22 (WARNING): log_summary for Tensorboard not supported
2024-07-09 14:17:22 (INFO): Loading checkpoint from: /tmp/ocp_checkpoints/gndt_oc22_all_s2ef.pt
2024-07-09 14:17:22 (INFO): Overwriting scaling factors with those loaded from checkpoint. If you're generating predictions with a pretrained checkpoint, this is the correct behavior. To disable this, delete `scale_dict` from the checkpoint.
2024-07-09 14:17:22 (WARNING): Scale factor comment not found in model
2024-07-09 14:17:22 (INFO): Predicting on test.
device 0:   0%|          | 0/3 [00:00<?, ?it/s]
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/torch_geometric/data/collate.py:145: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = elem.storage()._new_shared(numel)
device 0: 100%|███████████████████████████████████| 3/3 [00:06<00:00, 2.18s/it]
2024-07-09 14:17:28 (INFO): Writing results to ./results/2024-07-09-14-17-36/ocp_predictions.npz
2024-07-09 14:17:28 (INFO): Total time taken: 6.688116550445557
Elapsed time = 13.0 seconds
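
The run ends by writing predictions to an .npz archive. A minimal sketch of inspecting that file, assuming only that it is a standard NumPy archive (the key names are printed rather than assumed):

    import numpy as np

    # Inspect the predictions archive written by the run above; the stored
    # array names vary by task, so list them instead of hard-coding.
    data = np.load("./results/2024-07-09-14-17-36/ocp_predictions.npz", allow_pickle=True)
    for name in data.files:
        print(name, data[name].shape)

The test_dataset block in the config above selects rows from data.db using ASE's database query syntax; the same selection can be previewed directly (assuming ase is installed and data.db is present):

    from ase.db import connect

    # The selection string matches select_args in the config above.
    db = connect("data.db")
    for row in db.select("natoms>5,xc=PBE"):
        print(row.id, row.formula, row.natoms)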
1 change: 1 addition & 0 deletions _sources/autoapi/core/common/index.rst
@@ -34,6 +34,7 @@ Submodules
/autoapi/core/common/hpo_utils/index
/autoapi/core/common/logger/index
/autoapi/core/common/registry/index
/autoapi/core/common/test_utils/index
/autoapi/core/common/transforms/index
/autoapi/core/common/tutorial_utils/index
/autoapi/core/common/typing/index
6 changes: 3 additions & 3 deletions _sources/autoapi/core/common/logger/index.rst
@@ -34,7 +34,7 @@ Module Contents
tensorboard, etc.


.. py:method:: watch(model)
.. py:method:: watch(model, log_freq: int = 1000)
:abstractmethod:


@@ -72,7 +72,7 @@ Module Contents
tensorboard, etc.


.. py:method:: watch(model) -> None
.. py:method:: watch(model, log_freq: int = 1000) -> None
Monitor parameters and gradients.

@@ -102,7 +102,7 @@ Module Contents
tensorboard, etc.


.. py:method:: watch(model) -> bool
.. py:method:: watch(model, log_freq: int = 1000) -> bool
Monitor parameters and gradients.

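Based only on the updated signatures above, a hedged sketch of what the new ``log_freq`` parameter means for callers; construction of the concrete logger is elided because it depends on the trainer config:

.. code-block:: python

   # Hedged sketch: `logger` is any concrete Logger instance built by the
   # trainer; log_freq sets how often parameters/gradients are sampled.
   logger.watch(model, log_freq=500)  # sample every 500 steps (default 1000)
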
88 changes: 88 additions & 0 deletions _sources/autoapi/core/common/test_utils/index.rst
@@ -0,0 +1,88 @@
core.common.test_utils
======================

.. py:module:: core.common.test_utils
Classes
-------

.. autoapisummary::

core.common.test_utils.ForkedPdb
core.common.test_utils.PGConfig


Functions
---------

.. autoapisummary::

core.common.test_utils.spawn_multi_process
core.common.test_utils._init_pg_and_rank_and_launch_test


Module Contents
---------------

.. py:class:: ForkedPdb(completekey='tab', stdin=None, stdout=None, skip=None, nosigint=False, readrc=True)
Bases: :py:obj:`pdb.Pdb`


A Pdb subclass that may be used from a forked multiprocessing child
https://stackoverflow.com/questions/4716533/how-to-attach-debugger-to-a-python-subproccess/23654936#23654936

Example usage, to debug a torch distributed run on rank 0::

    if torch.distributed.get_rank() == 0:
        from fairchem.core.common.test_utils import ForkedPdb
        ForkedPdb().set_trace()


.. py:method:: interaction(*args, **kwargs)
.. py:class:: PGConfig
.. py:attribute:: backend
:type: str


.. py:attribute:: world_size
:type: int


.. py:attribute:: gp_group_size
:type: int
:value: 1



.. py:attribute:: port
:type: str
:value: '12345'



.. py:attribute:: use_gp
:type: bool
:value: True



.. py:function:: spawn_multi_process(config: PGConfig, test_method: callable, *test_method_args: Any, **test_method_kwargs: Any) -> list[Any]
Spawn a single-node, multi-rank function.
Uses localhost and a free port to communicate.

:param world_size: number of processes
:param backend: backend to use, for example "nccl" or "gloo"
:param test_method: callable to spawn; its first 3 arguments are rank, world_size, and the mp output dict
:param test_method_args: args for the test method
:param test_method_kwargs: kwargs for the test method

:returns: A list, l, where l[i] is the return value of test_method on rank i


.. py:function:: _init_pg_and_rank_and_launch_test(rank: int, pg_setup_params: PGConfig, mp_output_dict: dict[int, object], test_method: callable, args: list[object], kwargs: dict[str, object]) -> None
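
A hedged usage sketch, based only on the signatures documented above; ``_whoami`` is a hypothetical test method, and the exact arguments the harness forwards to it should be checked against the source:

.. code-block:: python

   # Hedged sketch based only on the documented signatures above.
   # PGConfig fields come from the class docs; _whoami is hypothetical.
   import torch.distributed as dist

   from fairchem.core.common.test_utils import PGConfig, spawn_multi_process

   def _whoami(greeting: str) -> str:
       # runs on each spawned rank after the process group is initialized
       return f"{greeting} from rank {dist.get_rank()}"

   if __name__ == "__main__":
       config = PGConfig(backend="gloo", world_size=2, use_gp=False)
       results = spawn_multi_process(config, _whoami, "hello")
       print(results)  # one entry per rank, ordered by rank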
@@ -123,6 +123,14 @@ Module Contents
:type load_energy_lin_ref: bool


.. py:method:: _init_gp_partitions(atomic_numbers_full, data_batch_full, edge_index, edge_distance, edge_distance_vec)
Graph Parallel
This creates the required partial tensors for each rank given the full tensors.
The tensors are split along the node-index dimension using node_partition.
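
The split described here can be pictured with plain tensor operations; a minimal sketch with hypothetical shapes, not the model's actual implementation:

.. code-block:: python

   # Minimal sketch with hypothetical shapes, illustrating the node-dimension
   # split described above (not the model's actual code).
   import torch

   world_size, num_nodes = 4, 10
   atomic_numbers_full = torch.randint(1, 90, (num_nodes,))

   # node_partition: contiguous slices of node indices, one per rank
   node_partition = torch.tensor_split(torch.arange(num_nodes), world_size)

   rank = 0
   atomic_numbers = atomic_numbers_full[node_partition[rank]]  # this rank's nodes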



.. py:method:: forward(data)
8 changes: 8 additions & 0 deletions _sources/autoapi/core/models/equiformer_v2/index.rst
@@ -136,6 +136,14 @@ Package Contents
:type load_energy_lin_ref: bool


.. py:method:: _init_gp_partitions(atomic_numbers_full, data_batch_full, edge_index, edge_distance, edge_distance_vec)
Graph Parallel
This creates the required partial tensors for each rank given the full tensors.
The tensors are split along the node-index dimension using node_partition.



.. py:method:: forward(data)
@@ -37,6 +37,6 @@ Module Contents
:type rescale_factor: float


.. py:method:: forward(atomic_numbers, edge_distance, edge_index)
.. py:method:: forward(atomic_numbers, edge_distance, edge_index, num_nodes, node_offset=0)
@@ -23,7 +23,7 @@ Classes
Module Contents
---------------

.. py:class:: EquiformerV2EnergyTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp')
.. py:class:: EquiformerV2EnergyTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp', gp_gpus=None)
Bases: :py:obj:`fairchem.core.trainers.OCPTrainer`

@@ -23,7 +23,7 @@ Classes
Module Contents
---------------

.. py:class:: EquiformerV2ForcesTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp')
.. py:class:: EquiformerV2ForcesTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp', gp_gpus=None)
Bases: :py:obj:`fairchem.core.trainers.OCPTrainer`

@@ -67,7 +67,7 @@ Module Contents
:type alpha_drop: float


.. py:method:: forward(x: torch.Tensor, atomic_numbers, edge_distance: torch.Tensor, edge_index)
.. py:method:: forward(x: torch.Tensor, atomic_numbers, edge_distance: torch.Tensor, edge_index, node_offset: int = 0)
.. py:class:: FeedForwardNetwork(sphere_channels: int, hidden_channels: int, output_channels: int, lmax_list: list[int], mmax_list: list[int], SO3_grid, activation: str = 'scaled_silu', use_gate_act: bool = False, use_grid_mlp: bool = False, use_sep_s2_act: bool = True)
@@ -158,6 +158,6 @@ Module Contents
:type proj_drop: float


.. py:method:: forward(x, atomic_numbers, edge_distance, edge_index, batch)
.. py:method:: forward(x, atomic_numbers, edge_distance, edge_index, batch, node_offset: int = 0)
2 changes: 1 addition & 1 deletion _sources/autoapi/core/trainers/base_trainer/index.rst
@@ -23,7 +23,7 @@ Classes
Module Contents
---------------

.. py:class:: BaseTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier: str, timestamp_id: str | None = None, run_dir: str | None = None, is_debug: bool = False, print_every: int = 100, seed: int | None = None, logger: str = 'wandb', local_rank: int = 0, amp: bool = False, cpu: bool = False, name: str = 'ocp', slurm=None, noddp: bool = False)
.. py:class:: BaseTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier: str, timestamp_id: str | None = None, run_dir: str | None = None, is_debug: bool = False, print_every: int = 100, seed: int | None = None, logger: str = 'wandb', local_rank: int = 0, amp: bool = False, cpu: bool = False, name: str = 'ocp', slurm=None, noddp: bool = False, gp_gpus: int | None = None)
Bases: :py:obj:`abc.ABC`

4 changes: 2 additions & 2 deletions _sources/autoapi/core/trainers/index.rst
@@ -26,7 +26,7 @@ Classes
Package Contents
----------------

.. py:class:: BaseTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier: str, timestamp_id: str | None = None, run_dir: str | None = None, is_debug: bool = False, print_every: int = 100, seed: int | None = None, logger: str = 'wandb', local_rank: int = 0, amp: bool = False, cpu: bool = False, name: str = 'ocp', slurm=None, noddp: bool = False)
.. py:class:: BaseTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier: str, timestamp_id: str | None = None, run_dir: str | None = None, is_debug: bool = False, print_every: int = 100, seed: int | None = None, logger: str = 'wandb', local_rank: int = 0, amp: bool = False, cpu: bool = False, name: str = 'ocp', slurm=None, noddp: bool = False, gp_gpus: int | None = None)
Bases: :py:obj:`abc.ABC`

@@ -105,7 +105,7 @@ Package Contents
.. py:method:: save_results(predictions: dict[str, numpy.typing.NDArray], results_file: str | None, keys: collections.abc.Sequence[str] | None = None) -> None
.. py:class:: OCPTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp')
.. py:class:: OCPTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp', gp_gpus=None)
Bases: :py:obj:`fairchem.core.trainers.base_trainer.BaseTrainer`

2 changes: 1 addition & 1 deletion _sources/autoapi/core/trainers/ocp_trainer/index.rst
@@ -23,7 +23,7 @@ Classes
Module Contents
---------------

.. py:class:: OCPTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp')
.. py:class:: OCPTrainer(task, model, outputs, dataset, optimizer, loss_functions, evaluation_metrics, identifier, timestamp_id=None, run_dir=None, is_debug=False, print_every=100, seed=None, logger='wandb', local_rank=0, amp=False, cpu=False, slurm=None, noddp=False, name='ocp', gp_gpus=None)
Bases: :py:obj:`fairchem.core.trainers.base_trainer.BaseTrainer`

4 changes: 3 additions & 1 deletion _sources/core/fine-tuning/fine-tuning-oxides.md
@@ -207,21 +207,23 @@ yml = generate_yml_config(checkpoint_path, 'config.yml',
'optim.loss_force', # the checkpoint setting causes an error
'dataset', 'test_dataset', 'val_dataset'],
update={'gpus': 1,
'task.dataset': 'ase_db',
'optim.eval_every': 10,
'optim.max_epochs': 1,
'optim.batch_size': 4,
'logger':'tensorboard', # don't use wandb!
# Train data
'dataset.train.src': 'train.db',
'dataset.train.format': 'ase_db',
'dataset.train.a2g_args.r_energy': True,
'dataset.train.a2g_args.r_forces': True,
# Test data - prediction only so no regression
'dataset.test.src': 'test.db',
'dataset.test.format': 'ase_db',
'dataset.test.a2g_args.r_energy': False,
'dataset.test.a2g_args.r_forces': False,
# val data
'dataset.val.src': 'val.db',
'dataset.val.format': 'ase_db',
'dataset.val.a2g_args.r_energy': True,
'dataset.val.a2g_args.r_forces': True,
})
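
A minimal sketch (assuming PyYAML is installed and `config.yml` was written by the call above) to confirm the updated dataset entries before launching fine-tuning:

```python
# Minimal sketch: sanity-check the generated config before training.
# Assumes config.yml was written by the generate_yml_config call above.
import yaml

with open("config.yml") as f:
    config = yaml.safe_load(f)

print(config["task"]["dataset"])  # expected: ase_db
for split in ("train", "test", "val"):
    entry = config["dataset"][split]
    print(split, entry["src"], entry["format"])
```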