Merge pull request #47 from knitterAQ/nqs_qchem
Neural Quantum States (NQS) for Quantum Chemistry
Showing 28 changed files with 2,833 additions and 0 deletions.
README.md (new file, 54 lines)
# AUTOREGRESSIVE NEURAL QUANTUM STATES FOR QUANTUM CHEMISTRY

This repository contains code jointly developed between the University of Michigan and SandboxAQ to implement the retentive network (RetNet) neural quantum states ansatz outlined in the paper "Retentive Neural Quantum States: Efficient Ansatze for Ab Initio Quantum Chemistry," by Oliver Knitter, Dan Zhao, James Stokes, Martin Ganahl, Stefan Leichenauer, and Shravan Veerapaneni.

Preprint available on the arXiv: https://arxiv.org/abs/2411.03900

Corresponding Author: Oliver Knitter, [email protected]

This repository is an elaboration/refactoring of the code Tianchen Zhao released (https://github.com/Ericolony/made-qchem) alongside the paper "Scalable neural quantum states architecture for quantum chemistry" (https://arxiv.org/abs/2208.05637). This new code uses neural quantum states (NQS), implemented in PyTorch, to calculate electronic ground-state energies for second-quantized molecular Hamiltonians. Though not exactly a quantum or hybrid algorithm, this workflow makes use of Tangelo (https://github.com/sandbox-quantum/Tangelo/) and PySCF to calculate Jordan--Wigner encodings of the electronic Hamiltonians, along with some classical chemistry benchmark values. These uses of Tangelo are mainly found in the subdirectory src/data/.
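
For context, a minimal standalone sketch of this Hamiltonian-preparation step, based on Tangelo's documented interface (the geometry here is illustrative, and this is not necessarily how src/data/ wraps the call):

```python
from tangelo import SecondQuantizedMolecule
from tangelo.toolboxes.qubit_mappings.mapping_transform import fermion_to_qubit_mapping

# H2 near its equilibrium bond length (angstroms); illustrative geometry
xyz_H2 = [("H", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 0.7414))]
mol = SecondQuantizedMolecule(xyz_H2, q=0, spin=0, basis="sto-3g")

# Jordan--Wigner encoding of the second-quantized electronic Hamiltonian
qubit_ham = fermion_to_qubit_mapping(mol.fermionic_hamiltonian, mapping="JW",
                                     n_spinorbitals=mol.n_active_sos,
                                     n_electrons=mol.n_active_electrons)
```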

The RetNet ansatz implementation was made using the yet-another-retnet repository (https://github.com/fkodom/yet-another-retnet). The other available ansatze, MADE and Transformer, are implemented natively in PyTorch. The Hamiltonian expectation values are estimated following the procedure outlined in Zhao et al.'s paper, but the modular structure of the code allows for relatively simple plug-and-play implementations of different models and Hamiltonian expectation estimators.
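
As a rough illustration of the generic NQS estimator (a sketch of the standard local-energy formula, not the exact procedure of Zhao et al. or this repository's interface): for bitstrings x sampled from |ψ|², the energy is estimated as the mean of local energies E_loc(x) = Σ_{x'} H_{x,x'} ψ(x')/ψ(x).

```python
import torch

def local_energy(log_psi, samples, conn_states, h_elems):
    """Generic NQS local-energy sketch (interface assumed, not from this repo).

    log_psi: callable mapping (batch, n_qubits) bitstrings to complex log-amplitudes
    samples: (B, n) sampled bitstrings x
    conn_states: (B, K, n) states x' connected to each x by the Hamiltonian
    h_elems: (B, K) matrix elements H_{x, x'}
    """
    B, K, n = conn_states.shape
    log_x = log_psi(samples)                                 # (B,) log psi(x)
    log_xp = log_psi(conn_states.reshape(B * K, n)).reshape(B, K)
    ratios = torch.exp(log_xp - log_x.unsqueeze(1))          # psi(x') / psi(x)
    return (h_elems * ratios).sum(dim=1)                     # (B,) local energies
```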

This code will not be actively maintained and is not intended to be the subject of further development by its authors.

## Usage

### 1. Environment Setup

- The environment requirements are available in the nqs-qchem.yml file.
- Make sure your operating system meets the requirements for PyTorch 2.5.1 and CUDA 11.8.
- If setting up the environment through the .yml file does not work, the primary dependencies can be obtained as follows:
```
conda install python==3.9.18
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install tensorboardX
conda install pytorch-model-summary
conda install yacs
pip install yet-another-retnet
pip install tangelo-gc
pip install --prefer-binary pyscf
```
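
A quick sanity check of the resulting environment (a minimal sketch; the expected values assume the pinned versions above):

```python
import torch
import tangelo  # noqa: F401 -- import check only
import pyscf    # noqa: F401 -- import check only

print(torch.__version__)          # expect 2.5.1
print(torch.cuda.is_available())  # expect True on a CUDA 11.8 machine
print(torch.cuda.device_count())  # GPUs visible to the program
```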

### 2. Running the Program

To view a list of previously logged molecule names without running the main program, run:
```
python -m main --list_molecules
```
If a desired molecule name is not stored by the program, then running the program with the desired name and a corresponding PubChem CID will link the name and ID in the program's internal name storage. The list of orbital basis sets is available within the PySCF documentation. The program itself can be run by executing the following script:
```
./run.sh
```
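
For example, a hypothetical direct invocation that links a new molecule name to its PubChem CID (water's CID is 962; the .yaml file name is illustrative) could look like:

```
python -m main --config_file ./exp_configs/h2o.yaml DATA.MOLECULE 'H2O' DATA.MOLECULE_CID 962
```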

The run script adapts to the number of GPUs specified for use, but it does so in a naive way: it presumes that the total number of GPUs used by the program equals the total number of GPUs available on the current machine, and it will call on all of those GPUs to run the program. It is the user's responsibility to ensure these parameters are adequate, and to modify them if necessary (to run the program across multiple nodes, for instance).

Input configuration flags are generally recorded in .yaml files within the directory ./exp_configs/ and are called as needed in the run script. Some configurations are expected to be specified within the run script itself: in principle, all configurations may be specified there, since the run script supersedes the .yaml file, but this is not the expected form of use.
### 3. Logging Outputs

All logged results and outputs produced by the program can be found under ./logger/. Each run of the program appends the best result obtained over the specified number of trials to ./logger/results.txt (which is not automatically erased). Each run also generates its own output directory within ./logger/ that stores a copy of the .yaml file containing the input configurations that were used (those not specified by the file are included in the name of the directory). A dictionary containing basic logging information for all trials is saved there as result.npy, and TensorBoard files containing identical information are saved as well. The model corresponding to the best score across all trials is also saved in this directory. Different save locations for all of these data may be specified through existing input configurations.
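
The saved dictionary can be inspected after a run; a minimal sketch, where the run directory name depends on the chosen configurations:

```python
import numpy as np

# result.npy holds a pickled dict: one array per logged quantity, stacked over trials
results = np.load('./logger/<run_dir>/result.npy', allow_pickle=True).item()
for key, values in results.items():
    print(key, np.asarray(values).shape)
```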

An empty file is also added in this commit.

config.py (new file, 119 lines)
from yacs.config import CfgNode as CN
# yacs official github page https://github.com/rbgirshick/yacs

_C = CN()
''' Miscellaneous '''
_C.MISC = CN()
# Random seed
_C.MISC.RANDOM_SEED = 0
# Logger path
_C.MISC.DIR = ''
# Saved models path
_C.MISC.SAVE_DIR = ''
# Number of trials
_C.MISC.NUM_TRIALS = 0
# Whether all DDP ranks use the same local random seed
_C.MISC.SAME_LOCAL_SEED = False

''' Training hyper-parameters '''
_C.TRAIN = CN()
# Learning rate
_C.TRAIN.LEARNING_RATE = 0.0
# Optimizer weight decay
_C.TRAIN.WEIGHT_DECAY = 0.0
# Training optimizer name
# Available choices: ['adam', 'sgd', 'adadelta', 'adamax', 'adagrad', 'nadam', 'radam', 'adamw']
_C.TRAIN.OPTIMIZER_NAME = ''
# Learning rate scheduler name
# Available choices: ['decay', 'cyclic', 'trap', 'const', 'cosine', 'cosine_warm']
_C.TRAIN.SCHEDULER_NAME = ''
# Training batch size
_C.TRAIN.BATCH_SIZE = 0
# Number of training epochs
_C.TRAIN.NUM_EPOCHS = 0
# How many iterations between recalculations of Hamiltonian energy
_C.TRAIN.RETEST_INTERVAL = 1
# Entropy regularization (annealing) coefficient; 0.0 disables it
_C.TRAIN.ANNEALING_COEFFICIENT = 0.0
# Entropy regularization constant schedule type
_C.TRAIN.ANNEALING_SCHEDULER = 'none'
# Largest number of samples that each GPU is expected to process at one time
_C.TRAIN.MINIBATCH_SIZE = 1

''' Model hyper-parameters '''
_C.MODEL = CN()
# The name of the model
# Available choices: ['made', 'transformer', 'retnet']
_C.MODEL.MODEL_NAME = ''
# The number of hidden layers in MADE/Phase MLP
_C.MODEL.MADE_DEPTH = 1
# Hidden layer size in MADE/Phase MLP
_C.MODEL.MADE_WIDTH = 64
# Embedding/hidden state dimension for Transformer/RetNet
_C.MODEL.EMBEDDING_DIM = 32
# Number of attention/retention heads per Transformer/RetNet layer
_C.MODEL.ATTENTION_HEADS = 4
# Feedforward dimension for Transformer/RetNet
_C.MODEL.FEEDFORWARD_DIM = 512
# Number of Transformer/RetNet layers
_C.MODEL.TRANSFORMER_LAYERS = 1
# Parameter std initialization
_C.MODEL.INIT_STD = 0.1
# Model temperature parameter
_C.MODEL.TEMPERATURE = 1.0

''' Data hyper-parameters '''
_C.DATA = CN()
# Minimum nonunique batch size for autoregressive sampling
_C.DATA.MIN_NUM_SAMPLES = 1e2
# Maximum nonunique batch size for autoregressive sampling
_C.DATA.MAX_NUM_SAMPLES = 1e12
# Molecule name
_C.DATA.MOLECULE = ''
# PubChem molecular compound CID (only needed if name does not exist in shelf)
_C.DATA.MOLECULE_CID = 0
# Basis ['STO-3G', '3-21G', '6-31G', '6-311G*', '6-311+G*', '6-311++G**', '6-311++G(2df,2pd)', '6-311++G(3df,3pd)', 'cc-pVDZ', 'cc-pVDZ-DK', 'cc-pVTZ', 'cc-pVQZ', 'aug-cc-pCVQZ']
_C.DATA.BASIS = ''
# Prepare FCI result; not recommended for medium-to-large systems
_C.DATA.FCI = False
# Choice of Hamiltonian to use for training: ['adaptive_shadows', 'exact', 'sample']
_C.DATA.HAMILTONIAN_CHOICE = 'exact'
# Sample batch size for Hamiltonian sampling
_C.DATA.HAMILTONIAN_BATCH_SIZE = 10
# Number of unique samples in Hamiltonian sampling
_C.DATA.HAMILTONIAN_NUM_UNIQS = 500
# Probability of resampling estimated Hamiltonian
_C.DATA.HAMILTONIAN_RESET_PROB = 0.01
# Number of batches that the flipped input states are split into during local energy calculation
_C.DATA.HAMILTONIAN_FLIP_BATCHES = 1

''' Evaluation hyper-parameters '''
_C.EVAL = CN()
# Loading path of the saved model
_C.EVAL.MODEL_LOAD_PATH = ''
# Name of the results logger
_C.EVAL.RESULT_LOGGER_NAME = './results/results.txt'

''' DistributedDataParallel '''
_C.DDP = CN()
# Global node index
_C.DDP.NODE_IDX = 0
# This needs to be explicitly passed in
_C.DDP.LOCAL_WORLD_SIZE = 1
# Total number of GPUs
_C.DDP.WORLD_SIZE = 1
# Master address for communication
_C.DDP.MASTER_ADDR = 'localhost'
# Master port for communication
_C.DDP.MASTER_PORT = 12355

def get_cfg_defaults():
    """Get a yacs CfgNode object with default values for the project."""
    # Return a clone so that the defaults will not be altered
    # This is for the "local variable" use pattern
    return _C.clone()

# Alternatively, provide a way to import the defaults as
# a global singleton:
# cfg = _C  # users can `from config import cfg`
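
A minimal usage sketch of this local-variable pattern (the .yaml path is hypothetical):

```python
from config import get_cfg_defaults

cfg = get_cfg_defaults()                        # fresh clone of the defaults above
cfg.merge_from_file('./exp_configs/h2.yaml')    # experiment-specific overrides
cfg.merge_from_list(['TRAIN.NUM_EPOCHS', 200])  # programmatic/CLI-style overrides
cfg.freeze()                                    # prevent accidental mutation
print(cfg.TRAIN.NUM_EPOCHS)
```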

Experiment configuration .yaml (new file, 54 lines; files of this kind live under ./exp_configs/)
MISC:
  RANDOM_SEED: 667
  DIR: ''
  SAVE_DIR: ''
  NUM_TRIALS: 1
  SAME_LOCAL_SEED: True

DDP:
  NODE_IDX: -1
  LOCAL_WORLD_SIZE: -1
  WORLD_SIZE: -1
  MASTER_ADDR: 'localhost'
  MASTER_PORT: 12355

TRAIN:
  OPTIMIZER_NAME: 'adamw'
  SCHEDULER_NAME: 'cosine_warm'
  LEARNING_RATE: 0.0025
  WEIGHT_DECAY: 0.05
  NUM_EPOCHS: 100
  RETEST_INTERVAL: 10
  BATCH_SIZE: 2000
  MINIBATCH_SIZE: 2000
  ANNEALING_COEFFICIENT: 1.0
  ANNEALING_SCHEDULER: 'quartic'

MODEL:
  MODEL_NAME: ''
  MADE_DEPTH: 2
  MADE_WIDTH: 64
  EMBEDDING_DIM: 16
  ATTENTION_HEADS: 2
  FEEDFORWARD_DIM: 64
  TRANSFORMER_LAYERS: 1
  INIT_STD: 0.1
  TEMPERATURE: 1.0

DATA:
  MIN_NUM_SAMPLES: 1e2
  MAX_NUM_SAMPLES: 1e12
  MOLECULE: 'H2'
  MOLECULE_CID: 0
  BASIS: 'sto-3g'
  FCI: True
  HAMILTONIAN_CHOICE: 'exact'
  HAMILTONIAN_FLIP_BATCHES: 8
  HAMILTONIAN_BATCH_SIZE: 10000
  HAMILTONIAN_NUM_UNIQS: 1000
  HAMILTONIAN_RESET_PROB: 0.01

EVAL:
  MODEL_LOAD_PATH: ''
  RESULT_LOGGER_NAME: './logger/results.txt'
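
Any value in a file like this can also be overridden at launch time through the trailing opts arguments parsed in main.py; for instance (file name and values illustrative):

```
python -m main --config_file ./exp_configs/h2.yaml MODEL.MODEL_NAME 'retnet' TRAIN.NUM_EPOCHS 200
```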

main.py (new file, 149 lines)
import os
import argparse
import shelve
import logging
import numpy as np
import torch.multiprocessing as mp
from yacs.config import CfgNode

from config import get_cfg_defaults  # local variable usage pattern
from src.util import prepare_dirs, set_seed, write_file, folder_name_generator, setup, cleanup
from src.training.train import train

def run_trials(rank: int, cfg: CfgNode):
    '''
    Runs the train function cfg.MISC.NUM_TRIALS times with identical input configurations (except for the random seed, which is altered between trials).
    Args:
        rank: identifying index of the GPU running the process
        cfg: input configurations
    Returns:
        None
    '''
    local_rank = rank
    # set up configurations
    directory = cfg.MISC.DIR
    molecule_name = cfg.DATA.MOLECULE
    num_trials = cfg.MISC.NUM_TRIALS
    random_seed = cfg.MISC.RANDOM_SEED
    result_logger_name = cfg.EVAL.RESULT_LOGGER_NAME
    node_idx = cfg.DDP.NODE_IDX
    local_world_size = cfg.DDP.LOCAL_WORLD_SIZE
    world_size = cfg.DDP.WORLD_SIZE
    global_rank = node_idx * local_world_size + local_rank
    master_addr = cfg.DDP.MASTER_ADDR
    master_port = cfg.DDP.MASTER_PORT
    use_same_local_seed = cfg.MISC.SAME_LOCAL_SEED
    logging.info(f"Running DDP on rank {global_rank}.")
    if world_size > 1:
        setup(global_rank, world_size, master_addr, master_port)
    if global_rank == 0:
        prepare_dirs(cfg)
    best_score = float('inf') * np.ones(3)
    avg_time_elapsed = 0.0
    result_dic = {}
    write_file(result_logger_name, f"=============== {directory.split('/')[-1]} ===============", global_rank)
    current_model_path = os.path.join(cfg.MISC.SAVE_DIR, 'last_model.pth')
    best_model_path = os.path.join(cfg.MISC.SAVE_DIR, 'best_model.pth')
    for trial in range(num_trials):
        seed = random_seed + trial * 1000
        # set random seeds
        set_seed(seed + (0 if use_same_local_seed else global_rank))
        new_score, time_elapsed, dic = train(cfg, local_rank, global_rank)
        new_score = np.array(new_score)
        result_log = f"[{molecule_name}] Score {new_score}, Time elapsed {time_elapsed:.4f}"
        write_file(result_logger_name, f"Trial - {trial+1}", global_rank)
        write_file(result_logger_name, result_log, global_rank)
        if new_score[0] < best_score[0]:
            best_score = new_score
            if global_rank == 0:
                os.system(f'mv "{current_model_path}" "{best_model_path}"')
        avg_time_elapsed += time_elapsed / num_trials
        if dic is not None:
            for key in dic:
                if key in result_dic:
                    result_dic[key] = np.concatenate((result_dic[key], np.expand_dims(dic[key], axis=0)), axis=0)
                else:
                    result_dic[key] = np.expand_dims(dic[key], axis=0)
    result_log = f"[{directory.split('/')[-1]}][{molecule_name}] Best Score {best_score}, Time elapsed {avg_time_elapsed:.4f}, over {num_trials} trials"
    write_file(result_logger_name, result_log, global_rank)
    if global_rank == 0:
        np.save(os.path.join(directory, 'result.npy'), result_dic)
    if world_size > 1:
        cleanup()


def main(cfg: CfgNode):
    '''
    Main function for the NQS program.
    Args:
        cfg: input configurations
    Returns:
        None
    '''
    world_size = cfg.DDP.WORLD_SIZE
    local_world_size = cfg.DDP.LOCAL_WORLD_SIZE
    if world_size > 1:
        mp.spawn(run_trials, args=(cfg,), nprocs=local_world_size, join=True)
    else:
        run_trials(0, cfg)
    logging.info('--------------- Finished ---------------')

if __name__ == '__main__':
    # command line arguments
    parser = argparse.ArgumentParser(description="Command-Line Options")
    parser.add_argument(
        "--config_file",
        default="",
        metavar="FILE",
        help="Path to the yaml config file",
        type=str,
    )
    parser.add_argument(
        "opts",
        help="Modify config options using the command-line",
        default=None,
        nargs=argparse.REMAINDER,
    )
    parser.add_argument(
        '--list_molecules',
        action='store_true',
        default=False,
        help='List saved molecules (instead of running program)',
    )
    args = parser.parse_args()
    # Remainder of program does not run if program is asked to list molecules
    if args.list_molecules:
        molecule_list = 'PubChem IDs stored by program:'
        with shelve.open('./src/data/pubchem_IDs/ID_data') as pubchem_ids:
            for name, mol_id in pubchem_ids.items():
                molecule_list += '\n{}: {}'.format(name, mol_id)
        print(molecule_list + '\nStored names can be used as input arguments to access corresponding IDs.')
    else:
        # configurations
        cfg = get_cfg_defaults()
        cfg.merge_from_file(args.config_file)
        cfg.merge_from_list(args.opts)
        # set up directories (cfg.MISC.DIR)
        log_folder_name = folder_name_generator(cfg, args.opts)
        if cfg.MISC.DIR == '':
            cfg.MISC.DIR = './logger/{}'.format(log_folder_name)
        if cfg.MISC.SAVE_DIR == '':
            cfg.MISC.SAVE_DIR = './logger/{}'.format(log_folder_name)
        os.makedirs(cfg.MISC.DIR, exist_ok=True)
        os.makedirs(cfg.MISC.SAVE_DIR, exist_ok=True)
        os.system(f'cp "{args.config_file}" "{cfg.MISC.DIR}"')
        # freeze the configurations
        cfg.freeze()
        # set logger
        logging.root.handlers = []
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s [%(levelname)s] %(message)s",
            handlers=[
                logging.FileHandler(os.path.join(cfg.MISC.DIR, 'logging.log')),
                logging.StreamHandler()
            ]
        )
        # run program
        main(cfg)
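
For reference, a hypothetical single-node, multi-GPU launch expressed through the DDP keys above (run.sh automates this kind of call; file name and GPU counts illustrative):

```
python -m main --config_file ./exp_configs/h2.yaml DDP.NODE_IDX 0 DDP.LOCAL_WORLD_SIZE 4 DDP.WORLD_SIZE 4
```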