Merge pull request #56 from DSilva27/new_cmd_tools

New cmd tools to generate models for simulation and training
flatironinstitute · Apr 3, 2024 · c2cd8bf
2 parents f4c35bb + c66b3fb

Showing 4 changed files with 88 additions and 3 deletions.
diff --git a/README.rst b/README.rst
@@ -32,7 +32,7 @@ You can create an environment for example with conda using the following command
 After creating the virtual environment, you should install the required dependencies and the module.
 
 Dependencies
-~~~~~~~~~~~~
+------------
 
 1. `Lampe <https://lampe.readthedocs.io/en/stable/>`_.
 2. `SciPy <https://scipy.org/>`_.
@@ -42,13 +42,13 @@ Dependencies
 6. `mrcfile <https://pypi.org/project/mrcfile/>`_.
 
 Download this repository
-~~~~~~~~~~~~~~~~~~~~~~~~
+------------------------
 .. code:: bash
 
     git clone https://github.com/DSilva27/cryo_em_SBI.git
 
 Navigate to the cloned repository and install the module
-~~~~~~~~~~~~~~~~~~
+--------------------------------------------------------
 .. code:: bash
     
     cd cryo_em_SBI
@@ -62,6 +62,20 @@ Tutorial
 An introductory tutorial can be found at ``tutorials/tutorial.ipynb``. In this tutorial, we go through the whole process of making models for cryoSBI, training an amortized posterior, and analyzing the results.
 The following sections highlight cryoSBI's key features.
 
+Generate model file to simulate cryo-EM particles
+-------------------------------------------------
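+To generate a model file for simulating cryo-EM particles with the simulator provided in this module, use the command-line tool ``models_to_tensor``.
+You will need either a set of indexed PDB files or a TRR trajectory file that contains all models. For PDB files, pass a file name pattern containing ``{}``, which is replaced by the index of each file starting at 0, together with ``--n_pdbs``; for a TRR file, also pass a topology file with ``--top_file``. The tool generates a model file that can be used to simulate cryo-EM particles.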
+
+.. code:: bash
+
+    models_to_tensor \
+        --model_files path_to_models/pdb_{}.pdb \
+        --output_file path_to_output_file/output.pt \
+        --n_pdbs 100
+
+The output file will contain a PyTorch tensor with the shape (number of models, 3, number of pseudo-atoms).
+
 Simulating cryo-EM particles
 -----------------------------
 To simulate cryo-EM particles, you can use the CryoEmSimulator class. The class takes in a simulation config file and simulates cryo-EM particles based on the parameters specified in the config file.
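
The README and the tool's epilog state that PDB inputs are given as a file name pattern containing `{}`, which is replaced by the index of each file starting at 0. The sketch below shows that expansion, assuming the tool fills the pattern with `str.format` for indices 0 to `n_pdbs - 1`; the helpers `expand_pdb_pattern` and `missing_files` are hypothetical and not part of cryoSBI, but a check like this can catch missing or misnumbered files before a long conversion run.

```python
from pathlib import Path
from typing import List


def expand_pdb_pattern(pattern: str, n_pdbs: int) -> List[str]:
    """Hypothetical helper: list the PDB paths a pattern like 'pdb_{}.pdb' refers to.

    Assumes indices run from 0 to n_pdbs - 1, as described in the tool's epilog.
    """
    if "{}" not in pattern:
        raise ValueError("The PDB file name pattern must contain '{}'.")
    return [pattern.format(i) for i in range(n_pdbs)]


def missing_files(paths: List[str]) -> List[str]:
    """Return the paths that do not point to an existing file (pre-flight check)."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    paths = expand_pdb_pattern("path_to_models/pdb_{}.pdb", 3)
    print(paths)
    print(missing_files(paths))
```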

diff --git a/pyproject.toml b/pyproject.toml
@@ -28,3 +28,4 @@ dependencies = [
 
 [project.scripts]
 train_npe_model = "cryo_sbi.inference.command_line_tools:cl_npe_train_no_saving"
+models_to_tensor = "cryo_sbi.utils.command_line_tools:cl_models_to_tensor"
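A `[project.scripts]` entry maps a console command to a `module:function` object reference. After installation, the generated command imports the module and calls the function with no arguments, so `argparse` reads the flags from `sys.argv`. The standard-library `importlib.metadata.EntryPoint` class (Python 3.9+ for the `module` and `attr` properties) shows how the value added here splits; building the object by hand is only for illustration, and the name used is the command name the README documents.

```python
from importlib.metadata import EntryPoint

# Built by hand for illustration; installed packages expose these via entry_points().
# [project.scripts] entries are published in the "console_scripts" group.
ep = EntryPoint(
    name="models_to_tensor",
    value="cryo_sbi.utils.command_line_tools:cl_models_to_tensor",
    group="console_scripts",
)

print(ep.module)  # module the generated script imports
print(ep.attr)    # function it calls with no arguments
```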
diff --git a/src/cryo_sbi/utils/command_line_tools.py b/src/cryo_sbi/utils/command_line_tools.py
@@ -0,0 +1,29 @@
+import argparse
+from cryo_sbi.utils.generate_models import models_to_tensor
+
+
+def cl_models_to_tensor():
+    cl_parser = argparse.ArgumentParser(
+        description="Convert models to tensor for cryoSBI",
+        epilog="pdb-files: The names of the pdb files must contain a {} to be replaced by the index of each file, "
+        "starting at 0, for example protein_{}.pdb. trr-files: For .trr files you must provide a topology file.",
+    )
+    cl_parser.add_argument(
+        "--model_files", type=str, required=True, help="A .trr file or a .pdb name pattern containing {}"
+    )
+    cl_parser.add_argument(
+        "--output_file", type=str, required=True, help="Output path, must end in .pt"
+    )
+    cl_parser.add_argument(
+        "--n_pdbs", type=int, default=None, help="Number of pdb files (pdb input only)"
+    )
+    cl_parser.add_argument(
+        "--top_file", type=str, default=None, help="Topology file (trr input only)"
+    )
+    args = cl_parser.parse_args()
+    models_to_tensor(
+        model_files=args.model_files,
+        output_file=args.output_file,
+        n_pdbs=args.n_pdbs,
+        top_file=args.top_file
+    )
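Because `cl_models_to_tensor` only forwards parsed flags to `models_to_tensor`, its argument handling can be checked without touching any files by passing an explicit list to `parse_args`. The sketch below rebuilds an equivalent parser; the `build_parser` and `parse` functions are hypothetical (the PR constructs the parser inline), and the example parses the README's sample command.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical factory mirroring the options defined in cl_models_to_tensor."""
    parser = argparse.ArgumentParser(description="Convert models to tensor for cryoSBI")
    parser.add_argument("--model_files", type=str, required=True)
    parser.add_argument("--output_file", type=str, required=True)
    parser.add_argument("--n_pdbs", type=int, default=None)
    parser.add_argument("--top_file", type=str, default=None)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse fall back to sys.argv[1:], as the installed command does.
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse([
        "--model_files", "path_to_models/pdb_{}.pdb",
        "--output_file", "path_to_output_file/output.pt",
        "--n_pdbs", "100",
    ])
    print(args)
```

Note that `argparse` accepts unambiguous option prefixes by default (`allow_abbrev=True`), so `--model_file` is also read as `--model_files`.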
diff --git a/src/cryo_sbi/utils/generate_models.py b/src/cryo_sbi/utils/generate_models.py
@@ -1,3 +1,4 @@
+from typing import Union
 import MDAnalysis as mda
 from MDAnalysis.analysis import align
 import torch
@@ -125,3 +126,43 @@ def traj_parser(top_file: str, traj_file: str, output_file: str) -> None:
         raise ValueError("Model file format not supported. Please use .pt.")
 
     return
+
+
+def models_to_tensor(
+    model_files: str,
+    output_file: str,
+    n_pdbs: Union[int, None] = None,
+    top_file: Union[str, None] = None,
+) -> None:
+    """
+    Converts different model files to a torch tensor.
+
+    Parameters
+    ----------
+    model_files : str
+        A .trr trajectory, or a .pdb name pattern whose {} is replaced by the file index (from 0).
+    output_file : str
+        The path to the output file. Must be a .pt file.
+    n_pdbs : int, optional
+        The number of models to convert to a torch tensor. Only needed for models in pdb files.
+    top_file : str, optional
+        The path to the topology file. Only needed for models in trr files.
+
+    Returns
+    -------
+    None
+    """
+    if not output_file.endswith(".pt"):
+        raise ValueError("The output file must be a .pt file.")
+    if model_files.endswith(".trr"):
+        if top_file is None or n_pdbs is not None:
+            raise ValueError("For trr files, provide a topology file and do not set the number of pdb files.")
+        traj_parser(top_file, model_files, output_file)
+    elif model_files.endswith(".pdb"):
+        if n_pdbs is None or top_file is not None:
+            raise ValueError("For pdb files, provide the number of pdb files and do not set a topology file.")
+        pdb_parser(model_files, n_pdbs, output_file)
+    else:
+        raise ValueError("Model file format not supported. Please use .pdb or .trr.")
+
+
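The branching in `models_to_tensor` depends only on file extensions and on which optional arguments are set, so it can be unit-tested without MDAnalysis or any input files. The sketch below isolates that decision logic and returns the name of the parser that would be called instead of calling `pdb_parser` or `traj_parser`; the function `select_parser` is hypothetical, and it raises `ValueError` for every invalid combination, including an unsupported extension.

```python
from typing import Optional


def select_parser(
    model_files: str,
    output_file: str,
    n_pdbs: Optional[int] = None,
    top_file: Optional[str] = None,
) -> str:
    """Hypothetical mirror of the input checks; returns the parser name, reads no files."""
    if not output_file.endswith(".pt"):
        raise ValueError("The output file must be a .pt file.")
    if model_files.endswith(".trr"):
        # Trajectories need a topology file; a PDB count has no meaning here.
        if top_file is None or n_pdbs is not None:
            raise ValueError("trr input needs top_file and no n_pdbs.")
        return "traj_parser"
    if model_files.endswith(".pdb"):
        # Indexed PDB files need a count; a topology file is not used.
        if n_pdbs is None or top_file is not None:
            raise ValueError("pdb input needs n_pdbs and no top_file.")
        return "pdb_parser"
    raise ValueError("Model file format not supported. Please use .pdb or .trr.")


if __name__ == "__main__":
    print(select_parser("traj.trr", "out.pt", top_file="top.gro"))
    print(select_parser("pdb_{}.pdb", "out.pt", n_pdbs=100))
```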