Input-Output Guide for the PyConforMap module

PyConforMap (csv_file, radius_= 0.1, max_x_val= 3, max_y_val= 30, min_x_val= 0, min_y_val= 0)

PyConforMap is a python class.

This class generates a scatter plot of instantaneous shape ratio (Rs) against relative radius of gyration (Rg/Rgmean), for a given protein/polymer simulation and a Gaussian Walk (GW) simulation. The class can be used to analyze metrics of the scatter plot. A single protein/polymer Rgmean is calculated from the entire protein/polymer simulation dataset, and a single GW Rgmean is calculated from the entire GW simulation dataset. Using the scatter plot, an fC score (a quantity ranging from 0 to 1 that represents conformational diversity) is calculated.
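As a rough illustration of the two plotted quantities, the sketch below converts Rg2 and Ree2 values into scatter coordinates. It assumes that Rs is defined as Ree2/Rg2 and that Rgmean is the mean of Rg over the whole dataset; neither definition is spelled out in this guide, and the function name `landscape_coordinates` is hypothetical, not part of the module.

```python
import math

def landscape_coordinates(rg2_values, ree2_values):
    """Return (x, y) lists for the scatter plot.

    x = Rg / Rgmean, with Rgmean taken as the mean of Rg over all snapshots
    (assumption). y = Rs = Ree2 / Rg2 (assumed definition of the shape ratio).
    """
    rg = [math.sqrt(v) for v in rg2_values]
    rg_mean = sum(rg) / len(rg)  # a single mean for the entire dataset
    x = [r / rg_mean for r in rg]
    y = [ree2 / rg2 for rg2, ree2 in zip(rg2_values, ree2_values)]
    return x, y

# Three made-up snapshots: (Rg2, Ree2) pairs
x, y = landscape_coordinates([4.0, 9.0, 16.0], [24.0, 36.0, 160.0])
```

Under these assumptions the x values average to 1 by construction, which is why the default x-axis range of 0 to 3 is centred near 1.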

The class requires the pandas, numpy, matplotlib, scipy and more_itertools Python packages, along with the standard-library modules itertools and random. They are imported automatically when the 'pyconformap.py' file is executed, as shown in the 'illustrated_example.ipynb' Jupyter notebook.

THE CLASS CODE REQUIRES ONE INPUT FILE: a csv file (for a given protein/polymer simulation) with 2 columns. The first column contains Rg2 values (square of the radius of gyration) and the second column contains Ree2 values (square of the end-to-end distance). In this user-provided file, each row represents one protein/polymer conformation snapshot from the simulation. An example input is the 'example_protein.csv' file (included with the repository). A second csv file, for the reference (GW) simulation, is already included with this repository.
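For readers preparing their own input, the following sketch shows one way to compute an Rg2 and Ree2 pair from the monomer coordinates of a single snapshot and format it as a csv row. It assumes equal-mass monomers, and the helper name `rg2_and_ree2` is hypothetical. Whether the module expects a header row is not stated in this guide, so none is written here; compare against 'example_protein.csv' before use.

```python
import csv
import io

def rg2_and_ree2(coords):
    """coords: list of (x, y, z) monomer positions for one snapshot."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    # Rg2: mean squared distance of monomers from the centroid (equal masses assumed)
    rg2 = sum(sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords) / n
    # Ree2: squared distance between the first and last monomer
    ree2 = sum((coords[-1][i] - coords[0][i]) ** 2 for i in range(3))
    return rg2, ree2

# A straight 3-bead chain along x with unit spacing
snapshot = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg2, ree2 = rg2_and_ree2(snapshot)

# One csv row per snapshot: Rg2 first, Ree2 second
buffer = io.StringIO()
csv.writer(buffer).writerow([rg2, ree2])
csv_text = buffer.getvalue()
```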

Once the dataset is loaded, the class prints the percentage of protein/polymer points that are close to at least one GW point.

By default, the GW_chainlen100.csv file is loaded as the reference and must be available in the default directory. This file holds the GW simulation as a table of shape (n,4), where n is the number of snapshots and each row represents one snapshot. The first column is chain length, the second is the square of the radius of gyration (Rg2), the third is the square of the end-to-end distance (Ree2), and the fourth is the relative radius of gyration.

Input Parameters:

csv_file : a csv file of shape (n,2)
  A csv file containing square of the radius of gyration (Rg2) in the first column and square of the end-to-end distance (Ree2) in the second column. Each row in the file represents a single conformation snapshot for a protein/polymer.

radius_ : float, optional
  The radius to use to count GW snapshots that are close to at least one protein/polymer snapshot, or vice versa. Default 0.1.
  NOTE: All scatter point coordinates are transformed before any such counting is performed.

max_x_val : float, optional
  Maximum x-axis limit for the scatter plot. Default 3.

max_y_val : float, optional
  Maximum y-axis limit for the scatter plot. Default 30.

min_x_val : float, optional
  Minimum x-axis limit for the scatter plot. Default 0.

min_y_val : float, optional
  Minimum y-axis limit for the scatter plot. Default 0.

Output after initialization:

Prints the percentage of protein/polymer points that are close to at least one GW point on the scatter plot.

Methods:

plot_protein_against_GW - generates the 2D scatter plot

PyConforMap.plot_protein_against_GW (protein_label, provided_color= 'magenta')

This method generates a scatter plot of instantaneous shape ratio (Rs) against relative radius of gyration (Rg/Rgmean) for both a protein/polymer and GW. GW points (i.e. the reference landscape map) are shown in black by default. If any data point exceeds a default axis limit, the axis limits are automatically readjusted. The fC score is computed and displayed on the plot.

Input Parameters:

protein_label : string
  A string to label the protein points on the scatter plot.
provided_color : string, optional
  The color of the provided protein/polymer points. Default magenta.

An attribute fC_value, containing fC, is assigned once this method is run.
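To make the idea of a coverage score concrete, here is a minimal brute-force sketch. It assumes fC is the fraction of GW reference points that have at least one protein/polymer point within the given radius; this guide does not state the exact formula, so treat that direction as an assumption. The function name `coverage_fraction` is hypothetical, and the module's own computation (including its coordinate transformation) may differ.

```python
def coverage_fraction(reference_points, sample_points, radius=0.1):
    """Fraction of reference_points with at least one sample point within radius."""
    r2 = radius ** 2
    covered = 0
    for rx, ry in reference_points:
        # Compare squared distances to avoid taking square roots
        if any((rx - sx) ** 2 + (ry - sy) ** 2 <= r2 for sx, sy in sample_points):
            covered += 1
    return covered / len(reference_points)

# Made-up points: four reference (GW) points, two sample (protein) points
gw = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
protein = [(0.05, 0.0), (1.0, 0.95)]
fc = coverage_fraction(gw, protein, radius=0.1)
```

Here two of the four reference points have a sample point nearby, so the score is 0.5. A value of 1 would mean every reference point is covered, matching the stated 0 to 1 range.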

check_boundary - computes % of protein/polymer points within the pre-assigned radius of GW points

PyConforMap.check_boundary ()

This method prints out what % of protein/polymer points are within the pre-provided radius of at least one GW point on the scatter plot. To enable this computation, the coordinates of all points on the scatter plot are temporarily transformed (occurs completely in the background), as Rg/Rgmean and Rs have different ranges.
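The sketch below illustrates why a transformation matters before counting neighbours. It rescales each axis by an assumed plotting range so that one radius applies equally in both directions. The actual transformation used by the module is not documented here; the min-max rescaling and the names `rescale` and `percent_near_reference` are hypothetical stand-ins.

```python
def rescale(points, x_range, y_range):
    """Map each axis onto [0, 1] using the given (min, max) ranges (assumed transform)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [((x - x_min) / (x_max - x_min), (y - y_min) / (y_max - y_min))
            for x, y in points]

def percent_near_reference(sample_points, reference_points, radius):
    """Percentage of sample points within radius of at least one reference point."""
    r2 = radius ** 2
    near = sum(
        1 for sx, sy in sample_points
        if any((sx - rx) ** 2 + (sy - ry) ** 2 <= r2 for rx, ry in reference_points)
    )
    return 100.0 * near / len(sample_points)

# Made-up (Rg/Rgmean, Rs) points
protein = [(1.0, 6.0), (2.5, 25.0)]
gw = [(1.0, 6.5), (1.2, 5.0)]

raw = percent_near_reference(protein, gw, radius=0.1)
scaled = percent_near_reference(
    rescale(protein, (0, 3), (0, 30)),
    rescale(gw, (0, 3), (0, 30)),
    radius=0.1,
)
```

In raw coordinates the first protein point sits 0.5 Rs units from its nearest GW point and is missed, while after rescaling the same pair falls well inside the radius.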

change_xlim_ylim - update x-axis and y-axis limits

PyConforMap.change_xlim_ylim (min_x_val, min_y_val, max_x_val, max_y_val)

Updates x-axis and y-axis limits of scatter plot.

Input Parameters:

min_x_val : float
  Desired minimum x-axis limit.
min_y_val : float
  Desired minimum y-axis limit.
max_x_val : float
  Desired maximum x-axis limit.
max_y_val : float
  Desired maximum y-axis limit.

vary_protein - plot fC against protein/polymer snapshots

PyConforMap.vary_protein (protein_lab, no_dots = 20)

Generates a plot of fC against number of protein/polymer snapshots.

Input Parameters:

protein_lab : string
  A string to label the protein.
no_dots : int, optional
  The number of data points to show on the plot. Default 20. E.g. if simulation has 200,000 snapshots, the x-axis will plot 10,000, 20,000 ... 200,000 and the y-axis will show fC at each of those snapshot counts, if no_dots = 20.
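The x-axis values in that example can be reproduced with a few lines. This sketch assumes the snapshot count divides evenly by no_dots; how the module handles a remainder, and whether it takes the first k snapshots or a random subset, is not documented here. The function name `snapshot_checkpoints` is hypothetical.

```python
def snapshot_checkpoints(total_snapshots, no_dots=20):
    """Evenly spaced snapshot counts ending at total_snapshots (even division assumed)."""
    step = total_snapshots // no_dots
    return [step * i for i in range(1, no_dots + 1)]

protein_counts = snapshot_checkpoints(200_000, no_dots=20)
gw_counts = snapshot_checkpoints(720_000, no_dots=40)
```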

vary_GW_ref - plot fC against GW snapshots

PyConforMap.vary_GW_ref (protein_lab, no_dots = 40)

Generates a plot of fC against number of GW snapshots.

Input Parameters:

protein_lab : string
  A string to label the protein.
no_dots : int, optional
  The number of data points to show on the plot. Default 40. E.g. if simulation has 720,000 snapshots, the x-axis will plot 18,000, 36,000 ... 720,000 and the y-axis will show fC at each of those GW snapshot counts, if no_dots = 40.

regenerate_GW_chain - simulates new GW reference chain

PyConforMap.regenerate_GW_chain (chain_length, nosnaps, interval= 1, mu= 0, sigma= 1)

This method simulates an entirely new GW chain to be used as a reference. Each snapshot is a polymer chain conformation in which the distance from one monomer to the next is drawn at random from a Gaussian distribution with mean mu and standard deviation sigma (0 and 1 by default). The method also saves the new simulation as the 'current' reference GW simulation by updating the GW_df attribute.

Returns a pandas dataframe of shape (n,5) of the simulation (row represents snapshot), first column is chain length, second column is square of radius of gyration (Rg2), third column is square of end-to-end distance (Ree2), fourth column is relative radius of gyration, and fifth column is instantaneous shape ratio. n is the number of snapshots.

Input Parameters:

chain_length : int
  The desired number of monomers of the chain.
nosnaps : int
  The desired number of snapshots in the simulation. Each snapshot is a new randomly generated chain conformation.
interval : int, optional
  The number of simulation steps to go through in-between snapshots. Default 1.
mu : float, optional
  The mean of the gaussian distribution from which to randomly select distance of one monomer to next. Default 0.
sigma : float, optional
  The standard deviation of the gaussian distribution from which to randomly select distance of one monomer to next. Default 1.

Returns:

A pandas dataframe of the new GW simulation.
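A minimal sketch of such a simulation is shown below, using only the standard library. It assumes that each Cartesian component of the monomer-to-monomer displacement is drawn independently from the Gaussian, that monomers have equal mass, and that Rs is Ree2/Rg2; these are assumptions, not confirmed details of the module. The `interval` parameter is omitted because every snapshot here is an independent chain, and the function names and `seed` argument are hypothetical. Rows are plain tuples in the documented column order rather than a pandas dataframe.

```python
import math
import random

def simulate_gw_snapshot(chain_length, mu=0.0, sigma=1.0, rng=random):
    """Return (rg2, ree2) for one randomly generated chain."""
    pos = [0.0, 0.0, 0.0]
    coords = [tuple(pos)]
    for _ in range(chain_length - 1):
        # Each displacement component is an independent Gaussian draw (assumption)
        pos = [c + rng.gauss(mu, sigma) for c in pos]
        coords.append(tuple(pos))
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    rg2 = sum(sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords) / n
    ree2 = sum((coords[-1][i] - coords[0][i]) ** 2 for i in range(3))
    return rg2, ree2

def simulate_gw(chain_length, nosnaps, mu=0.0, sigma=1.0, seed=0):
    """Rows of (chain length, Rg2, Ree2, Rg/Rgmean, Rs), one per snapshot."""
    rng = random.Random(seed)  # seeded for reproducibility
    rows = [simulate_gw_snapshot(chain_length, mu, sigma, rng) for _ in range(nosnaps)]
    rg_mean = sum(math.sqrt(rg2) for rg2, _ in rows) / nosnaps
    return [
        (chain_length, rg2, ree2, math.sqrt(rg2) / rg_mean, ree2 / rg2)
        for rg2, ree2 in rows
    ]

table = simulate_gw(chain_length=50, nosnaps=200)
```

For an ideal Gaussian chain the ratio of mean Ree2 to mean Rg2 approaches 6 for long chains, which gives a quick sanity check on any regenerated reference.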

save_GW_chain_to_csv - save current GW chain data to a csv file

PyConforMap.save_GW_chain_to_csv (direc_and_filename = './GW_chain_simulation.csv')

Saves the current GW reference chain simulation to a csv file. Saves by default to current directory.

Input Parameters:

direc_and_filename : string, optional
  The directory and filename in which to save the file. Default './GW_chain_simulation.csv'.

retrieve_default_GW_chain - revert to default reference simulation

PyConforMap.retrieve_default_GW_chain ()

Revert to the default GW reference simulation. Re-loads the GW_chainlen100.csv csv file to use as reference.

Attributes:

protein_rg2 : array
  A numpy array of the square of the protein/polymer radius of gyration values.

protein_ree2 : array
  A numpy array of the square of the protein/polymer end-to-end distance values.

protein_rg_mean : float
  The mean of the protein/polymer radius of gyration, computed from all data combined.

GW_df : pandas dataframe of shape (n,5)
  A dataframe of the GW reference simulation, which by default is the provided GW_chainlen100.csv file. The columns, in order, are GW chain length, square of radius of gyration, square of end-to-end distance, relative radius of gyration, and instantaneous shape ratio. Each row represents a conformation snapshot from the GW simulation.

poly_var : pandas dataframe of shape (n,2)
  A dataframe of the protein/polymer simulation. The first column is relative radius of gyration (Rg/Rgmean) and the second column is instantaneous shape ratio. Each row represents a conformation snapshot from the protein/polymer simulation.