Code repository for the publication. Contains MATLAB codes and sequence data used for evolutionary analysis.

vembha/TCS_crosstalk_evolution

Code repository for the publication:

An evolutionary paradigm favoring crosstalk between bacterial two-component signaling systems

Bharadwaj Vemparala, Arjun Valiya Parambathu, Deepak Kumar Saini, Narendra M Dixit

Numerical packages used: MATLAB® 2020b Update 7; MEGA (Molecular Evolutionary Genetics Analysis) 7.0



The repository has three folders, fitness_estimation, wright_fisher, and genetic_analyses, which contain the files to:

  1. fitness_estimation: estimate the fitness for various values of γ and for all phenotypes possible.
  2. wright_fisher: perform the Wright-Fisher evolution simulations for a given mutation rate, using the fitness values estimated in the previous step, to identify the dominating phenotype.
  3. genetic_analyses: construct the phylogenetic trees for histidine kinases (HKs) and response regulators (RRs) as well as to estimate the ratio of non-synonymous to synonymous mutations (KA/KS) between the selected neighbors.

The first two folders each have two subfolders, random_environment and programmed_environment. The random_environment files cover the case where the signals are elicited in random order; the programmed_environment files cover the case where the signals follow the strict sequence 1, 2, ..., N, with N being the total number of two-component signaling systems (TCSs). The contents of the folders are described below; the workflow to obtain the figures of the manuscript is given in the respective folders:

1. fitness_estimation

The file data_set.m contains the differential equations of the model, and evaluator.m is called for various values of N, γ, and phenotype to estimate the fitness values. K_matrix_assignment.m returns the interaction matrix for every phenotype, each of which is identified by a unique integer (refer to Fig. 2A of the manuscript for the N = 2 example). Finally, bossfile.m is the main file, which can be run to generate fitness data for a given N and γ. Data for the cases N = 2, 3, and 4 are provided in the folder.

The files described above are present for both kinds of environments. The random environment case additionally contains non_path_signal_sequence.m, which generates all the possible signal sequences; the fitness of a bacterium in the random environment is the average of the fitness values estimated over all possible sequences.
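The averaging over signal sequences can be illustrated with a minimal Python sketch. It is not the repository's MATLAB code. It assumes that "all possible signal sequences" means every ordering of the signals 1..N, and it replaces the ODE-based fitness with a hypothetical toy function (toy_fitness); the function names are also hypothetical.

```python
from itertools import permutations
from statistics import mean


def all_signal_sequences(n):
    """Every ordering of signals 1..n (ASSUMED meaning of 'all possible sequences')."""
    return list(permutations(range(1, n + 1)))


def random_environment_fitness(n, fitness_for_sequence):
    """Average a per-sequence fitness over all assumed signal sequences."""
    return mean(fitness_for_sequence(seq) for seq in all_signal_sequences(n))


def toy_fitness(seq):
    """Hypothetical stand-in for the ODE-based fitness of one signal sequence."""
    return 1.0 if seq[0] == 1 else 0.5


# The programmed environment uses only the single sequence 1, 2, ..., N.
programmed = tuple(range(1, 4))
programmed_fitness = toy_fitness(programmed)

# The random environment averages over every sequence.
random_fitness = random_environment_fitness(3, toy_fitness)
```

In this sketch the programmed environment is simply one of the sequences that the random environment averages over.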

The file K_matrix_inverse.m, although not used in the calculations, returns the unique phenotype ID for a given value of N, γ, and the matrix structure (refer to Fig. 5B of the manuscript).
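To make the idea of a two-way mapping between phenotype IDs and interaction matrices concrete, here is a hedged Python sketch of one possible encoding. It assumes a binary N x N matrix whose diagonal (cognate) entries are always present and whose off-diagonal (crosstalk) entries are read as the bits of the ID. The numbering used by K_matrix_assignment.m and K_matrix_inverse.m may differ, and the γ dependence of the matrix entries is ignored here; both function names are hypothetical.

```python
def matrix_from_id(phenotype_id, n):
    """Decode an integer into an n x n 0/1 matrix (ASSUMED encoding, diagonal fixed at 1)."""
    k = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    bit = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Each off-diagonal entry takes one bit of the ID, in row-major order.
                k[i][j] = (phenotype_id >> bit) & 1
                bit += 1
    return k


def id_from_matrix(k):
    """Inverse of matrix_from_id: pack the off-diagonal entries back into an integer."""
    n = len(k)
    phenotype_id, bit = 0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                phenotype_id |= (k[i][j] & 1) << bit
                bit += 1
    return phenotype_id


example = matrix_from_id(2, 2)  # only the second off-diagonal entry is set
```

Under this assumed encoding there are 2^(N(N-1)) IDs, and decoding followed by encoding returns the original ID.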

2. wright_fisher

The fitness values estimated in the previous step are used in the evolution simulations, following the procedure described in the manuscript. For each kind of environment, as separated by subfolders, the main files for homogeneous and mixed initial populations are uniform_evolution.m and distributed_evolution.m, respectively. The baseline fitness values are the control fitnesses, which decide whether the bacteria die or replicate in each generation.
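For orientation, a textbook Wright-Fisher step with selection and mutation can be sketched in Python as below. This is a generic illustration and not the repository's procedure: it uses fitness-weighted resampling at constant population size rather than the baseline-fitness die-or-replicate rule described above, and it assumes that a mutation switches to any other phenotype with equal probability. The fitness values and the function name are hypothetical.

```python
import random


def wright_fisher(fitness, counts, mutation_rate, generations, rng):
    """Textbook Wright-Fisher: fitness-weighted resampling at fixed size, then mutation."""
    n_types = len(fitness)
    pop_size = sum(counts)
    for _ in range(generations):
        # Each offspring picks a parent phenotype with probability ~ count * fitness.
        weights = [c * f for c, f in zip(counts, fitness)]
        parents = rng.choices(range(n_types), weights=weights, k=pop_size)
        counts = [0] * n_types
        for p in parents:
            if n_types > 1 and rng.random() < mutation_rate:
                # ASSUMPTION: mutate to a different phenotype chosen uniformly.
                p = rng.choice([t for t in range(n_types) if t != p])
            counts[p] += 1
    return counts


rng = random.Random(0)
fitness = [1.0, 1.5, 1.2]   # hypothetical fitness values, one per phenotype
start = [1000, 0, 0]        # homogeneous initial population
final = wright_fisher(fitness, start, 1e-3, 200, rng)

# The dominating phenotype is the one with the largest final count.
dominant = max(range(len(final)), key=final.__getitem__)
```

A mixed initial population would correspond to a start vector such as [334, 333, 333] in this sketch.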

3. genetic_analyses

There are four .txt documents, which contain the amino acid and nucleotide sequences of HKs and RRs in M. tuberculosis. The amino acid sequences were aligned using Clustal Omega (fullHKalignment.fasta and fullRRalignment.fasta). These alignments were then used to construct phylogenetic trees for HKs and RRs separately with the software package MEGA (version 7); the resulting trees are HK_full_tree.nwk and RR_full_tree.nwk, respectively.

The Excel file domains_of_interest.xlsx contains the starting and ending positions of the kinase domains of HKs and the receiver domains of RRs (refer to the manuscript on how they were identified) in the nucleotide and amino acid sequences. Using this information, the full protein alignments mentioned previously are spliced to extract the kinase domain alignment for HKs and the receiver domain alignment for RRs. The resulting files are fullHKkinasedomainalignment.fasta and fullRRreceiverdomainalignment.fasta, respectively. These domain alignments, which contain amino acids, are converted into the respective nucleotide alignments using the nucleotide sequences, and finally the information is used to estimate the KA/KS ratios for the domains of interest.
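The conversion of an amino acid alignment into a nucleotide alignment can be sketched in Python as follows. This is a generic codon-threading routine and not necessarily the procedure used for the repository's files. It assumes that the nucleotide sequence supplied for each row is the ungapped coding sequence of exactly the aligned residues, in frame; the function name thread_codons is hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Turn one gapped amino-acid row into a gapped codon row using its coding sequence."""
    n_residues = sum(1 for residue in aligned_protein if residue != "-")
    if 3 * n_residues > len(cds):
        raise ValueError("coding sequence is too short for the aligned residues")
    codons, pos = [], 0
    for residue in aligned_protein:
        if residue == "-":
            # A gap in the protein row becomes a three-nucleotide gap.
            codons.append("---")
        else:
            # Each residue consumes the next codon of the coding sequence.
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy example: residues M, K, V with one alignment gap after M.
codon_row = thread_codons("M-KV", "ATGAAAGTT")
```

Every row of the resulting nucleotide alignment is three times as long as the corresponding protein row, so the codon columns stay aligned for the KA/KS estimation.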
