Numerical packages used: MATLAB® 2020b Update 7; MEGA (Molecular Evolutionary Genetics Analysis) 7.0
There are three folders, fitness_estimation, wright_fisher, and genetic_analyses, which contain the files to:
fitness_estimation: estimate the fitness for various values of γ and for all possible phenotypes.
wright_fisher: perform the Wright-Fisher evolution simulations for a given mutation rate, using the fitness values estimated in fitness_estimation, to identify the dominant phenotype.
genetic_analyses: construct the phylogenetic trees for histidine kinases (HKs) and response regulators (RRs), and estimate the ratio of non-synonymous to synonymous mutations (KA/KS) between the selected neighbors.
The first two folders each have two subfolders, random_environment and programmed_environment. As the names indicate, the files are stored separately for the random environment case, where the signals are elicited in random order, and for the programmed environment case, where the signals follow the strict sequence 1, 2, ..., N, with N being the total number of two-component signaling systems (TCSs). The contents of these folders are described below, while the workflow to obtain the figures of the manuscript is described in the respective folders:
In fitness_estimation, the file data_set.m encodes the differential equations of the model, and evaluator.m is the file called for various values of N, γ, and phenotype to estimate the fitness values. K_matrix_assignment.m returns the interaction matrix for every phenotype, each of which is identified by a unique integer (refer to Fig. 2A of the manuscript for the N = 2 example). Finally, bossfile.m is the main file, which can be run to generate fitness data for a given N and γ. Data for the cases N = 2, 3, and 4 are provided in the folder.
The files described above are provided for both kinds of environments. In addition, the random environment case contains the file non_path_signal_sequence.m, which generates all possible signal sequences; the fitness of a bacterium in the random environment is the average of the fitness values estimated over all possible sequences.
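The averaging over signal sequences can be illustrated with a minimal Python sketch. It is not the repository's MATLAB code: it assumes that a "signal sequence" is an ordering in which each of the N signals occurs exactly once, and the names all_signal_sequences, average_fitness, and toy_fitness are hypothetical. In the actual workflow the per-sequence fitness comes from the model's differential equations, not from the toy function used here.

```python
from itertools import permutations
from statistics import mean


def all_signal_sequences(n):
    """Return every ordering of the signals 1..n.

    Assumption: each signal occurs exactly once per sequence.
    """
    return list(permutations(range(1, n + 1)))


def average_fitness(n, fitness_for_sequence):
    """Average a per-sequence fitness over all orderings of n signals."""
    return mean(fitness_for_sequence(seq) for seq in all_signal_sequences(n))


def toy_fitness(seq):
    """Hypothetical stand-in for the ODE-based fitness estimate.

    It simply rewards sequences in which signal 1 arrives early.
    """
    return 1.0 / (1 + seq.index(1))


print(all_signal_sequences(2))          # [(1, 2), (2, 1)]
print(average_fitness(2, toy_fitness))  # 0.75
```

Under this assumption the programmed environment corresponds to the single sequence (1, 2, ..., N), which is one of the N! orderings averaged in the random case.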
Although it is not used in the calculations, the file K_matrix_inverse.m returns the unique phenotype ID for a given N, γ, and matrix structure (refer to Fig. 5B of the manuscript).
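The idea of a one-to-one mapping between an integer ID and an interaction matrix can be sketched as follows. This is a hypothetical encoding chosen only for illustration and is not necessarily the one used in K_matrix_assignment.m or K_matrix_inverse.m: it assumes a 0/1 matrix whose diagonal (cognate) entries are always 1 and whose off-diagonal entries are read, in row-major order, as the bits of the ID. The dependence on γ is ignored here.

```python
def matrix_from_id(phenotype_id, n):
    """Build an n x n 0/1 matrix from an integer ID (hypothetical encoding).

    Diagonal entries are fixed to 1; the off-diagonal entries, taken in
    row-major order, are the bits of phenotype_id (least significant first).
    """
    if not 0 <= phenotype_id < 2 ** (n * (n - 1)):
        raise ValueError("phenotype_id out of range for this n")
    matrix = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    bit = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                matrix[i][j] = (phenotype_id >> bit) & 1
                bit += 1
    return matrix


def id_from_matrix(matrix):
    """Invert matrix_from_id: recover the integer ID from the matrix."""
    n = len(matrix)
    phenotype_id = 0
    bit = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                phenotype_id |= (matrix[i][j] & 1) << bit
                bit += 1
    return phenotype_id


print(matrix_from_id(1, 2))                  # [[1, 1], [0, 1]]
print(id_from_matrix(matrix_from_id(1, 2)))  # 1
```

With this encoding there are 2^(N(N-1)) distinct matrices, for example 4 for N = 2.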
In wright_fisher, the fitness values estimated previously are used in the evolution simulations, following the procedure described in the manuscript. For each kind of environment, separated into subfolders, the main files for homogeneous and mixed initial populations are uniform_evolution.m and distributed_evolution.m, respectively. The baseline fitness values are the control fitnesses that decide whether the bacteria, in each generation, die or replicate.
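For orientation, one generation of a textbook Wright-Fisher model with mutation can be sketched in Python as below. This is a generic illustration under stated assumptions, not a port of uniform_evolution.m or distributed_evolution.m: the population size is held constant, parents are drawn in proportion to fitness, a mutant switches to a uniformly chosen other phenotype, and the baseline-fitness rule for death or replication described in the manuscript is not reproduced. The function names wright_fisher_step and simulate are hypothetical.

```python
import random


def wright_fisher_step(counts, fitness, mutation_rate, rng):
    """One generation: fitness-weighted resampling, then mutation.

    counts[i] is the number of individuals with phenotype i and
    fitness[i] is that phenotype's (non-negative) fitness.
    """
    population_size = sum(counts)
    n_types = len(counts)
    # Each offspring picks a parent type with probability ~ count * fitness.
    weights = [c * f for c, f in zip(counts, fitness)]
    parents = rng.choices(range(n_types), weights=weights, k=population_size)
    new_counts = [0] * n_types
    for parent in parents:
        child = parent
        if n_types > 1 and rng.random() < mutation_rate:
            # Assumed mutation model: jump to any other phenotype uniformly.
            child = rng.choice([t for t in range(n_types) if t != parent])
        new_counts[child] += 1
    return new_counts


def simulate(counts, fitness, mutation_rate, generations, seed=0):
    """Iterate wright_fisher_step with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    for _ in range(generations):
        counts = wright_fisher_step(counts, fitness, mutation_rate, rng)
    return counts


# Homogeneous start (all individuals in phenotype 0) versus a mixed start.
print(simulate([100, 0, 0], [1.0, 1.1, 1.2], 0.01, 50))
print(simulate([34, 33, 33], [1.0, 1.1, 1.2], 0.01, 50))
```

The phenotype with the largest count at the end of a run plays the role of the dominating phenotype in this toy setting.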
In genetic_analyses, there are four .txt documents that contain the amino acid and nucleotide sequences of HKs and RRs in M. tuberculosis. The amino acid sequences were aligned using Clustal Omega (fullHKalignment.fasta and fullRRalignment.fasta), and these alignments were then used to construct phylogenetic trees for HKs and RRs separately using the software package MEGA (version 7). The resulting trees are HK_full_tree.nwk and RR_full_tree.nwk, respectively.
The Excel file domains_of_interest.xlsx contains the start and end positions of the kinase domains of HKs and the receiver domains of RRs (refer to the manuscript for how they were identified) in the nucleotide and amino acid sequences. Using this information, the full protein alignments mentioned above are spliced to extract the kinase domain alignment for HKs and the receiver domain alignment for RRs. The resulting files are fullHKkinasedomainalignment.fasta and fullRRreceiverdomainalignment.fasta, respectively. These amino acid domain alignments are converted into the corresponding nucleotide alignments using the nucleotide sequences, and the nucleotide alignments are finally used to estimate the KA/KS ratios for the domains of interest.
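The conversion of an amino acid alignment into a nucleotide (codon) alignment can be sketched in Python as follows. This is a minimal illustration, not the procedure's actual implementation: it assumes that gaps are written as "-", that the ungapped nucleotide sequence starts at the first aligned residue and is in frame, and that each residue maps to exactly three nucleotides. The function name protein_to_codon_alignment is hypothetical.

```python
def protein_to_codon_alignment(aligned_protein, nucleotide_cds, gap="-"):
    """Thread an ungapped coding sequence onto one gapped protein alignment row.

    Every residue is replaced by its codon and every gap by three gaps, so
    codon boundaries stay aligned across all rows of the alignment.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(nucleotide_cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the aligned protein")
    codons = []
    pos = 0  # index of the next unused nucleotide
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)
        else:
            codons.append(nucleotide_cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


print(protein_to_codon_alignment("M-K", "ATGAAA"))  # ATG---AAA
```

Applying the function to every row of a domain alignment yields a nucleotide alignment of the same domain, which is the kind of input that KA/KS estimation tools expect.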