
LMSE/Seq-to-Func-Learning-Marjan


Sequence-to-Function-Learning

This repo contains work on sequence-to-function learning based on language models.

The model was trained on 1,000 single-amino-acid mutants of PafA (data from "Revealing enzyme functional architecture via high-throughput microfluidic enzyme kinetics"), and its prediction performance is shown below.
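Each training example is a single-amino-acid variant of PafA. The sketch below is a hedged illustration of how such a variant relates to the wild-type sequence. It applies one substitution written in the common notation of wild-type residue, 1-based position, and mutant residue (e.g. `T3A`). The function name `apply_single_mutation` and the toy sequence are hypothetical. They are not taken from `M00_Data_PafAVariants_Prep.py`, and the toy sequence is not the real PafA sequence.

```python
import re

# Hypothetical helper, not from this repository.
# Substitution notation: wild-type residue, 1-based position, mutant residue (e.g. "T3A").
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_single_mutation(wt_seq: str, mutation: str) -> str:
    """Return the variant sequence produced by one amino acid substitution."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wt_aa, pos, mut_aa = match.group(1), int(match.group(2)), match.group(3)
    idx = pos - 1  # convert the 1-based position to a 0-based index
    if not 0 <= idx < len(wt_seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if wt_seq[idx] != wt_aa:
        raise ValueError(f"expected {wt_aa} at position {pos}, found {wt_seq[idx]}")
    return wt_seq[:idx] + mut_aa + wt_seq[idx + 1:]


# Hypothetical toy sequence, NOT the real PafA sequence.
toy_wt = "MKTAYIAK"
print(apply_single_mutation(toy_wt, "T3A"))  # MKAAYIAK
```

Checking the wild-type residue before substituting catches off-by-one position errors early. Those errors are easy to introduce when mutation tables and sequences use different numbering.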

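Dataset Contents

The dataset contains the kinetic parameters listed below. For each parameter, the table gives the correlation between measured and predicted values as Pearson R and Spearman rho. Results are shown for the self-attention model and the k-mer KNN baseline; "?" marks results not yet reported.

| Kinetic parameter | Self-Attention R | Self-Attention rho | k-mer KNN R | k-mer KNN rho |
|---|---|---|---|---|
| kcat_cMUP | 0.64 | 0.65 | 0.350 | 0.362 |
| KM_cMUP | 0.450 | 0.62 | 0.223 | 0.067 |
| kcatOverKM_cMUP | ? | ? | ? | ? |
| kcatOverKM_MeP | ? | ? | ? | ? |
| kcatOverKM_MecMUP | ? | ? | ? | ? |
| Ki_Pi | ? | ? | ? | ? |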
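The table reports R and rho for each parameter. This sketch assumes the usual meanings: R is the Pearson correlation and rho the Spearman rank correlation between measured and predicted values. Both can then be computed with the minimal pure-Python code below. The function names are illustrative. In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` provide the same statistics.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical measured and predicted values.
y_real = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.9]
print(pearson_r(y_real, y_pred))
print(round(spearman_rho(y_real, y_pred), 3))  # 1.0 (same ordering)
```

Reporting both helps read the table. R measures how well predictions track measured values linearly, while rho only checks whether the variants are ranked in the right order.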

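The k-mer KNN baseline in the table is not described further here. The sketch below shows one common way such a baseline can work; it is not necessarily how this repository implements it. Each sequence becomes a vector of overlapping k-mer counts, and a query's value is predicted as the mean target of its nearest training sequences. The k-mer size, the Euclidean distance, the number of neighbours, and all function names are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse k-mer count vectors."""
    keys = set(a) | set(b)
    # Counter returns 0 for missing k-mers, so absent entries count as zeros.
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in keys))


def knn_predict(train_seqs, train_y, query_seq, k_mer=3, n_neighbors=5):
    """Predict a value as the mean target of the nearest training sequences."""
    query = kmer_counts(query_seq, k_mer)
    dists = sorted(
        (kmer_distance(query, kmer_counts(s, k_mer)), y)
        for s, y in zip(train_seqs, train_y)
    )
    nearest = dists[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy sequences and target values.
train = ["MKTAYIAK", "MKAAYIAK", "MKTAYLAK", "GGGGGGGG"]
ys = [1.0, 1.2, 0.8, 5.0]
pred = knn_predict(train, ys, "MKTAYIAR", k_mer=3, n_neighbors=3)
print(round(pred, 3))  # 1.0 (mean of 1.0, 0.8 and 1.2)
```

With single-point mutants, most sequences share nearly all of their k-mers. That may be one reason such a baseline correlates more weakly with the measurements than a learned model does.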
Prediction Pipeline

Prediction Performance

Figures: kcat, KM

To make a $y$ vs. $\hat{y}$ (actual vs. predicted) plot for the validation set:

from ZX01_PLOT import reg_scatter_distn_plot  # plotting helpers

# y_pred_valid, y_real_valid: predicted and measured values on the validation set.
# r_value (correlation of the two), epoch, results_sub_folder and
# output_file_header must already be defined by the surrounding training code.

# Plot predictions against actual values to check how well they fit.
reg_scatter_distn_plot(y_pred_valid,
                        y_real_valid,
                        fig_size        =  (10,8),
                        marker_size     =  35,
                        fit_line_color  =  "brown",
                        distn_color_1   =  "gold",
                        distn_color_2   =  "lightpink",
                        title           =  "",
                        plot_title      =  f"R = {round(r_value, 3)}\nEpoch: {epoch + 1}",
                        x_label         =  "Actual Values",
                        y_label         =  "Predictions",
                        cmap            =  None,
                        cbaxes          =  (0.425, 0.055, 0.525, 0.015),
                        font_size       =  18,
                        result_folder   =  results_sub_folder,
                        file_name       =  f"{output_file_header}_VA_epoch_{epoch + 1}",
                        )
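`reg_scatter_distn_plot` draws a fitted line (coloured by `fit_line_color`) through the (actual, predicted) points. The helper's internals are not shown here. This sketch assumes the line is an ordinary least-squares fit of predictions on actual values. Under that assumption, its slope and intercept can be computed as below; for an ideal predictor the slope is 1 and the intercept 0. The function name `least_squares_line` and the sample values are hypothetical.

```python
def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line y ~ slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical validation values: predictions offset by a constant 0.5.
y_real_valid = [1.0, 2.0, 3.0, 4.0]
y_pred_valid = [1.5, 2.5, 3.5, 4.5]
slope, intercept = least_squares_line(y_real_valid, y_pred_valid)
print(slope, intercept)  # 1.0 0.5
```

The example shows why slope and intercept are worth checking alongside R. A constant offset leaves R at 1 but still shows up as a nonzero intercept.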


Pipeline

  • M00_Data_PafAVariants_Prep.py : format the PafA variant dataset and write it to a file.
  • N00_Data_Preprocessing.py : preprocess and prepare the data.
  • N03_LM_Embeddings.py : compute language-model embeddings of the sequences.
  • N05A_SQembCNN_y.py : train the model and evaluate it.
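The four scripts above can be chained in order, assuming each one runs standalone from the repository root with no command-line arguments. That assumption is not confirmed; the scripts' arguments and working-directory expectations are not documented here. `PIPELINE` and `run_pipeline` are hypothetical names. With `dry_run=True` the function only returns the commands it would run.

```python
import subprocess
import sys

# Hypothetical runner: the scripts listed above, in pipeline order.
PIPELINE = [
    "M00_Data_PafAVariants_Prep.py",  # format the dataset
    "N00_Data_Preprocessing.py",      # preprocess the data
    "N03_LM_Embeddings.py",           # compute sequence embeddings
    "N05A_SQembCNN_y.py",             # train and evaluate the model
]


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run each script with the current interpreter, stopping at the first failure.

    Assumes every script runs standalone from the repository root without
    arguments. With dry_run=True, nothing is executed; the commands are returned.
    """
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on missing or stale outputs.
            subprocess.run(cmd, check=True)
    return commands


# Print the planned commands without running them.
for cmd in run_pipeline(dry_run=True):
    print(" ".join(cmd))
```

Stopping at the first failure matters here because each step consumes the files written by the previous one.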
