mSOM

mSOM (metaboliteSOM)

John Pearce

MSc Applied Bioinformatics 2023
Cranfield University

Objectives

Investigate the suitability of conventional clustering methods (K-means) for metabolic time-series data
Apply and optimise SOMs for the same data set
Compare K-means with a Self-Organizing Map (SOM)
Evaluate the appropriateness of conventional measures to assess cluster quality
Investigate the use of additional biological information (common metabolic pathways) to classify types of clusters

Workflow

The workflow is illustrated below. It consists of a Conventional Processing Workflow and a SOM Clustering & Biological Connectivity Workflow. The former provides basic data validation and then performs data exploration (elbow graphs, clustering metrics, K-means and PCA). The latter provides SOM clustering (using the minisom package) and Biological Connectivity analysis (using additional data on common metabolic pathways).

Data sets

A time-series metabolic data set (Arabidopsis thaliana) was obtained from Kim, Jae Kwang; Cho, Myoung; Baek, Hyung-Jin; Ryu, Tae; Yu, Chang; Kim, Myong; Fukusaki, Eiichiro; Kobayashi, Akio (2007). Analysis of metabolite profile data using batch-learning self-organizing maps. Journal of Plant Biology, 50, 517-521. doi:10.1007/BF03030693.

Any other metabolic time-series data file can be used. The expected format is a CSV file with one metabolite per row, where each row holds that metabolite's readings at the specified points in time.
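As a minimal sketch of reading such a file, the snippet below assumes that the first column holds the metabolite name, that a header row names the time points, and that every reading is numeric. The function name, column labels and sample values are invented for illustration and are not taken from the repository or the data set.

```python
import csv
import io


def load_time_series(handle):
    """Read a metabolite-per-row CSV into (time_points, {metabolite: readings})."""
    reader = csv.reader(handle)
    header = next(reader)
    time_points = header[1:]  # assumption: first column is the metabolite name
    series = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        series[row[0]] = [float(value) for value in row[1:]]
    return time_points, series


# Invented example content; a real file would be opened with open(path, newline="")
example = io.StringIO(
    "metabolite,0h,1h,2h\n"
    "alanine,1.0,2.0,3.0\n"
    "glucose,4.0,4.5,5.0\n"
)
time_points, series = load_time_series(example)
```

If a file has no header row or uses a different layout, the parsing above would need adjusting accordingly.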

Additional biological data (common metabolic pathways for Arabidopsis thaliana) was obtained from the BioCyC database using the PythonCyC package.

The Data folder contains the required data files, including the additional biological data in case you don't want to install and run Pathway-tools (part of the BioCyC server).

Jupyter Notebooks

The Conventional Processing Workflow is implemented in DataExploration.ipynb and AdvancedKmeansAnalysis.ipynb.
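To give a rough idea of what an elbow analysis computes, the sketch below runs a basic K-means (Lloyd's algorithm) in plain NumPy for several values of k and records the within-cluster sum of squares for each. It is a stand-in under stated assumptions, not the code from the notebooks: the function name, the synthetic data and the parameter values are all hypothetical.

```python
import numpy as np


def kmeans_inertia(data, k, n_iter=50, seed=0):
    """Run a basic Lloyd's K-means and return (labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and within-cluster sum of squares
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(data)), labels].sum()
    return labels, float(inertia)


# Synthetic stand-in for the metabolite matrix (rows = metabolites, columns = time points)
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 5)),
    rng.normal(3.0, 0.1, size=(20, 5)),
])
inertias = {k: kmeans_inertia(data, k)[1] for k in range(1, 5)}
```

Plotting inertias against k gives the elbow graph; with this synthetic data, which contains two well-separated groups, the sharp drop occurs between k = 1 and k = 2.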

The SOM clustering and Biological Connectivity workflow is implemented in SOMClustering.ipynb, CommonPathwayData.ipynb and BiologicalConnectivity.ipynb.

Additional comments are provided in the respective .ipynb files.

BioCyC

If you want to run CommonPathwayData.ipynb, you will need to get a licence for BioCyC from https://biocyc.org/download-bundle.shtml and install it locally (or on a server). You'll also need to download the PGDB file for Arabidopsis thaliana (or any other species). Instructions for installing pathway-tools can be found at https://biocyc.org/download.shtml

Alternatively, the common pathway data file for Arabidopsis thaliana is included in the Data folder.
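This README does not spell out how the pathway data is scored against the clusters. Purely as an illustration, the sketch below shows one hypothetical measure: for each cluster, the fraction of metabolite pairs that share at least one pathway. The function name, the scoring rule and the toy data are assumptions, not the method implemented in BiologicalConnectivity.ipynb.

```python
from itertools import combinations


def shared_pathway_fraction(clusters, pathways):
    """For each cluster, return the fraction of metabolite pairs sharing a pathway.

    clusters: {cluster_id: [metabolite, ...]}
    pathways: {metabolite: set of pathway names}
    Clusters with fewer than two metabolites get None (no pairs to score).
    """
    scores = {}
    for cluster_id, members in clusters.items():
        pairs = list(combinations(members, 2))
        if not pairs:
            scores[cluster_id] = None
            continue
        linked = sum(
            1 for a, b in pairs
            if pathways.get(a, set()) & pathways.get(b, set())
        )
        scores[cluster_id] = linked / len(pairs)
    return scores


# Toy data, invented for illustration
clusters = {0: ["alanine", "serine", "glucose"], 1: ["citrate"]}
pathways = {
    "alanine": {"amino acid metabolism"},
    "serine": {"amino acid metabolism"},
    "glucose": {"glycolysis"},
    "citrate": {"TCA cycle"},
}
scores = shared_pathway_fraction(clusters, pathways)
```

In this toy case only one of the three pairs in cluster 0 shares a pathway, so a higher score would indicate a cluster whose members are more closely related biologically.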

Packages

SOMs are implemented using the minisom package (https://github.com/JustGlowing/minisom).

A significant number of Python packages are used, so a virtual environment manager such as Anaconda is recommended. A YAML file (environment.yaml) is provided in the repository to help construct an appropriate environment, which can then be created with the command conda env create -f environment.yaml
