CANDy allows the domain detection and annotation of protein sequences from any CAZy family.

PyEED/CANDy

Carbohydrate Active eNzyme Domain analYsis tool (CANDy) - automated analysis of domain architectures in carbohydrate-active enzymes

CANDy provides fast, FAIR and seamless protein domain analysis for any CAZy family. An online version is available on Google Colab, but for larger families we recommend downloading the Jupyter Notebook and running it locally.

Requirements

Make sure the following tools are installed in the same directory as the Jupyter Notebook:

1. CD-HIT

Download the source code for CD-HIT from the GitHub repository at https://github.com/weizhongli/cdhit/releases and follow the installation instructions.

2. MAFFT

A precompiled binary can be downloaded from https://mafft.cbrc.jp/alignment/software/. Either set the installation directory to the folder where the Notebook is stored, or manually move the executable there from the default directory. You can find the location of MAFFT by typing the following command in your terminal (in shells that lack "where", such as bash, use "which mafft" instead):

where mafft

3. FastTree

MacOS

Follow the installation instructions for your system at http://www.microbesonline.org/fasttree/#Install

Or

Open a terminal window and install the Xcode Command Line Tools by typing the following command:

 xcode-select --install

Install Homebrew by typing the following command:

 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install FastTree by typing the following command:

 brew install fasttree

Linux

Follow the installation instructions for your system at http://www.microbesonline.org/fasttree/#Install

Or

Download the FastTree source code from the FastTree website at http://www.microbesonline.org/fasttree/.

Open a terminal window and navigate to the directory where you downloaded the source code.

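FastTree is distributed as a single C source file, so there is nothing to unpack. Type the following command to compile it, replacing FastTree.c with the name of the file you downloaded:

gcc -O3 -finline-functions -funroll-loops -Wall -o FastTree FastTree.c -lm

After the compilation completes, you will find the FastTree executable in the current directory.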

Windows

Download the FastTree Windows binary from the FastTree website at http://www.microbesonline.org/fasttree/#Download.

Extract the FastTree executable from the downloaded ZIP file.
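
Before starting a run, it can help to confirm that the three tools are actually reachable. The following is a minimal sketch, not part of CANDy itself: it looks for each tool first in the Notebook directory and then on the PATH. The executable names in TOOL_CANDIDATES are assumptions and may differ on your system.

```python
import shutil
from pathlib import Path

# Executable names are assumptions; adjust them to match your installation
# (for example, FastTree may be installed as "fasttree" or "FastTree.exe").
TOOL_CANDIDATES = {
    "CD-HIT": ["cd-hit", "cd-hit.exe"],
    "MAFFT": ["mafft", "mafft.bat"],
    "FastTree": ["FastTree", "fasttree", "FastTree.exe"],
}


def find_tool(candidates, notebook_dir="."):
    """Return the path of the first candidate found locally or on PATH, else None."""
    for name in candidates:
        local = Path(notebook_dir) / name
        if local.is_file():
            return str(local.resolve())
        on_path = shutil.which(name)
        if on_path:
            return on_path
    return None


def check_tools(notebook_dir="."):
    """Map each tool name to the path that was found, or None if it is missing."""
    return {tool: find_tool(names, notebook_dir) for tool, names in TOOL_CANDIDATES.items()}


if __name__ == "__main__":
    for tool, path in check_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

Run it from the directory that contains the Notebook; any tool reported as NOT FOUND still needs to be installed or moved.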

Running CANDy

Your directory should look like this:

image

This Notebook uses several Python packages. To avoid compatibility issues we recommend running this Notebook in a virtual environment.

  • Install Anaconda by following its installation instructions.
  • Go to 'Environments' in Anaconda and click 'Create'. Give your environment a name, for example 'myenv'. The virtual environment will be activated automatically.
  • Go to the package search bar and search for 'ipywidgets'. Install the package to be able to use the interactive widgets in this Notebook. Repeat for the 'h5py' package.
  • Next, go back to the 'Home' page in Anaconda and install Jupyter Notebook. Once the installation has completed, press 'Launch' and go to the directory where you saved this Notebook.
  • Verify that the name of the virtual environment is shown at the top right of the Notebook, for example: Python (myenv). If it is not, go to Kernel and choose the environment.
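
As a quick cross-check of the steps above, the following sketch can be run in a Notebook cell. It is an illustration rather than part of CANDy: it prints which Python interpreter the kernel uses and reports whether the two packages named above can be imported. The environment name 'myenv' is only the example name used in this README.

```python
import importlib.util
import sys

# Packages named in the setup steps; extend this tuple if the Notebook reports others missing.
REQUIRED_PACKAGES = ("ipywidgets", "h5py")


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the packages that the current interpreter cannot find."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# The interpreter path should point into your environment, e.g. .../envs/myenv/...
print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
print("Missing packages:", missing_packages() or "none")
```

If the interpreter path does not point into your environment, the Notebook is running on a different kernel.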

Also, for large families, prevent your computer from entering sleep or stand-by mode, since this will interrupt the run. Either change your computer's power settings or keep the system awake with a tool such as caffeinate.

MacOS

caffeinate ships with macOS, so no installation is needed. Start it by running the following in your Terminal:

caffeinate -d

Stop it by pressing:

Ctrl + C

Linux

Install the caffeinate package by running the following in your terminal:

sudo apt-get install caffeinate

Start it by running:

caffeinate -d

Stop it by pressing:

Ctrl + C

Windows

Note: The caffeinate package is not available for Windows. However, you can use a similar feature called "powercfg" to prevent the system from going to sleep.

Open the Command Prompt application.

Type the following line to see the current power requests:

powercfg /requests

Type the following line, followed by the type of request you want to override (e.g., "system" or "display"):

powercfg /requestsoverride

To stop the power request override, type "powercfg /requestsoverride" followed by the type of request and the "/remove" argument.

Output

When running the Google Colab version of CANDy, the results (FASTA files, SQLite database, MSA, phylogenetic tree in Newick format and iTOL annotation files) are automatically downloaded as a Zip file. When running CANDy locally, these outputs are stored in the same directory as the Jupyter Notebook.
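
To see what a downloaded results archive contains without unpacking it, a short sketch like the following can be used. It is an illustration only: it groups the files in the Zip archive by extension, and it makes no assumption about the file names CANDy writes.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Group the file names inside a Zip archive by their (lowercased) extension."""
    by_extension = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries, keep only files
            suffix = PurePosixPath(info.filename).suffix.lower() or "(no extension)"
            by_extension[suffix].append(info.filename)
    return dict(by_extension)
```

For example, summarize_archive("candy_results.zip") returns a dictionary that maps each extension to the files carrying it, where "candy_results.zip" is a hypothetical name standing in for your downloaded file.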

Database

To open the results database, download DB Browser for SQLite from: https://sqlitebrowser.org/
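
The database can also be inspected from Python with the standard sqlite3 module. The sketch below is a minimal example that assumes nothing about CANDy's schema: it lists every table in the file together with its row count. The database file name is up to you to supply. Note that sqlite3.connect creates an empty file if the path does not exist.

```python
import sqlite3


def list_tables(db_path):
    """Return {table name: row count} for every table in an SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quoting them is sufficient here.
        return {
            name: connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        connection.close()
```

Calling list_tables with the path to the SQLite file in your output directory gives a quick overview before you open it in a graphical browser.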

(Annotated) Phylogenetic tree

To view the phylogenetic tree, several free services are available. The Notebook uses the ete3 package to visualize the annotated tree inline. For a more interactive experience we recommend iTOL. The script outputs iTOL annotation files for visualizing the protein domains and the activity of the included characterized sequences.

image
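
For a quick sanity check of the Newick output before uploading it to iTOL, the leaf names can be extracted with the standard library alone. This is a simplified sketch, not how CANDy or ete3 parse trees: it assumes unquoted leaf labels and reads the tree as a plain string.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels follow ")".
_LEAF_PATTERN = re.compile(r"[(,]\s*([^\s(),:;][^(),:;]*)")


def leaf_names(newick):
    """Return the leaf labels of a Newick string (unquoted labels only)."""
    return [name.strip() for name in _LEAF_PATTERN.findall(newick)]


# Hypothetical three-leaf tree with branch lengths and one support value.
example_tree = "((A:0.1,B:0.2)0.95:0.3,C:0.4);"
print(len(leaf_names(example_tree)), "leaves:", leaf_names(example_tree))
```

Comparing the number of leaves with the number of sequences in the alignment is an easy way to spot a truncated run.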

Protein domain co-occurrence network

CANDy offers users a co-occurrence network that visually represents both the frequency of different domain types and the degree to which they are interconnected. A simple visualisation is offered in the Notebook, but for a more interactive experience we recommend using Cytoscape (yFiles Organic Layout).

image
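
The idea behind such a network can be sketched in a few lines. The example below is an illustration and not CANDy's implementation: it counts how many proteins contain each domain (node weight) and how many contain each pair of domains (edge weight). The domain architectures are hypothetical, and counting each domain once per protein is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(architectures):
    """Count domain frequencies (nodes) and domain pair co-occurrences (edges)."""
    node_counts = Counter()
    edge_counts = Counter()
    for domains in architectures:
        unique = sorted(set(domains))  # count each domain once per protein
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))  # pairs in alphabetical order
    return node_counts, edge_counts


# Hypothetical architectures: one list of domain names per protein.
example = [
    ["GH13", "CBM20"],
    ["GH13", "CBM20", "CBM48"],
    ["GH13"],
]
nodes, edges = cooccurrence(example)
print("Node weights:", dict(nodes))
print("Edge weights:", dict(edges))
```

The two counters map directly onto a node table and an edge table, which is the form that tools such as Cytoscape import.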

Acknowledgements

CANDy communicates with and/or references the following separate libraries, packages and tools:

Legal terms

License and Disclaimer

This Jupyter Notebook is licensed under the MIT License.

This Notebook and other information provided are for theoretical use only; caution should be exercised in their use. They are provided ‘as-is’ without any warranty of any kind, whether express or implied. The information is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section in the CANDy README may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Databases

The following databases are used by CANDy, and are available with reference to the following:
