Developed by Yousuf A. Khan, Andrew Shu, and Matthew DeButts
This repository contains aaa2vec, a machine learning model that predicts whether or not a protein belongs to the AAA protein family. The model relies primarily on the word2vec algorithm developed by Google, adapted here for protein family prediction.
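To make the word2vec connection concrete, here is a minimal sketch of one common way to turn a protein sequence into "words" that a word2vec-style model can consume. It assumes overlapping 3-mers as tokens; the function name kmer_tokens, the value k=3, and the example sequence are illustrative assumptions and are not taken from the notebook.

```python
def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    Assumption: overlapping k-mers with k=3. The notebook may tokenize
    sequences differently.
    """
    sequence = sequence.strip().upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical sequence fragment, not taken from the repository's data.
sentence = kmer_tokens("MKTAYIAK")
print(sentence)
```

Each sequence then acts as one "sentence" of k-mer tokens, and a list of such sentences could be passed to a word2vec implementation (for example, the Word2Vec class in gensim) to learn an embedding per k-mer.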
A Python 3 notebook with the code is provided, along with a FASTA file containing all the positive training examples for proteins in the AAA+ superfamily. A file with a link to the UniProt database FASTA file is also included (the FASTA file itself is not included due to space limitations).
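For readers who want to inspect the FASTA data outside the notebook, the following is a small sketch of a FASTA reader written in plain Python. The function name read_fasta and the example records are hypothetical; the notebook may load the files in another way.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records for illustration only.
example = [">seq1 hypothetical", "MKTA", "YIAK", ">seq2", "GAVL"]
records = list(read_fasta(example))
print(records)
```

Because the function accepts any iterable of lines, an open file handle for one of the FASTA files can be passed in place of the example list.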
Please contact [email protected] if there are any questions.