Super-RF - C++ program for calculating SRF distance between two phylogenetic trees.
- Overview
- Features
- Building and Running
- Usage
- Custom User Function
- Documentation
- Project Structure
- Contributing
- Contact
This project accomplishes two major tasks:
- extracting the bipartitions of a phylogenetic tree given in Newick format
- calculating the Super-RF distance: a novel way to compare two phylogenetic trees
The Super-RF distance is a new approach to comparing two phylogenetic trees, as described in the article titled article title.
- Calculate SRF between 2 phylogenetic trees
- Verify the triangle inequality for 10 trees
- User-customizable actions for manipulating tree bipartitions, SRF, and more
No external libraries are required to compile this project.
Download the source files and open the folder in your terminal:
- Run make to compile the project and create the ./SRF executable on your machine.
- Run make triangle to compile the project, create the ./SRF executable, and then test the triangle inequality for a given set of trees.
- Run make test to compile the project, create the ./SRF executable, and then execute a series of unit tests (for developers).
The program compares two phylogenetic trees, so it requires a pair of trees as input. You can supply them in either of two ways:
- Provide a text file containing both trees:
  ./SRF yourFile.txt
- Pass both trees as arguments:
  ./SRF 'newick1' 'newick2'
An example set of trees is available in the data/treesExample.txt file.
To calculate the SRF between the two trees in that file, run:
./SRF data/treesExample.txt
The output will be the following:
Pair of trees used for calculation :
---------------------------------------------------------------------------------------
((1:1,2:2):1,(3:3,4:4):2,(5:5,(6:6,7:7):3):3)
---------------------------------------------------------------------------------------
((((1:1,2:2):1,8:8):1,(5:5,9:9):2):1,(3:3,(10:10,4:4):3):2)
---------------------------------------------------------------------------------------
CALCULATING SRF WITH VALUES :
card(E1/E2) = 2
card(E2/E1) = 3
sum|B1(P) - B2(P)| = 2
card(0) = 1
card(E1 U E2) = 10
=====> SRF = 0.571429
The output first shows both trees from the treesExample.txt file in Newick format, followed by the details of the SRF calculation, with the value of each component of the formula. Here E1 and E2 are the leaf sets of the trees T1 and T2.
- card(E1/E2) is the cardinality of the set difference E1 \ E2, that is, the number of leaves of T1 that are absent from T2; card(E2/E1) is the converse. Together they make up the symmetric difference between E1 and E2.
- sum|B1(P) - B2(P)| sums |B1(P) - B2(P)| over the bipartitions P of E1 ∩ E2, where B1(P) (respectively B2(P)) is the number of bipartitions of E1 in T1 (respectively of E2 in T2) that induce P.
- card(0) is the number of times one side of a bipartition is empty when the bipartitions are restricted to E1 ∩ E2.
- card(E1 U E2) is the cardinality of E1 U E2.
- The final result is equal to 0.571429.
You can implement a custom function and execute it using the provided build system. Follow these steps:
- Implement your function: inside the src directory, in the main.cpp file, implement the user() function as you wish.
- Run your function: compile and run your custom function with the command make user.
Project documentation is available in the docs folder.
Set manipulation functions are documented in a static HTML page at docs/html/index.html.
The algorithm that parses a tree in Newick format into a vector<pair<list, list>> is explained in a PDF at docs/BipartitionAlgorithmPresentation.pdf.
The project is organized into 5 folders:
- data contains input data files (text files containing phylogenetic trees)
- docs contains documentation for the set manipulation library
- include contains header files (.hpp)
- src contains source files (.cpp)
- tests contains unit test files
The Super-RF formula is not finalized yet. If you wish to modify it, you can do so by updating the SRF() function in the SRF.cpp file.
The SRF() function can utilize any functions present in the SRF.cpp file. Documentation for these functions is available at docs/html/index.html.
If you have any questions or feedback, do not hesitate to reach us at:
- Nadia Tahiri : [email protected]
- Arthur Debeaupte : [email protected]