Warning: This project is in beta stage. It is under active development and may be unstable.
Convert human T-cell receptor (TCR) annotations between 10X, Adaptive, and IMGT formats.
The naming conventions for T-cell receptor (TCR) genes differ between sequencing platforms and the IMGT reference. For example, the naming of TCR alpha chain variable gene segment 1-2 allele 1:
- 10X: TRAV1-2
- Adaptive: TCRAV01-02*01
- IMGT: TRAV1-2*01
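The three spellings above can be thought of as one row of a lookup table. The following is a minimal sketch using plain dictionaries. `NAMES` and `translate` are hypothetical names chosen for illustration and are not part of the TCRconvert API, and the table holds only the single gene listed above.

```python
# Illustrative only: one gene in three naming conventions (values from the list above).
NAMES = [
    {"tenx": "TRAV1-2", "adaptive": "TCRAV01-02*01", "imgt": "TRAV1-2*01"},
]


def translate(name, frm, to):
    """Return the `to` spelling of `name`, or None if it is not in the table."""
    for row in NAMES:
        if row[frm] == name:
            return row[to]
    return None


print(translate("TRAV1-2", "tenx", "adaptive"))  # TCRAV01-02*01
```

A name that is missing from the table returns `None` rather than raising, which makes unmatched genes easy to spot.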
TCRconvert enhances TCR dataset interoperability by providing reliable format conversion across 10X, Adaptive, and IMGT-formatted data. Unlike existing tools that limit conversions to only two formats or require custom objects, TCRconvert works directly with data frames. TCRconvert saves researchers time and prevents errors from manual conversion.
TCRconvert takes a Pandas DataFrame with at least one column of gene names as input. It produces a Pandas DataFrame with converted gene names as output.
For full documentation, visit tcrconvert.readthedocs.io
TCRconvert runs on Windows, macOS, and Linux and requires Python >= 3.9 and pandas >= 1.5.0.
You can install from GitHub using pip:
pip install git+https://github.com/seshadrilab/tcrconvert
Or clone this repo and from the top-level folder run:
pip install .
The lookup tables for translating gene names come pre-built from IMGT fasta files located under tcrconvert/data/
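To give a rough idea of how a lookup table could be derived from a reference FASTA file, here is a simplified sketch. It assumes the usual pipe-delimited IMGT header layout in which the allele name is the second field, and it derives the 10X-style name simply by dropping the allele suffix. The accession numbers and sequences are placeholders, `imgt_names` and `build_lookup` are hypothetical helpers, and this is not necessarily how TCRconvert builds its own tables.

```python
import io

# Two made-up records in the IMGT header style (accessions and sequences are placeholders).
EXAMPLE_FASTA = """\
>X00000|TRAV1-2*01|Homo sapiens|F|V-REGION|
acgtacgt
>X00001|TRBJ2-1*01|Homo sapiens|F|J-REGION|
acgt
"""


def imgt_names(handle):
    """Yield the allele name (second header field) of each FASTA record."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].split("|")[1]


def build_lookup(handle):
    """Map each IMGT allele name to a 10X-style name by dropping the allele suffix."""
    return {name: name.split("*")[0] for name in imgt_names(handle)}


lookup = build_lookup(io.StringIO(EXAMPLE_FASTA))
print(lookup)  # {'TRAV1-2*01': 'TRAV1-2', 'TRBJ2-1*01': 'TRBJ2-1'}
```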
Load some 10X data
import tcrconvert
import pandas as pd
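# Path to the example file in the author's local clone; adjust it to your own checkout
tcr_file = '/Users/emmabishop/workspace/tcrconvert/tcrconvert/examples/example_10x.csv'
# Keep only the columns needed for this example
tcrs = pd.read_csv(tcr_file)[['barcode', 'v_gene', 'd_gene', 'j_gene', 'c_gene', 'cdr3']]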
tcrs
| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
|---|---|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | TRAV1-2 | TRBD1 | TRAJ12 | TRAC | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TRBV6-1 | TRBD2 | TRBJ2-1 | TRBC2 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TRBV6-4 | TRBD2 | TRBJ2-3 | TRBC2 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TRAV1-2 | TRBD1 | TRAJ33 | TRAC | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TRBV2 | TRBD1 | TRBJ1-2 | TRBC1 | CASNQGLNYGYTF |
Convert gene names from the 10X format to the Adaptive format
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='adaptive')
new_tcrs
Warning: Converting from 10X which lacks allele info. Choosing *01 as allele for all genes.
Warning: Adaptive only captures VDJ genes, any C genes will become NA.
| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
|---|---|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | TCRAV01-02*01 | TCRBD01-01*01 | TCRAJ12-01*01 | <NA> | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TCRBV06-01*01 | TCRBD02-01*01 | TCRBJ02-01*01 | <NA> | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TCRBV06-04*01 | TCRBD02-01*01 | TCRBJ02-03*01 | <NA> | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TCRAV01-02*01 | TCRBD01-01*01 | TCRAJ33-01*01 | <NA> | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TCRBV02-01*01 | TCRBD01-01*01 | TCRBJ01-02*01 | <NA> | CASNQGLNYGYTF |
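The two warnings describe behavior that is easy to picture in isolation: names without an allele receive `*01`, and constant (C) genes have no Adaptive equivalent. The sketch below mimics only those two rules on a single row, using plain Python. `add_default_allele` and `drop_constant` are hypothetical helpers, the check for C genes (fourth character is `C`) is a simplifying assumption, and the result is an intermediate IMGT-style name rather than the final Adaptive spelling.

```python
def add_default_allele(gene, allele="01"):
    """Append a default allele to a name that lacks one; leave other names unchanged."""
    return gene if "*" in gene else f"{gene}*{allele}"


def drop_constant(gene):
    """Return None for constant (C) genes such as TRAC or TRBC2, else the name unchanged."""
    return None if gene[3] == "C" else gene


# One row of 10X-style names taken from the example table above.
row = {"v_gene": "TRAV1-2", "j_gene": "TRAJ12", "c_gene": "TRAC"}

converted = {}
for column, gene in row.items():
    kept = drop_constant(gene)
    converted[column] = None if kept is None else add_default_allele(kept)

print(converted)  # {'v_gene': 'TRAV1-2*01', 'j_gene': 'TRAJ12*01', 'c_gene': None}
```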
I welcome feedback! If you would like to resolve an issue or add improvements, please submit a pull request.
If you run into problems or need help running TCRconvert, please file an issue on GitHub.
For other questions, please contact Emma Bishop: emmab5 at uw dot edu