TCRconvert

Warning: This project is in beta. It is under active development and may be unstable.

Convert human T-cell receptor (TCR) annotations between 10X, Adaptive, and IMGT formats.

The naming conventions for T-cell receptor (TCR) genes differ between sequencing platforms and the IMGT reference. For example, allele 1 of TCR alpha chain variable gene segment 1-2 is named:

  • 10X: TRAV1-2
  • Adaptive: TCRAV01-02*01
  • IMGT: TRAV1-2*01
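To make the relationship between the three spellings concrete, here is a rough sketch that derives the 10X and Adaptive names from an IMGT name with plain string handling. It is a toy covering only simple names like the ones in this README, and it is not how TCRconvert works internally (TCRconvert uses pre-built lookup tables). The helper names `imgt_to_tenx` and `imgt_to_adaptive` are hypothetical, and real gene names include exceptions that this pattern does not handle.

```python
import re

def imgt_to_tenx(name: str) -> str:
    """Drop the allele suffix, e.g. 'TRAV1-2*01' -> 'TRAV1-2'."""
    return name.split("*")[0]

def imgt_to_adaptive(name: str) -> str:
    """Toy rule for simple names only, e.g. 'TRAV1-2*01' -> 'TCRAV01-02*01'."""
    gene, _, allele = name.partition("*")
    match = re.fullmatch(r"TR([ABGD][VDJ])(\d+)(?:-(\d+))?", gene)
    if match is None:
        # Anything unusual is out of scope for this sketch
        raise ValueError(f"name not covered by this sketch: {name}")
    locus, family, member = match.groups()
    member = member or "1"   # names without a dash get member 01
    allele = allele or "01"  # assume *01 when no allele is given
    return f"TCR{locus}{int(family):02d}-{int(member):02d}*{allele}"

print(imgt_to_tenx("TRAV1-2*01"))      # TRAV1-2
print(imgt_to_adaptive("TRAV1-2*01"))  # TCRAV01-02*01
```

For real data, rely on TCRconvert's lookup tables rather than a pattern like this.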

TCRconvert improves TCR dataset interoperability by providing reliable conversion between 10X-, Adaptive-, and IMGT-formatted data. Unlike existing tools that limit conversions to only two formats or require custom objects, TCRconvert works directly with data frames. This saves researchers time and prevents errors from manual conversion.

TCRconvert takes a pandas DataFrame with at least one column of gene names as input. It produces a pandas DataFrame with converted gene names as output.

For full documentation, visit tcrconvert.readthedocs.io

Installation

TCRconvert runs on Windows, macOS, and Linux and requires Python >= 3.9 and pandas >= 1.5.0.

You can install from GitHub using pip:

pip install git+https://github.com/seshadrilab/tcrconvert

Or clone this repo and from the top-level folder run:

pip install .

The lookup tables for translating gene names come pre-built from IMGT FASTA files located under tcrconvert/data/.
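As a rough illustration of how a lookup table could be derived from FASTA headers, the sketch below pulls the allele-level name out of IMGT-style pipe-delimited headers and maps it to the allele-free 10X-style name. This is an assumption-laden toy, not TCRconvert's actual build code: the function name `build_lookup` is hypothetical, and the accessions and sequences in the sample text are dummy placeholders (only the gene names are real).

```python
# Placeholder records in the IMGT pipe-delimited header style.
# Accessions and sequences are dummies; only the gene names are real.
FASTA_TEXT = """\
>ACC0001|TRAV1-2*01|Homo sapiens|F|V-REGION|
acgtacgt
>ACC0002|TRBV6-1*01|Homo sapiens|F|V-REGION|
acgtacgt
>ACC0003|TRBJ2-1*01|Homo sapiens|F|J-REGION|
acgt
"""

def build_lookup(fasta_text: str) -> dict[str, str]:
    """Map each IMGT allele name found in the headers to its 10X-style name."""
    lookup = {}
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # skip sequence lines
        imgt_name = line.split("|")[1]  # second header field holds gene*allele
        lookup[imgt_name] = imgt_name.split("*")[0]
    return lookup

lookup = build_lookup(FASTA_TEXT)
print(lookup["TRAV1-2*01"])  # TRAV1-2
```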

Basic usage

Load some 10X data

import tcrconvert
import pandas as pd

tcr_file = '/Users/emmabishop/workspace/tcrconvert/tcrconvert/examples/example_10x.csv'

tcrs = pd.read_csv(tcr_file)[['barcode', 'v_gene', 'd_gene', 'j_gene', 'c_gene', 'cdr3']]
tcrs
              barcode   v_gene  d_gene   j_gene  c_gene             cdr3
0  AAACCTGAGACCACGA-1  TRAV1-2   TRBD1   TRAJ12    TRAC     CAVMDSSYKLIF
1  AAACCTGAGACCACGA-1  TRBV6-1   TRBD2  TRBJ2-1   TRBC2  CASSGLAGGYNEQFF
2  AAACCTGAGGCTCTTA-1  TRBV6-4   TRBD2  TRBJ2-3   TRBC2  CASSGVAGGTDTQYF
3  AAACCTGAGGCTCTTA-1  TRAV1-2   TRBD1   TRAJ33    TRAC     CAVKDSNYQLIW
4  AAACCTGAGTGAACGC-1    TRBV2   TRBD1  TRBJ1-2   TRBC1    CASNQGLNYGYTF

Convert gene names from the 10X format to the Adaptive format

new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='adaptive')
new_tcrs
Warning: Converting from 10X which lacks allele info. Choosing *01 as allele for all genes.
Warning: Adaptive only captures VDJ genes, any C genes will become NA.
              barcode         v_gene         d_gene         j_gene  c_gene             cdr3
0  AAACCTGAGACCACGA-1  TCRAV01-02*01  TCRBD01-01*01  TCRAJ12-01*01    <NA>     CAVMDSSYKLIF
1  AAACCTGAGACCACGA-1  TCRBV06-01*01  TCRBD02-01*01  TCRBJ02-01*01    <NA>  CASSGLAGGYNEQFF
2  AAACCTGAGGCTCTTA-1  TCRAV01-02*01  TCRBD02-01*01  TCRBJ02-03*01    <NA>  CASSGVAGGTDTQYF
3  AAACCTGAGGCTCTTA-1  TCRAV01-02*01  TCRBD01-01*01  TCRAJ33-01*01    <NA>     CAVKDSNYQLIW
4  AAACCTGAGTGAACGC-1  TCRBV02-01*01  TCRBD01-01*01  TCRBJ01-02*01    <NA>    CASNQGLNYGYTF
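Because Adaptive has no C gene names, the c_gene column is entirely missing after this conversion. A minimal pandas sketch of checking for that and dropping the empty column is shown below. To stay self-contained it uses two rows hand-copied from the output above instead of calling TCRconvert, and the variable names `missing` and `cleaned` are illustrative only.

```python
import pandas as pd

# Two rows hand-copied from the converted output above
# (not produced by running TCRconvert in this snippet).
new_tcrs = pd.DataFrame({
    "barcode": ["AAACCTGAGACCACGA-1", "AAACCTGAGACCACGA-1"],
    "v_gene": ["TCRAV01-02*01", "TCRBV06-01*01"],
    "c_gene": [pd.NA, pd.NA],
})

# Count missing values in each gene column
missing = new_tcrs[["v_gene", "c_gene"]].isna().sum()
print(missing)

# Drop columns that are missing in every row
cleaned = new_tcrs.dropna(axis="columns", how="all")
print(list(cleaned.columns))  # ['barcode', 'v_gene']
```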

Contributing

I welcome feedback! If you would like to resolve an issue or add improvements, please submit a pull request.

Issues

If you run into problems or need help running TCRconvert, please file an issue on GitHub.

Contact

For other questions, please contact Emma Bishop: emmab5 at uw dot edu
