run_imctools.py metal names are expected to be unique, and have issues if the Channel names are protein listed #66

Open
arcolombo opened this issue Aug 2, 2022 · 1 comment

Comments

@arcolombo

This issue was also posted on the Slack; this is a cross-reference to that thread.

run_imctools.py requires a panel.csv with a metal column, which the script parses into a metal dictionary. The Python code therefore assumes that the metals in the MCD file are unique. In some instances, however, the IMC technician manually provides channel names that correspond to the proteins. The text file below shows that the channel names (e.g. aSMA) were entered during ablation, while the label names (e.g. Y89Di) are almost always the isotope metals. The issue arises when channel names overlap ("CD45" and "CD4", for instance): the .startswith(entry) call used to look up the metal dictionary then matches more than one entry, so the channel can no longer be matched uniquely.

I've worked around this for my own data, in a hacky way, by replacing startswith with endswith to improve the matching.
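The difference between the two matching rules can be seen in a small standalone sketch. It assumes the panel's metal column holds the same protein names as the channel labels; `matches` is a hypothetical helper written for illustration and is not part of run_imctools.py or imctools.

```python
# Panel entries and channel labels taken from the CD4/CD45 example in this issue.
panel_entries = ["CD4", "CD45", "CD45RO"]
channel_labels = ["CD4", "CD45", "CD45RO"]


def matches(label, entries, method):
    """Return the panel entries matched by label.startswith or label.endswith.

    `method` is the name of the string method to use: "startswith" or "endswith".
    The comparison is case-insensitive, as in run_imctools.py.
    """
    test = getattr(label.upper(), method)
    return [entry for entry in entries if test(entry.upper())]


for label in channel_labels:
    by_prefix = matches(label, panel_entries, "startswith")
    by_suffix = matches(label, panel_entries, "endswith")
    print("{}: startswith -> {}, endswith -> {}".format(label, by_prefix, by_suffix))
```

With these three names, prefix matching returns two entries for "CD45" and three for "CD45RO", while suffix matching returns exactly one entry per label. Suffix matching is still a heuristic: it would be ambiguous for any pair of names where one ends with the other.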

In other cases, such as the ablation files in the test dataset, both the channel names and the label names correspond to the metal isotopes, so the default run_imctools.py works fine. However, if the channel names are proteins, the current version does not build the metal dictionary uniquely.

The panel.csv (attached) shows the format for this type of experiment, and yes, this does process.

ISSUE: Is there a way for the imcyto pipeline to incorporate this (or a better) run_imctools.py script? For now I have to run this step in interactive containers and cannot run the pipeline itself.
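One possible direction for a stricter lookup, sketched under assumptions: column headers have the form name(metal), for example CD45(Y89Di), and the dictionary keys are upper-cased as in run_imctools.py. The helpers `split_header` and `find_entry` are hypothetical names introduced here; they are not part of imctools or nf-core/imcyto.

```python
import re

# Matches headers such as "CD45(Y89Di)": a name followed by a metal in parentheses.
HEADER_RE = re.compile(r"^(?P<name>.+)\((?P<metal>[^()]+)\)$")


def split_header(column):
    """Split 'CD45(Y89Di)' into ('CD45', 'Y89Di').

    Returns None when the column has no '(metal)' suffix (e.g. 'X' or 'Start_push').
    """
    m = HEADER_RE.match(column.strip())
    return (m.group("name"), m.group("metal")) if m else None


def find_entry(column, metal_dict):
    """Return the single panel key equal to the column's name or metal, else None.

    The comparison is an exact, case-insensitive match, so 'CD45' never
    matches the key 'CD4'. None is returned for zero or ambiguous matches.
    """
    parts = split_header(column)
    if parts is None:
        return None
    candidates = {part.upper() for part in parts}
    hits = [entry for entry in metal_dict if entry in candidates]
    return hits[0] if len(hits) == 1 else None


# Panel keyed by protein name (assumed layout): [full_stack, ilastik_stack]
protein_panel = {"CD4": [True, False], "CD45": [True, True]}
print(find_entry("CD45(Y89Di)", protein_panel))
print(find_entry("CD4(Gd156Di)", protein_panel))
```

Because the match is exact on either half of the header, the same lookup works whether the panel lists proteins or isotope metals, at the cost of no longer tolerating extra prefixes or suffixes in the label.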

test.panel.csv

Start_push End_push Pushes_duration X Y Z CD45(Y89Di) aSMA(In113Di) CD31(In115Di) HLA_ABC(La139Di) CD38(Pr141Di) CD69(Nd142Di) vimentin(Nd143Di) CCR6(Nd144Di) CK19(Nd145Di) tryptase(Nd146Di) CD163(Sm147Di) CXCR3(Nd148Di) PD1(Sm149Di) PDL1(Nd150Di) IL6(Eu151Di) CD11c(Sm152Di) LAG3(Eu153Di) HepPar1(Sm154Di) FOXP3(Gd155Di) CD4(Gd156Di) E_cadherin(Gd158Di) CD68(Tb159Di) GATA3(Gd160Di) CD20(Dy161Di) CD8a(Dy162Di) TIM3(Dy163Di) FAP(Dy164Di) CD138(Ho165Di) iNOS(Er166Di) CD11b(Er167Di) podoplanin(Er168Di) collagen_T1(Tm169Di) CD3(Er170Di) NKG2D(Yb171Di) CD15(Yb172Di) CD45RO(Yb173Di) HLA_DR(Yb174Di) IL10(Lu175Di) CTLA4(Yb176Di) 191Ir(Ir191Di) 193Ir(Ir193Di) granzymeB(Pt195Di) Ki67(Pt198Di) HistoneH3(Bi209Di)
2011 2394 384 0 0 0 0.000 1.000 0.000 2.584 0.000 1.363 78.762 6.578 0.000 0.595 0.000 0.000 3.658 0.000 6.282 0.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000 3.614 1.000 0.000 0.000 0.000 0.000 0.000 4.277 0.000 0.000 1.334 0.000 28.858 0.000 0.000 0.000 2.189 0.000 1.000 1.000
2396 2779 384 1 0 1 0.000 1.723 0.000 2.000 0.000 0.000 64.860 1.277 0.000 0.000 0.000 1.000 0.000 0.000 7.754 0.000 0.000 1.660 0.000 0.000 2.063 2.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000


import os
import sys
import argparse
import re

import imctools.io.mcdparser as mcdparser
import imctools.io.txtparser as txtparser
import imctools.io.ometiffparser as omeparser
import imctools.io.mcdxmlparser as meta

############################################
############################################
## PARSE ARGUMENTS
############################################
############################################

Description = 'Split nf-core/imcyto input data by full/ilastik stack.'
Epilog = """Example usage: python run_imctools.py <MCD/TXT/TIFF> <METADATA_FILE>"""

argParser = argparse.ArgumentParser(description=Description, epilog=Epilog)
argParser.add_argument('INPUT_FILE', help="Input files with extension '.mcd', '.txt', or '.tiff'.")
argParser.add_argument('METADATA_FILE', help="Metadata file containing 3 columns i.e. metal,full_stack,ilastik_stack. See pipeline usage docs for file format information.")
args = argParser.parse_args()

############################################
############################################
## PARSE & VALIDATE INPUTS
############################################
############################################

## READ AND VALIDATE METADATA FILE
ERROR_STR = 'ERROR: Please check metadata file'
HEADER = ['metal', 'full_stack', 'ilastik_stack']

fin = open(args.METADATA_FILE,'r')
header = fin.readline().strip().split(',')
if header != HEADER:
    print("{} header: {} != {}".format(ERROR_STR,','.join(header),','.join(HEADER)))
    sys.exit(1)

metalDict = {}
for line in fin.readlines():
    lspl = line.strip().split(',')

    ## CHECK THREE COLUMNS IN LINE (BEFORE UNPACKING, OTHERWISE A BAD LINE RAISES ValueError)
    if len(lspl) != len(HEADER):
        print("{}: Invalid number of columns - should be 3!\nLine: '{}'".format(ERROR_STR,line.strip()))
        sys.exit(1)
    metal,fstack,istack = lspl

    ## CHECK VALID INCLUDE/EXCLUDE CODES
    if fstack not in ['0','1'] or istack not in ['0','1']:
        print("{}: Invalid column code - should be 0 or 1!\nLine: '{}'".format(ERROR_STR,line.strip()))
        sys.exit(1)

    ## CREATE DICTIONARY
    metal = metal.upper()
    if metal not in metalDict:
        metalDict[metal] = [bool(int(x)) for x in [fstack,istack]]
fin.close()

## OUTPUT FILE LINKING ROI IDS TO ROI LABELS (IMAGE DESCRIPTION)
roi_map = open(os.path.basename(args.INPUT_FILE)+'_ROI_map.csv', "w")

## USE DIFFERENT PARSERS CORRESPONDING TO THE INPUT FILE FORMAT
file_type = re.sub(r".*\.([^.]+)$", r'\1', args.INPUT_FILE.lower())

## CONVERT INPUT_FILE TO TIFF AND WRITE RELEVANT TIFF IMAGES
if file_type == "mcd":
    parser = mcdparser.McdParser(args.INPUT_FILE)
    acids = parser.acquisition_ids
else:
    if file_type == "txt":
        parser = txtparser.TxtParser(args.INPUT_FILE)
    elif file_type == "tiff" or file_type == "tif":
        parser = omeparser.OmetiffParser(args.INPUT_FILE)
    else:
        print("{}: Invalid input file type - should be txt, tiff, or mcd!".format(file_type))
        sys.exit(1)

    # THERE IS ONLY ONE ACQUISITION - ROI FOLDER NAMED ACCORDING TO INPUT FILENAME
    acids = [ re.sub(r'\.txt|\.tiff', '', os.path.basename(parser.filename).lower().replace(" ", "_")) ]

for roi_number in acids:
    if file_type == "mcd":
        imc_ac = parser.get_imc_acquisition(roi_number)
        acmeta = parser.meta.get_object(meta.ACQUISITION, roi_number)
        roi_label = parser.get_acquisition_description(roi_number)
        roi_map.write("roi_%s,%s,%s,%s" % (roi_number, roi_label, acmeta.properties['StartTimeStamp'], acmeta.properties['EndTimeStamp']) + "\n")
    else:
        imc_ac = parser.get_imc_acquisition()

        # NO INFORMATION ON IMAGE ACQUISITION TIME FOR TXT AND TIFF FILE FORMATS
        roi_map.write("roi_%s,,," % (roi_number) + "\n")

    for i,j in enumerate(HEADER[1:]):
        ## WRITE TO APPROPRIATE DIRECTORY
        dirname = "roi_%s/%s" % (roi_number, j)
        if not os.path.exists(dirname):
            os.makedirs(dirname)

        # SELECT THE METALS FOR THE CORRESPONDING STACK (i) TO CREATE OME TIFF STACK
        label_indices = [ idx for idx in range(0, len(imc_ac.channel_labels)) if len([ entry for entry in metalDict if imc_ac.channel_labels[idx].upper().startswith(entry) and metalDict[entry][i]]) > 0 ]
        metal_stack = [ imc_ac.channel_metals[idx] for idx in label_indices ]

        if len(metal_stack) > 0:
            img = imc_ac.get_image_writer(filename=os.path.join("roi_%s" % (roi_number), "%s.ome.tiff" % j), metals=metal_stack)
            img.save_image(mode='ome', compression=0, dtype=None, bigtiff=True)
        else:
            print("None of the metals exists in metasheet file for {}".format(j))
            sys.exit(1)

        for l, m in zip(imc_ac.channel_labels, imc_ac.channel_metals):
            filename = "%s.tiff" % (l)

            # MATCH METAL LABEL TO METADATA METAL COLUMN
            # metal_label = l.split('_')[0].upper()
            metal_label = l.upper()
            # metal = [ entry for entry in metalDict if metal_label.upper().startswith(entry) and metalDict[entry][i] ]
            metal = [ entry for entry in metalDict if metal_label.upper().endswith(entry) and metalDict[entry][i] ]
            if len(metal) == 1:
                if metalDict[metal[0]][i]:
                    img = imc_ac.get_image_writer(filename=os.path.join(dirname,filename), metals=[m])
                    img.save_image(mode='ome', compression=0, dtype=None, bigtiff=False)
            elif len(metal) > 1:
                print("{} metal has multiple matches found".format(metal_label))
            elif len([ entry for entry in metalDict if metal_label.upper().endswith(entry)]) == 0:
                print("{} metal does not exist in metasheet file".format(metal_label))
roi_map.close()
@arcolombo
Author

test.panel.csv

Here is the panel for this type of experiment.
