Running pycistarget error #30

juliasalas01 · 2024-02-13T10:59:23Z

When running pycistarget I get the error:

2024-02-13 11:53:34,200 cisTarget    INFO     Getting cistromes for NC_000022.11
/home/juliasalas/miniconda3/envs/new_conda/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py:301: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
  self.regions_to_db = ctx_db.regions_to_db[self.name] if type(ctx_db.regions_to_db) == dict else ctx_db.regions_to_db.loc[set(coord_to_region_names(self.region_set)) & set(ctx_db.regions_to_db['Target'])]
2024-02-13 11:53:34,309 cisTarget    INFO     Running cisTarget for NC_000023.11 which has 10 regions
2024-02-13 11:53:34,479 cisTarget    INFO     Annotating motifs for NC_000023.11
2024-02-13 11:53:36,078 cisTarget    INFO     Getting cistromes for NC_000023.11
2024-02-13 11:53:36,270 cisTarget    INFO     Done!
2024-02-13 11:53:36,271 pycisTarget_wrapper INFO     /home/juliasalas/piRNA/Workspaces/julia/motifs/CTX_increased_clusters_hum1_All folder already exists.
2024-02-13 11:53:36,736 pycisTarget_wrapper INFO     Running cisTarget without promoters for increased_clusters_hum1
Traceback (most recent call last):
  File "/home/juliasalas/miniconda3/envs/new_conda/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3800, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 203, in pandas._libs.index.IndexEngine._get_loc_duplicates
  File "pandas/_libs/index.pyx", line 211, in pandas._libs.index.IndexEngine._maybe_get_bool_indexer
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index._unpack_bool_indexer
KeyError: 'Query'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "regions.py", line 9, in <module>
    run_pycistarget(region_sets,
  File "/faststorage/project/piRNA/Workspaces/julia/motifs/scenicplus/src/scenicplus/wrappers/run_pycistarget.py", line 224, in run_pycistarget
    db_regions = set(pd.concat([ctx_db.regions_to_db[x] for x in ctx_db.regions_to_db.keys()])['Query'])
  File "/home/juliasalas/miniconda3/envs/new_conda/lib/python3.8/site-packages/pandas/core/series.py", line 982, in __getitem__
    return self._get_value(key)
  File "/home/juliasalas/miniconda3/envs/new_conda/lib/python3.8/site-packages/pandas/core/series.py", line 1092, in _get_value
    loc = self.index.get_loc(label)
  File "/home/juliasalas/miniconda3/envs/new_conda/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    raise KeyError(key) from err
KeyError: 'Query'

Python version: 3.8 (also tried with 3.10.13)
Pycistarget: 1.0.3.dev2+g81eb875
Pandas: 1.5.0

I am using the motifs from here: https://resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/snapshots/#:~:text=motifs%2Dv10%2Dnr.hgnc%2Dm0.00001%2Do0.0.tbl
https://resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/singletons/

SeppeDeWinter · 2024-02-14T15:12:57Z

Hi @juliasalas01

Can you show the command that you are running?

I suspect that you did not format your region set correctly as it seems that the function is running for each chromosome instead of each region set.

Your region sets dictionary should be a dictionary of dictionaries, as in the example below:

for key in region_sets.keys():
    print(f'{key}: {region_sets[key].keys()}')

topics_otsu: dict_keys(['Topic1', 'Topic2', 'Topic3', 'Topic4', 'Topic5', 'Topic6', 'Topic7', 'Topic8', 'Topic9', 'Topic10', 'Topic11', 'Topic12', 'Topic13', 'Topic14', 'Topic15', 'Topic16'])
topics_top_3: dict_keys(['Topic1', 'Topic2', 'Topic3', 'Topic4', 'Topic5', 'Topic6', 'Topic7', 'Topic8', 'Topic9', 'Topic10', 'Topic11', 'Topic12', 'Topic13', 'Topic14', 'Topic15', 'Topic16'])
DARs: dict_keys(['B_cells_1', 'B_cells_2', 'CD14+_Monocytes', 'CD4_T_cells', 'CD8_T_cells', 'Dendritic_cells', 'FCGR3A+_Monocytes', 'NK_cells'])

Can you run the same code for your region set and provide the output?
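For a more explicit check than printing the keys, the nesting can be verified with a small helper. This is only a sketch: `describe_region_sets` is a hypothetical name, not part of pycistarget or scenicplus, and it only tests the dictionary-of-dictionaries shape described above. It does not verify that the innermost values are valid region objects.

```python
from collections.abc import Mapping


def describe_region_sets(region_sets):
    """Return a list of nesting problems found in region_sets (empty if none).

    Hypothetical helper: it checks only that the input is a dict whose values
    are dicts keyed by string names, as in the example output above.
    """
    if not isinstance(region_sets, Mapping):
        return [f"top level is {type(region_sets).__name__}, expected a dict"]

    problems = []
    for set_name, inner in region_sets.items():
        if not isinstance(inner, Mapping):
            problems.append(
                f"{set_name!r}: value is {type(inner).__name__}, "
                "expected a dict of named region collections"
            )
            continue
        for region_name in inner:
            # Inner keys should be names such as 'Topic1', not coordinate tuples.
            if not isinstance(region_name, str):
                problems.append(
                    f"{set_name!r}: inner key {region_name!r} is "
                    f"{type(region_name).__name__}, expected a str name"
                )
    return problems
```

An empty list means the nesting has the expected shape. Any message returned names the key to look at first.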

All the best,

Seppe

juliasalas01 · 2024-02-16T14:17:21Z

Hi!
Thank you for your response. My regions input file is a bed file, which I have transformed into a dictionary like this:
{'NC_000001.11': {('6608557', '6636255'): None, ('6623250', '6647368'): None, ('9140669', '9161757'): None, ('9252085', '9280779'): None, ('10364269', '10391240'): None}
But I am not sure this format is compatible.

Thanks

SeppeDeWinter · 2024-02-19T08:41:44Z

Hi @juliasalas01

No, that does not look right.

You can read your bed file like this:

import pyranges as pr
regions = pr.read_bed("<PATH_TO_BED_FILE>")  # replace the placeholder with the path to your bed file

And produce a regions dictionary like this (in case you only have a single bed file):

region_sets = {
    "set1": {"bed_file1": regions}
}
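If the coordinates now exist only in a per-chromosome dictionary like the one posted earlier in this thread, one way back to a bed file is to flatten that dictionary into tab-separated lines. This is a minimal sketch using only the standard library. `dict_to_bed_lines` is a hypothetical helper, and it assumes each inner key is a `(start, end)` pair, as in the posted snippet.

```python
def dict_to_bed_lines(regions_by_chrom):
    """Flatten {chrom: {(start, end): ...}} into 'chrom<TAB>start<TAB>end' lines.

    Hypothetical helper; assumes every inner key is a (start, end) pair whose
    items can be converted to int.
    """
    lines = []
    for chrom, intervals in regions_by_chrom.items():
        for start, end in intervals:
            lines.append(f"{chrom}\t{int(start)}\t{int(end)}")
    return lines


# Two of the intervals from the dictionary shown earlier in the thread.
example = {
    "NC_000001.11": {
        ("6608557", "6636255"): None,
        ("6623250", "6647368"): None,
    }
}
bed_lines = dict_to_bed_lines(example)
```

Writing `"\n".join(bed_lines)` to a file gives a three-column bed file that can then be read with `pr.read_bed` as shown above.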

I hope this helps.

All the best,

Seppe
