test_hpoa is failing, but only on Python 3.9 #95

Open
caufieldjh opened this issue Oct 9, 2024 · 0 comments
test_hpoa fails, but not consistently (at least once I've been able to clear it by re-running the test), and (usually?) only on Python 3.9.

Full log below:

================================== FAILURES ===================================
_______________________________ test_hpoa[True] ________________________________

group_by_publication = True

    @pytest.mark.parametrize("group_by_publication", [True, False])
    def test_hpoa(group_by_publication):
        wrapper = HPOAWrapper(group_by_publication=group_by_publication)
        with open(INPUT_DIR / "example-phenotype-hpoa.tsv") as file:
>           vars = list(wrapper.objects_from_file(file))

tests/wrappers/test_hpoa.py:19: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/curategpt/wrappers/clinical/hpoa_wrapper.py:119: in objects_from_file
    yield from self.objects_from_rows(rows)
src/curategpt/wrappers/clinical/hpoa_wrapper.py:100: in objects_from_rows
    pubs = self.pubmed_wrapper.objects_by_ids(pmids)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PubmedWrapper(source_locator=None, local_store=None, extractor=None, eutils_client=None, session=<CachedSession(cache=...ngs=CacheSettings(expire_after=-1))>, where=None, email=None, ncbi_key=None, is_fetch_full_text=None, _uses_cache=True)
object_ids = ['PMID:33743206']

    def objects_by_ids(self, object_ids: List[str]) -> List[Dict]:
        pubmed_ids = sorted([x.replace("PMID:", "") for x in object_ids])
        session = self.session
        logger.debug(f"Using session: {session} [cached: {self._uses_cache} for {pubmed_ids}")
    
        # Parameters for the efetch request
        efetch_params = {
            "db": "pubmed",
            "id": ",".join(pubmed_ids),  # Combine PubMed IDs into a comma-separated string
            "rettype": "medline",
            "retmode": "text",
        }
        efetch_response = session.get(EFETCH_URL, params=efetch_params)
        if not self._uses_cache or not efetch_response.from_cache:
            # throttle if not using cache or if not cached
            logger.debug(f"Sleeping for {RATE_LIMIT_DELAY} seconds")
            time.sleep(RATE_LIMIT_DELAY)
        if not efetch_response.ok:
            logger.error(f"Failed to fetch data for {pubmed_ids}")
>           raise ValueError(
                f"Failed to fetch data for {pubmed_ids} using {session} and {efetch_params}"
            )
E           ValueError: Failed to fetch data for ['33743206'] using <CachedSession(cache=<SQLiteCache(name=hpoa_pubmed_cache)>, settings=CacheSettings(expire_after=-1))> and {'db': 'pubmed', 'id': '33743206', 'rettype': 'medline', 'retmode': 'text'}

src/curategpt/wrappers/literature/pubmed_wrapper.py:168: ValueError
----------------------------- Captured stderr call -----------------------------

Downloading hp.db.gz: 0.00B [00:00, ?B/s]
Downloading hp.db.gz:   0%|          | 8.00k/87.2M [00:00<47:10, 32.3kB/s]
Downloading hp.db.gz:   1%|          | 1.12M/87.2M [00:00<00:21, 4.24MB/s]
Downloading hp.db.gz:   9%|▉         | 7.99M/87.2M [00:00<00:03, 25.1MB/s]
Downloading hp.db.gz:  18%|█▊        | 16.0M/87.2M [00:00<00:01, 42.5MB/s]
Downloading hp.db.gz:  21%|██▏       | 18.6M/87.2M [00:00<00:02, 30.9MB/s]
Downloading hp.db.gz:  26%|██▌       | 22.3M/87.2M [00:00<00:02, 30.1MB/s]
Downloading hp.db.gz:  28%|██▊       | 24.0M/87.2M [00:01<00:02, 24.3MB/s]
Downloading hp.db.gz:  37%|███▋      | 32.0M/87.2M [00:01<00:01, 34.6MB/s]
Downloading hp.db.gz:  44%|████▍     | 38.3M/87.2M [00:01<00:01, 40.1MB/s]
Downloading hp.db.gz:  46%|████▌     | 40.0M/87.2M [00:01<00:01, 31.8MB/s]
Downloading hp.db.gz:  53%|█████▎    | 46.3M/87.2M [00:01<00:01, 31.7MB/s]
Downloading hp.db.gz:  55%|█████▌    | 48.0M/87.2M [00:01<00:01, 24.2MB/s]
Downloading hp.db.gz:  63%|██████▎   | 54.8M/87.2M [00:01<00:01, 33.9MB/s]
Downloading hp.db.gz:  64%|██████▍   | 56.0M/87.2M [00:02<00:01, 28.8MB/s]
Downloading hp.db.gz:  71%|███████▏  | 62.3M/87.2M [00:02<00:00, 36.8MB/s]
Downloading hp.db.gz:  73%|███████▎  | 64.0M/87.2M [00:02<00:00, 31.7MB/s]
Downloading hp.db.gz:  81%|████████  | 70.3M/87.2M [00:02<00:00, 35.8MB/s]
Downloading hp.db.gz:  83%|████████▎ | 72.0M/87.2M [00:02<00:00, 30.6MB/s]
Downloading hp.db.gz:  90%|████████▉ | 78.3M/87.2M [00:02<00:00, 33.6MB/s]
Downloading hp.db.gz:  92%|█████████▏| 80.0M/87.2M [00:02<00:00, 24.4MB/s]
Downloading hp.db.gz:  99%|█████████▉| 86.3M/87.2M [00:03<00:00, 27.9MB/s]
                                                                          
------------------------------ Captured log call -------------------------------
ERROR    curategpt.wrappers.literature.pubmed_wrapper:pubmed_wrapper.py:167 Failed to fetch data for ['33743206']
_______________________________ test_hpoa[False] _______________________________

group_by_publication = False

    @pytest.mark.parametrize("group_by_publication", [True, False])
    def test_hpoa(group_by_publication):
        wrapper = HPOAWrapper(group_by_publication=group_by_publication)
        with open(INPUT_DIR / "example-phenotype-hpoa.tsv") as file:
>           vars = list(wrapper.objects_from_file(file))

tests/wrappers/test_hpoa.py:19: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/curategpt/wrappers/clinical/hpoa_wrapper.py:119: in objects_from_file
    yield from self.objects_from_rows(rows)
src/curategpt/wrappers/clinical/hpoa_wrapper.py:100: in objects_from_rows
    pubs = self.pubmed_wrapper.objects_by_ids(pmids)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PubmedWrapper(source_locator=None, local_store=None, extractor=None, eutils_client=None, session=<CachedSession(cache=...ngs=CacheSettings(expire_after=-1))>, where=None, email=None, ncbi_key=None, is_fetch_full_text=None, _uses_cache=True)
object_ids = ['PMID:33743206']

    def objects_by_ids(self, object_ids: List[str]) -> List[Dict]:
        pubmed_ids = sorted([x.replace("PMID:", "") for x in object_ids])
        session = self.session
        logger.debug(f"Using session: {session} [cached: {self._uses_cache} for {pubmed_ids}")
    
        # Parameters for the efetch request
        efetch_params = {
            "db": "pubmed",
            "id": ",".join(pubmed_ids),  # Combine PubMed IDs into a comma-separated string
            "rettype": "medline",
            "retmode": "text",
        }
        efetch_response = session.get(EFETCH_URL, params=efetch_params)
        if not self._uses_cache or not efetch_response.from_cache:
            # throttle if not using cache or if not cached
            logger.debug(f"Sleeping for {RATE_LIMIT_DELAY} seconds")
            time.sleep(RATE_LIMIT_DELAY)
        if not efetch_response.ok:
            logger.error(f"Failed to fetch data for {pubmed_ids}")
>           raise ValueError(
                f"Failed to fetch data for {pubmed_ids} using {session} and {efetch_params}"
            )
E           ValueError: Failed to fetch data for ['33743206'] using <CachedSession(cache=<SQLiteCache(name=hpoa_pubmed_cache)>, settings=CacheSettings(expire_after=-1))> and {'db': 'pubmed', 'id': '33743206', 'rettype': 'medline', 'retmode': 'text'}

src/curategpt/wrappers/literature/pubmed_wrapper.py:168: ValueError
------------------------------ Captured log call -------------------------------
ERROR    curategpt.wrappers.literature.pubmed_wrapper:pubmed_wrapper.py:167 Failed to fetch data for ['33743206']
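For reference, not a confirmed fix: since `objects_by_ids` raises `ValueError` on the first non-OK efetch response, and NCBI intermittently rejects requests under rate limiting, one mitigation would be to retry the request with exponential backoff before giving up. A minimal sketch of that idea (the `fetch_with_retries` helper is hypothetical, not part of curategpt; it would wrap the existing `session.get(EFETCH_URL, params=efetch_params)` call):

```python
# Hypothetical helper: retry a GET-style call with exponential backoff
# before raising, instead of failing on the first non-OK response.
import time
from typing import Any, Callable


def fetch_with_retries(
    do_get: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Any:
    """Call do_get until a response with .ok is returned.

    Sleeps base_delay * 2**attempt between attempts; raises ValueError
    once max_attempts responses have all been non-OK.
    """
    for attempt in range(max_attempts):
        response = do_get()
        if response.ok:
            return response
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise ValueError(f"Failed after {max_attempts} attempts")
```

In `objects_by_ids`, the call site would then become something like `efetch_response = fetch_with_retries(lambda: session.get(EFETCH_URL, params=efetch_params))`. Whether this actually resolves the Python 3.9 CI flake is unverified; it may only mask an underlying rate-limit or cache issue.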