
KeyError: 'source' #2120

Closed
myps6415 opened this issue Aug 21, 2024 · 12 comments

@myps6415

Hi, I got the KeyError below.
Does anyone know how to fix it?
Thanks a lot.

poetry run python start_us.py
[2024-08-21 13:25:20] Assigning Jobs
Processing Scraped Posts
  0%|                                                                                             | 0/436 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/ubuntu/work/UltimaScraper/start_us.py", line 62, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/ubuntu/work/UltimaScraper/start_us.py", line 44, in main
    _api = await USR.start(
  File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 50, in start
    await self.start_datascraper(datascraper)
  File "/home/ubuntu/work/UltimaScraper/ultima_scraper/ultima_scraper.py", line 137, in start_datascraper
    await datascraper.datascraper.api.job_manager.process_jobs()
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 45, in process_jobs
    await asyncio.create_task(self.__worker())
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/managers/job_manager/job_manager.py", line 53, in __worker
    await job.task
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 202, in prepare_scraper
    await self.process_scraped_content(
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/modules/module_streamliner.py", line 237, in process_scraped_content
    unrefined_set: list[dict[str, Any]] = await tqdm_asyncio.gather(
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in gather
    res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 79, in <listcomp>
    res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
  File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
    return i, await f
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/datascraper_manager/datascrapers/onlyfans.py", line 51, in media_scraper
    content_metadata.resolve_extractor(Extractor(post_result))
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 216, in resolve_extractor
    self.medias: list[MediaMetadata] = result.get_medias(self)
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_collection/managers/metadata_manager/metadata_manager.py", line 147, in get_medias
    main_url = self.item.url_picker(asset_metadata)
  File "/home/ubuntu/work/UltimaScraper/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py", line 39, in url_picker
    source = media_item["source"]
KeyError: 'source'
@Neurosis404

Happens to me too. It worked 3-4 days ago; the issue appeared suddenly without any visible reason. It's also not model related; I tried another one and the error appears there too.

@betoalanis

+1

I picked "scrape all" and still get the error, so I can confirm it has nothing to do with any specific model; I think source just refers to OF in general.

@gri1n

gri1n commented Aug 24, 2024

+1

Is this project even maintained anymore?

@felixtheant

Haven't used this in a while, and when I do, I get the same error.

@barthramsay

barthramsay commented Aug 26, 2024

Some investigation into general updates (because this codebase is old):

Looking at the recent PyPI package dependencies, the error happens with version 1.1.4 of ultima-scraper-api.

Notably, the latest UltimaScraper release on PyPI itself appears to be newer than what is available on GitHub.

I will investigate further, but upgrading UltimaScraper to the latest PyPI sources will most likely fix this issue.

The codebase here is outdated, with dependencies two years old, while at first glance the PyPI one uses recent versions from this year.

Interesting links with a regularly updated codebase (but somehow not UltimaScraper itself):

@barthramsay

@DIGITALCRIMINAL would you mind either updating this repo or providing us an updated start_us.py?

Thank you

@UrsaBear

UrsaBear commented Sep 3, 2024

It looks like the data structure that OnlyFans is using has changed.
They removed the source key from the media, which was causing issues with getting the URLs.
Now the source url is in files.full.url.
I made some tweaks to the url_picker method in ultima_scraper_api/apis/onlyfans/__init__.py, now it works.
Here’s the quick fix I did for the url_picker method:

    def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
        authed = self.get_author().get_authed()
        video_quality = (
            video_quality or self.author.get_api().get_site_settings().video_quality
        )
        if not media_item["canView"]:
            return
        source: dict[str, Any] = {}
        media_type: str = ""
        if "files" in media_item:
            # OnlyFans moved the main media URL from media_item["source"]
            # to media_item["files"]["full"].
            media_type = media_item["type"]
            media_item = media_item["files"]
            source = media_item["full"]
        else:
            return
        url = source.get("url")
        return urlparse(url) if url else None
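For anyone patching by hand, the essential change can be isolated as a tiny standalone helper (a sketch, not part of the package; the payload shape is taken from this thread): it reads files.full.url defensively, so a missing key yields None instead of a KeyError.

```python
from typing import Any, Optional
from urllib.parse import ParseResult, urlparse


def pick_full_url(media_item: dict[str, Any]) -> Optional[ParseResult]:
    """Extract the main media URL from the new payload shape: files -> full -> url."""
    if not media_item.get("canView"):
        return None  # locked media has no viewable URL
    url = media_item.get("files", {}).get("full", {}).get("url")
    return urlparse(url) if url else None
```

The defensive .get() chain matters because locked or partially returned media items may omit files or full entirely.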

@betoalanis

betoalanis commented Sep 3, 2024

(Quoting @UrsaBear's url_picker fix above.)

I can confirm this is working, TYVM!!

UPDATE: I scraped an account perfectly, but after that I'm getting a TypeError: argument of type 'NoneType' is not iterable, so it fails after one scraped model when selecting "All". It seems to work correctly when selecting models one by one.

ANOTHER UPDATE: the script now seems to be working properly when selecting ALL. Maybe some of my models' DBs were corrupted; still testing, but overall this edit works :D

@betoalanis

betoalanis commented Sep 4, 2024

Ok, after some testing, I noticed the error comes from OF's change to the preview URLs, and I cross-checked (#2121 (comment)).

In the same __init__.py file, I replaced all the ["preview"] occurrences in preview_url_picker with ["full"].
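Applied to preview_url_picker, that substitution means previews are also read from files.full.url. A minimal sketch of the substituted logic (hypothetical helper name; shapes taken from the comments in this thread):

```python
from typing import Any, Optional
from urllib.parse import ParseResult, urlparse


def pick_preview_url(media_item: dict[str, Any]) -> Optional[ParseResult]:
    """Read the preview URL from files -> full -> url, with a flat fallback."""
    files = media_item.get("files")
    if files is not None:
        url = files.get("full", {}).get("url")
    else:
        # Mirrors the patched method's fallback for items without a "files" dict.
        url = media_item.get("full")
    return urlparse(url) if url else None
```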

@cigix

cigix commented Sep 4, 2024

That got my downloads repaired as well, thanks everyone!

@raphaelbarreto

I've tried to replicate the steps but can't make it work. Can anyone upload a working version of the code somewhere, please?

@myps6415
Author

(Quoting @betoalanis's preview_url_picker fix above.)

Hi everyone, I think this problem has been solved and it now works for me. I'll write a summary here.

You need to fix __init__.py in the folder ultima_scraper_api/apis/onlyfans. It's not easy to find because you are inside the UltimaScraper project itself, so here is the full path: UltimaScraper/.venv/lib/python3.11/site-packages/ultima_scraper_api/apis/onlyfans; fix __init__.py there.

The corrected __init__.py is as follows:

from __future__ import annotations

from typing import TYPE_CHECKING, Any, Literal
from urllib.parse import urlparse

SubscriptionType = Literal["all", "active", "expired", "attention"]

if TYPE_CHECKING:
    from ultima_scraper_api.apis.onlyfans.classes.user_model import (
        AuthModel,
        create_user,
    )


class SiteContent:
    def __init__(self, option: dict[str, Any], user: AuthModel | create_user) -> None:
        self.id: int = option["id"]
        self.author = user
        self.media: list[dict[str, Any]] = option.get("media", [])
        self.preview_ids: list[int] = []
        self.__raw__ = option

    def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
        authed = self.get_author().get_authed()
        video_quality = (
            video_quality or self.author.get_api().get_site_settings().video_quality
        )
        if not media_item["canView"]:
            return
        source: dict[str, Any] = {}
        media_type: str = ""
        if "files" in media_item:
            # OnlyFans moved the main media URL from media_item["source"]
            # to media_item["files"]["full"].
            media_type = media_item["type"]
            media_item = media_item["files"]
            source = media_item["full"]
        else:
            return
        url = source.get("url")
        return urlparse(url) if url else None

    def preview_url_picker(self, media_item: dict[str, Any]):
        preview_url = None
        if "files" in media_item:
            if (
                "preview" in media_item["files"]
                and "url" in media_item["files"]["full"]
            ):
                # Previews also live under files["full"] in the new payload.
                preview_url = media_item["files"]["full"]["url"]
        else:
            preview_url = media_item["full"]
        # Return at function level so the "files" branch also returns a URL
        # (in the original snippet, that branch implicitly returned None).
        return urlparse(preview_url) if preview_url else None

    def get_author(self):
        return self.author

    async def refresh(self):
        func = await self.author.scrape_manager.handle_refresh(self)
        return await func(self.id)

Another thing: if you ran this project with Docker before, you need to rebuild your image and remember to put the fixed __init__.py in the right place. So I put my Dockerfile below:

FROM python:3.10-slim
RUN apt-get update && apt-get install -y \
  curl \
  libpq-dev \
  gcc \
  && rm -rf /var/lib/apt/lists/*
WORKDIR /usr/src/app
ENV POETRY_HOME=/usr/local/share/pypoetry
ENV POETRY_VIRTUALENVS_CREATE=false
RUN curl -sSL https://install.python-poetry.org | python3 -

COPY . .

RUN /usr/local/share/pypoetry/bin/poetry install --only main

# Overwrite the packaged __init__.py with the patched copy
COPY .venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py /usr/src/app/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/onlyfans/__init__.py

CMD [ "/usr/local/share/pypoetry/bin/poetry", "run", "python", "./start_us.py" ]

After those settings, I think you can run it well.
In my experience, after everything was set up, a "KeyError: 'data'" appeared because a new cookie needed to be set. You need to reset auth.json in __user_data__/profiles/OnlyFans/default/auth.json.
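If you're unsure whether your profile file is the culprit, a quick sanity check helps (the path comes from the comment above; the keys inside auth.json vary by setup, so this only verifies that the file exists, parses as JSON, and is non-empty):

```python
import json
from pathlib import Path


def auth_file_ok(path: str) -> bool:
    """True if the file at `path` exists, parses as JSON, and contains something."""
    p = Path(path)
    if not p.is_file():
        return False
    try:
        data = json.loads(p.read_text())
    except json.JSONDecodeError:
        return False
    return bool(data)


# e.g. auth_file_ok("__user_data__/profiles/OnlyFans/default/auth.json")
```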
