Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? [Metadata Fetch: ... failed: unsupported operand type(s) for /: 'NoneType' and 'int'] [hundreds of videos stuck in xklb-metadata.db, painstakingly processed every time, but never downloading?] #189

Open
holta opened this issue Jun 17, 2024 · 10 comments
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@holta
Copy link
Member

holta commented Jun 17, 2024

Trying to download playlist https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M on LRN2 is going extremely slowly. The "Progress" for "Metadata Fetch" has increased by 1 percentage point every minute or so, as if this initial stage alone will take about 100 minutes:

image

The playlist's 35 videos are each about 5-12min long.

Any idea why the "Metadata Fetch" stage is going so slowly? Hopefully the /var/log/xklb.log logs below provide hints to help explain?

root@lrn2:~# tail /var/log/calibre-web.log
[2024-06-17 08:46:27,854]  INFO {cps.web:1230} Serving book: Precious Plastic - Setup a local collectio - One Army
[2024-06-17 09:15:06,680] DEBUG {cps.web:1601} Start video watching for 866
[2024-06-17 09:15:06,724]  INFO {cps.web:1230} Serving book: FORM 1 MATHEMATICS LESSON 30 DESTRIBUTION - ELIMU TV
[2024-06-17 09:48:08,431] DEBUG {cps.web:1665} Selected book is unavailable. File does not exist or is not accessible
[2024-06-17 10:27:08,741] DEBUG {cps.web:1601} Start video watching for 1022
[2024-06-17 10:27:08,792]  INFO {cps.web:1230} Serving book: Precious Plastic - Setup a mix plastic wor - One Army
[2024-06-17 10:31:08,977] DEBUG {cps.services.worker:91} Add Task for user: Admin - Metadata fetch task for https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M
[2024-06-17 10:31:08,993]  INFO {cps.tasks.metadata_extract:145} Starting to fetch metadata for URL: https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M
[2024-06-17 10:31:16,062]  INFO {cps.editbooks:385} Received metadata request: ImmutableMultiDict([('current_user_name', 'Admin'), ('shelf_title', 'Calculus')])
[2024-06-17 10:31:16,113]  INFO {cps.editbooks:374} Shelf Calculus created

root@lrn2:~# tail -3900 /var/log/xklb.log | pastebinit -b dpaste.com
https://dpaste.com/B8FWRG27Y

root@lrn2:~# systemctl status calibre-web
● calibre-web.service - Calibre-Web
     Loaded: loaded (/etc/systemd/system/calibre-web.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-06-16 06:47:55 CDT; 1 day 4h ago
   Main PID: 535 (python3)
      Tasks: 12 (limit: 9453)
     Memory: 424.1M (peak: 1.1G)
        CPU: 2h 4min 37.963s
     CGroup: /system.slice/calibre-web.service
             ├─  535 /usr/local/calibre-web-py3/bin/python3 /usr/local/calibre-web-py3/cps.py -p /library/calibre-web/config/app.db
             ├─84680 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             ├─84691 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             ├─84692 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             ├─84693 /bin/bash /usr/local/bin/lb-wrapper tubeadd "https://www.youtube.com/watch?v=UoxqrUY0mzU"
             └─84694 /root/.local/pipx/venvs/xklb/bin/python /usr/local/bin/lb tubeadd /library/calibre-web/xklb-metadata.db "https://www.youtube.com/watch?v=UoxqrUY0mzU" --force -vv

Jun 16 06:47:55 lrn2 systemd[1]: Started calibre-web.service - Calibre-Web.
@holta holta added bug Something isn't working question Further information is requested labels Jun 17, 2024
@holta holta changed the title Why ~100 minute ETA to complete "Metadata Fetch" stage, for a playlist or 35 short videos? Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? Jun 17, 2024
@holta
Copy link
Member Author

holta commented Jun 17, 2024

Just FYI the "Metadata Fetch" stage has now been running for more than an hour, and its Progress column currently shows 70%.

@holta
Copy link
Member Author

holta commented Jun 17, 2024

@holta holta changed the title Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? [Metadata Fetch: ... failed: unsupported operand type(s) for /: 'NoneType' and 'int'] Jun 17, 2024
@holta
Copy link
Member Author

holta commented Jun 17, 2024

Just FYI regardless of the root cause for the (failed) ~86 minute "Metadata Fetch", videos that keep reappearing in "Tasks" view (e.g. https://www.youtube.com/watch?v=n_iRE9alO44) should be handled, to explain what's happening.

FWIW in this one particular case, browsing to https://www.youtube.com/watch?v=n_iRE9alO44 shows "This live stream recording is not available.", "No views" and "Streamed 1 year ago" — error messages that in an ideal world might become more visible via "Tasks" view?

  1. @deldesir in the short term, can you suggest a manual hack to clear out problematic videos that are "trapped" and keep reappearing in "Tasks" view?
  2. Can we delineate or document what/all kinds of problematic videos keep reappearing in "Tasks" view? (Regardless whether these are merely "visual clutter" — or more serious clogging / blocking of other downloads!)
  3. As soon as we can, educators/families will need clear explanations of what went wrong. (With actionable hints/suggestions in "Tasks" view, to shine a bright light on emerging problems in jargon-free language, wherever we can anyway!)

@deldesir
Copy link
Collaborator

deldesir commented Jun 17, 2024

Can I have access to your xklb-metadata? https://www.youtube.com/watch?v=n_iRE9alO44 looks out-of-norm and might not have its error recorded in media.error.

UPDATE: Please ignore the request. I'll try to download it myself.

@holta
Copy link
Member Author

holta commented Jun 17, 2024

suggest a manual hack to clear out problematic videos that are "trapped" and keep reappearing in "Tasks" view?

Just FYI installing the very latest IIAB Calibre-Web onto LRN2 did not help:

Trying to "Download to IIAB" https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M is currently working through a ~100 minute "Metadata Fetch".

root@lrn2:~# tail -700 /var/log/xklb.log | pastebinit -b dpaste.com
https://dpaste.com/8C28P5R3V

@holta
Copy link
Member Author

holta commented Jun 17, 2024

@deldesir: at first I suspected Calculus playlist https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M might be pulling in more than its official 35 videos?

As if you watch the running of tail -f /var/log/xklb.log growing, you'll see what appears to be many hundreds of videos being polled (a new one every 5-10 seconds) during "Metadata Fetch" — example:

2024-06-17 18:29:32 - [Debug] yt-dlp initialized 0.2313
2024-06-17 18:29:32 - [Info] [youtube] Extracting URL: https://www.youtube.com/watch?v=45lckolNRm8
2024-06-17 18:29:32 - [Info] [youtube] 45lckolNRm8: Downloading webpage
2024-06-17 18:29:33 - [Info] [youtube] 45lckolNRm8: Downloading ios player API JSON
2024-06-17 18:29:36 - [Debug] Importing playlist-less media https://www.youtube.com/watch?v=45lckolNRm8
2024-06-17 18:29:36 - [Debug] media.playlist_media_add 3.7105
2024-06-17 18:29:36 - [Info]
2024-06-17 18:29:36 - [Debug] ydl.extract_info done 0.0167
2024-06-17 18:29:36 - [Info] lb-wrapper's xklb command (tubeadd) completed successfully.
2024-06-17 18:29:37 - [Info] Using yt-dlp 2024.05.27
2024-06-17 18:29:37 - [Info] Running xklb command: lb tubeadd /library/calibre-web/xklb-metadata.db https://www.youtube.com/watch?v=hUiIkiQGiIM --force -vv
2024-06-17 18:29:38 - [Debug] library v2.8.065
2024-06-17 18:29:38 - [Debug] ['/usr/local/bin/lb', 'tubeadd', '/library/calibre-web/xklb-metadata.db', 'https://www.youtube.com/watch?v=hUiIkiQGiIM', '--force', '-vv']
2024-06-17 18:29:38 - [Debug] {'force': True, 'paths': ['https://www.youtube.com/watch?v=hUiIkiQGiIM']}
2024-06-17 18:29:38 - [Debug] {'noprogress': True, 'skip_download': True, 'lazy_playlist': True, 'extract_flat': True, 'no_check_certificate': True, 'ignore_no_formats_error': True, 'skip_playlist_after_errors': 21, 'playlistend': 20000, 'force': True}

But that theory proved false: as https://www.youtube.com/watch?v=45lckolNRm8 is "PARLONS EN Les études à l’étranger. Expérience des anciens étudiants congolais en Chine" from YouTube channel TOP CONGO FM — it seems these hundreds of videos must somehow be stuck within IIAB Calibre-Web — it keeps trying to reprocess or redownload them somehow...

@deldesir
Copy link
Collaborator

A quick way to confirm this is to use a clean xklb-metadata.db.

@holta
Copy link
Member Author

holta commented Jun 17, 2024

A quick way to confirm this is to use a clean xklb-metadata.db.

Good idea. Feel free to run...

systemctl stop calibre-web
cd /library/calibre-web
mv xklb-metadata.db xklb-metadata.db.OLD2
systemctl start calibre-web

...and then try again to "Download to IIAB" https://www.youtube.com/playlist?list=PLybg94GvOJ9ELZEe9s2NXTKr41Yedbw7M on LRN2?

@holta holta changed the title Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? [Metadata Fetch: ... failed: unsupported operand type(s) for /: 'NoneType' and 'int'] Why ~100 minute ETA to complete "Metadata Fetch" stage (seems very slow) for a playlist of 35 short videos? [Metadata Fetch: ... failed: unsupported operand type(s) for /: 'NoneType' and 'int'] [hundreds of videos stuck in xklb-metadata.db, painstakingly processed every time, but never downloading?] Jun 17, 2024
@holta
Copy link
Member Author

holta commented Jun 18, 2024

"Minimal Reproducer" URL?

Any Minimal Reproducer URL that "pollutes" xklb-metadata.db, would be incredibly useful for testing — and to help us all understand this better — if we can identify one or several such URLs! 🙏

@deldesir
Copy link
Collaborator

deldesir commented Jun 18, 2024

I confirm "lb tubeadd" is taking longer to process the videos. I don't see any pollution with irrelevant videos.

Update: pollution happening for me with 3-videos playlist https://www.youtube.com/playlist?list=PLJicmE8fK0Eg8cBExYSr9r3tzg9ZQo3gD. Using a clean db is a temporary measure to get around this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants