Enhanced download functionality for yt-dlp unsupported links #363

Merged: 6 commits, Apr 8, 2024
Changes from 5 commits
37 changes: 23 additions & 14 deletions README.md
@@ -2,28 +2,24 @@

 [![docker image](https://github.com/tgbot-collection/ytdlbot/actions/workflows/builder.yaml/badge.svg)](https://github.com/tgbot-collection/ytdlbot/actions/workflows/builder.yaml)

-YouTube Download Bot🚀🎬⬇️
+**YouTube Download Bot🚀🎬⬇️**

-This Telegram bot allows you to download videos from YouTube and other supported websites, including Instagram!
+This Telegram bot allows you to download videos from YouTube and [other supported websites](#supported-websites).

 # Usage

 [https://t.me/benny_ytdlbot](https://t.me/benny_ytdlbot)

 Join Telegram Channel https://t.me/+OGRC8tp9-U9mZDZl for updates.

-Send link directly to the bot. Any
-Websites [supported by yt-dlp](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md) will work too.
+Just send a link directly to the bot.

-# Limitations of my bot
+# Supported websites

-Due to limitations on servers and bandwidth, there are some restrictions on this free service.
-
-* Each user is limited to 10 free downloads per 24-hour period
-* Maximum of three subscriptions allowed for YouTube channels.
-* Files bigger than 2 GiB will require at least 1 download token.
-
-If you need more downloads, you can buy download tokens.
+* YouTube 😅
+* Any websites [supported by yt-dlp](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
+* Instagram (Videos, Photos, Reels, IGTV & carousel)
+* Pixeldrain

 # Features

@@ -42,8 +38,21 @@ If you need more downloads, you can buy download tokens.
 13. 4 GiB file size support with Telegram Premium
 14. History and inline mode support

-> If you download files larger than 2 GiB, you agreed that this file will be uploaded by me. I know who you are and what
-> you download.
+> [!NOTE]
+> **For users of [my official bot](https://t.me/benny_ytdlbot)**\
+> Files larger than 2 GiB will be automatically uploaded by me(My Premium Account). By utilizing our service for such downloads, you consent to this process. \
+> That means I know who you are and what you download. \
+> Rest assured that we handle your personal information with the utmost care.
+>
+> ## Limitations
+> Due to limitations on servers and bandwidth, there are some restrictions on this free service.
+> * Each user is limited to 10 free downloads per 24-hour period
+> * Maximum of three subscriptions allowed for YouTube channels.
+> * Files bigger than 2 GiB will require at least 1 download token.
+>
+> If you need more downloads, you can buy download tokens.
+>
+> **Thank you for using the [official bot](https://t.me/benny_ytdlbot).**

 # Screenshots

2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,6 +1,6 @@
 pyrogram==2.0.106
 tgcrypto==1.2.5
-git+https://github.com/yt-dlp/yt-dlp@413d3675804599bc8fe419c19e36490fd8f0b30f
+yt-dlp==2024.03.10
 APScheduler==3.10.4
 beautifultable==1.1.0
 ffmpeg-python==0.2.0
20 changes: 3 additions & 17 deletions ytdlbot/downloader.py
@@ -27,6 +27,8 @@
 from pyrogram import types
 from tqdm import tqdm

+from sp_downloader import sp_dl
+
 from config import (
     AUDIO_FORMAT,
     ENABLE_ARIA2,
@@ -220,7 +222,7 @@ def ytdl_download(url: str, tempdir: str, bm, **kwargs) -> list:
         None,
     ]
     adjust_formats(chat_id, url, formats, hijack)
-    if download_instagram(url, tempdir):
+    if sp_dl(url, tempdir):
         return list(pathlib.Path(tempdir).glob("*"))

     address = ["::", "0.0.0.0"] if IPv6 else [None]
@@ -303,19 +305,3 @@ def split_large_video(video_paths: list):

     if split and original_video:
         return [i for i in pathlib.Path(original_video).parent.glob("*")]
-
-
-def download_instagram(url: str, tempdir: str):
-    if not url.startswith("https://www.instagram.com"):
-        return False
-
-    resp = requests.get(f"http://192.168.6.1:15000/?url={url}").json()
-    if url_results := resp.get("data"):
-        for link in url_results:
-            content = requests.get(link, stream=True).content
-            ext = filetype.guess_extension(content)
-            save_path = pathlib.Path(tempdir, f"{id(link)}.{ext}")
-            with open(save_path, "wb") as f:
-                f.write(content)
-
-        return True
124 changes: 124 additions & 0 deletions ytdlbot/sp_downloader.py
BennyThink marked this conversation as resolved.
@@ -0,0 +1,124 @@
#!/usr/local/bin/python3
# coding: utf-8

# ytdlbot - sp_downloader.py
# 3/16/24 16:32
#

__author__ = "Benny <[email protected]>, SanujaNS <[email protected]>"
SanujaNS marked this conversation as resolved.

import pathlib
import logging
import traceback
import re
import requests
from tqdm import tqdm
import json
from bs4 import BeautifulSoup
from urllib.parse import parse_qs, urlparse
import filetype
import yt_dlp as ytdl

from config import (
    ENABLE_ARIA2,
    IPv6,
)

user_agent = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.128 Safari/537.36"
)


def sp_dl(url: str, tempdir: str):
    """Specific link downloader"""
    domain = urlparse(url).hostname
    if not any(
        x in domain
        for x in [
            "www.instagram.com",
            "pixeldrain.com",
            "mediafire.com",
        ]
    ):
        return False
    if "www.instagram.com" in domain:

Check failure (Code scanning / CodeQL), severity High: Incomplete URL substring sanitization. The string www.instagram.com may be at an arbitrary position in the sanitized URL.

        return instagram(url, tempdir)
    elif "pixeldrain.com" in domain:

Check failure (Code scanning / CodeQL), severity High: Incomplete URL substring sanitization. The string pixeldrain.com may be at an arbitrary position in the sanitized URL.

        return pixeldrain(url, tempdir)
    elif "www.xasiat.com" in domain:

[Code scanning alert dismissed]

        return xasiat(url, tempdir)
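
A minimal sketch of one way to address the CodeQL findings on `sp_dl`, not necessarily the fix adopted in this PR: compare the parsed hostname with an allow-list exactly, or as a dot-separated suffix, instead of testing for a substring. `SPECIAL_HOSTS` and `match_special_host` are hypothetical names; the host names are taken from the diff. The sketch also returns `None` for input without a hostname, a case where `urlparse(url).hostname` is `None` and the `in` test in `sp_dl` would raise a `TypeError`.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allow-list; the host names are the ones used in sp_dl.
SPECIAL_HOSTS = ("www.instagram.com", "pixeldrain.com")


def match_special_host(url: str) -> Optional[str]:
    """Return the allow-listed host that `url` belongs to, or None."""
    hostname = urlparse(url).hostname  # already lower-cased by urlparse
    if hostname is None:
        return None
    for host in SPECIAL_HOSTS:
        # Accept the host itself or a true subdomain of it, nothing else.
        if hostname == host or hostname.endswith("." + host):
            return host
    return None
```

With this check, `https://pixeldrain.com.evil.example/u/x` is rejected, whereas the substring test `"pixeldrain.com" in domain` accepts it.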


def sp_ytdl_download(url: str, tempdir: str):

Review comment (Member):

The code in this function appears to largely duplicate the one in downloader.py. Is there any way we can avoid those duplicates?

Reply from @SanujaNS (Collaborator, Author), Mar 21, 2024:

Yes, it's the same function with some parts stripped out. It would be great if we could call the function in downloader.py, but I can't think of a way to do that without it looping on itself, because some URLs (even though they are direct URLs) include the domain parts used in the handling function (sp_dl).

    output = pathlib.Path(tempdir, "%(title).70s.%(ext)s").as_posix()
    ydl_opts = {
        "outtmpl": output,
        "restrictfilenames": False,
        "quiet": True,
    }
    if ENABLE_ARIA2:
        ydl_opts["external_downloader"] = "aria2c"
        ydl_opts["external_downloader_args"] = [
            "--min-split-size=1M",
            "--max-connection-per-server=16",
            "--max-concurrent-downloads=16",
            "--split=16",
        ]
    formats = [
        # webm , vp9 and av01 are not streamable on telegram, so we'll extract mp4 and not av01 codec
        "bestvideo[ext=mp4][vcodec!*=av01][vcodec!*=vp09]+bestaudio[ext=m4a]/bestvideo+bestaudio",
        "bestvideo[vcodec^=avc]+bestaudio[acodec^=mp4a]/best[vcodec^=avc]/best",
        None,
    ]

    address = ["::", "0.0.0.0"] if IPv6 else [None]
    error = None
    video_paths = None
    for format_ in formats:
        ydl_opts["format"] = format_
        for addr in address:
            # IPv6 goes first in each format
            ydl_opts["source_address"] = addr
            try:
                logging.info("Downloading for %s with format %s", url, format_)
                with ytdl.YoutubeDL(ydl_opts) as ydl:
                    ydl.download([url])
                video_paths = list(pathlib.Path(tempdir).glob("*"))
                break
            except Exception:
                error = traceback.format_exc()
                logging.error("Download failed for %s - %s, try another way", format_, url)
        if error is None:
            break

    if not video_paths:
        raise Exception(error)

    return video_paths
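
On the duplication question raised in the review of `sp_ytdl_download`: one possible way to reuse a single download routine without it looping back into the site-specific dispatcher is an opt-out flag. The sketch below is illustrative only and is not code from this PR; `generic_download`, `special_download`, the `use_special` parameter and the `calls` list are hypothetical stand-ins for `ytdl_download` and `sp_dl`.

```python
from typing import List
from urllib.parse import urlparse

calls: List[str] = []  # records which path handled each URL, for illustration


def generic_download(url: str, tempdir: str, use_special: bool = True) -> List[str]:
    """Stand-in for the generic routine: try special handlers unless told not to."""
    if use_special and special_download(url, tempdir):
        return calls
    calls.append("generic:" + url)  # the real routine would invoke yt-dlp here
    return calls


def special_download(url: str, tempdir: str) -> bool:
    """Stand-in for the dispatcher: rewrite the URL, then reuse the generic routine."""
    parts = urlparse(url)
    if parts.hostname != "pixeldrain.com":
        return False
    if parts.path.startswith("/u/"):
        file_id = parts.path[len("/u/"):]
        url = f"https://pixeldrain.com/api/file/{file_id}?download"
    # use_special=False keeps the rewritten URL, which is still on the same
    # host, from being dispatched back into this function.
    generic_download(url, tempdir, use_special=False)
    return True
```

Without the flag, the rewritten direct URL would match the host check again and recurse, which is the looping concern described in the reply.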


def instagram(url: str, tempdir: str):
    resp = requests.get(f"http://192.168.6.1:15000/?url={url}").json()
    if url_results := resp.get("data"):
        for link in url_results:
            content = requests.get(link, stream=True).content
            ext = filetype.guess_extension(content)
            save_path = pathlib.Path(tempdir, f"{id(link)}.{ext}")
            with open(save_path, "wb") as f:
                f.write(content)

        return True
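
A side note on the request built in `instagram`: the target URL is interpolated into the query string unencoded, so any `&` inside it would be read by a standard query-string parser as the start of a new parameter. The sketch below shows the encoded form using only the standard library. `resolver_request_url` and `received_params` are hypothetical helpers, the resolver address is the one from the diff, and how that service actually parses its input is not shown in this PR, so treat this as an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Resolver address as it appears in the diff (a private network address).
RESOLVER = "http://192.168.6.1:15000/"


def resolver_request_url(target_url: str) -> str:
    """Build the resolver request with the target URL percent-encoded."""
    return RESOLVER + "?" + urlencode({"url": target_url})


def received_params(request_url: str) -> dict:
    """What a standard query-string parser would extract from the request."""
    return parse_qs(urlparse(request_url).query)
```

For a target such as `https://www.instagram.com/p/abc/?igshid=1&img_index=2`, the encoded request keeps the whole link in the `url` parameter, while plain interpolation splits `img_index=2` off into a parameter of its own.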

def pixeldrain(url: str, tempdir: str):
    user_page_url_regex = r'https://pixeldrain.com/u/(\w+)'
    match = re.match(user_page_url_regex, url)
    if match:
        url = 'https://pixeldrain.com/api/file/{}?download'.format(match.group(1))
        sp_ytdl_download(url, tempdir)
    else:
        return url

    return True
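
One thing worth noting about `pixeldrain`: the `else` branch returns the URL string, which is truthy, so the `if sp_dl(url, tempdir):` check in downloader.py would treat an unmatched pixeldrain link as handled even though nothing was downloaded. A minimal sketch of a helper that keeps the URL rewrite separate from the download and signals "no match" with `None` follows. `pixeldrain_direct_url` and `PIXELDRAIN_PAGE` are hypothetical names, and the escaped dots are a small tightening of the pattern rather than part of the PR.

```python
import re
from typing import Optional

# Same /u/<id> pattern as in the diff, with the dots escaped so that they
# match a literal "." only.
PIXELDRAIN_PAGE = re.compile(r"https://pixeldrain\.com/u/(\w+)")


def pixeldrain_direct_url(url: str) -> Optional[str]:
    """Return the API download URL for a pixeldrain /u/<id> page, else None."""
    match = PIXELDRAIN_PAGE.match(url)
    if match is None:
        return None
    return f"https://pixeldrain.com/api/file/{match.group(1)}?download"
```

A caller could then return `False` when the helper yields `None` and download only when it yields a URL.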

def xasiat(url: str, tempdir: str):
    return False