[YouTube] Throttling parameter decryption is broken, decrypt function is not again fully extracted #902
Comments
I just noticed the same issue. This time regex literals are to blame:
Avoiding these is not as easy as avoiding braces in strings. We can't simply treat slashes like quotes, because regex character classes can contain unescaped slashes. |
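To illustrate the character-class problem, here is a minimal sketch in Rust. It is not code from NewPipeExtractor or yt-dlp; the function name `skip_regex_literal` is hypothetical, and it assumes the caller already knows that the `/` at `open` starts a regex literal (telling a regex apart from a division operator is the harder part and is not handled here).

```rust
/// Given the byte index of the `/` that opens a regex literal, return the
/// index just past the closing `/` (trailing flags are not consumed).
/// Returns `None` if `open` is not a slash or the literal is unterminated.
fn skip_regex_literal(src: &str, open: usize) -> Option<usize> {
    let bytes = src.as_bytes();
    if bytes.get(open) != Some(&b'/') {
        return None;
    }
    let mut in_class = false; // true while inside a [...] character class
    let mut i = open + 1;
    while i < bytes.len() {
        match bytes[i] {
            // A backslash escapes the next byte, so skip over it.
            b'\\' => i += 1,
            b'[' => in_class = true,
            b']' => in_class = false,
            // A slash only terminates the literal outside a character class.
            b'/' if !in_class => return Some(i + 1),
            // Regex literals cannot span lines.
            b'\n' => return None,
            _ => {}
        }
        i += 1;
    }
    None
}
```

For the input `a.split(/[/,]/);`, a scanner that treats slashes like quotes would stop at the slash inside `[/,]`, whereas this sketch skips it and ends on the real closing slash.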
At this point, wouldn't it be the best solution to use an actual JavaScript lexer to extract the function? |
Yep, that seems like the only reasonable option to me. And I'm pretty sure that the functions will get harder and harder to parse as time goes on. |
I am currently working on a YouTube downloader/client library in Rust (that's how I noticed the issue).

    use anyhow::{anyhow, Result};

    /// Extracts the source of `name = ... { ... }` from `js` using the ress lexer.
    fn extract_js_fn(js: &str, name: &str) -> Result<String> {
        let scan = ress::Scanner::new(js);
        // 0: looking for the name, 1: expecting `=`, 2: counting braces, 3: done
        let mut state = 0;
        let mut level = 0;
        let mut start = 0;
        let mut end = 0;

        for item in scan {
            let it = item?;
            let token = it.token;
            match state {
                // Looking for fn name
                0 => {
                    if token.matches_ident_str(name) {
                        state = 1;
                        start = it.span.start;
                    }
                }
                // Looking for equals
                1 => {
                    if token.matches_punct(ress::tokens::Punct::Equal) {
                        state = 2;
                    } else {
                        state = 0;
                    }
                }
                // Looking for begin/end braces
                2 => {
                    if token.matches_punct(ress::tokens::Punct::OpenBrace) {
                        level += 1;
                    } else if token.matches_punct(ress::tokens::Punct::CloseBrace) {
                        level -= 1;
                        // The brace that closes the outermost block ends the function
                        if level == 0 {
                            end = it.span.end;
                            state = 3;
                            break;
                        }
                    }
                }
                _ => break,
            };
        }

        if state != 3 {
            return Err(anyhow!("could not extract js fn"));
        }
        Ok(js[start..end].to_owned())
    }

This works fine with the new player.js. https://javadoc.io/doc/org.mozilla/rhino/latest/index.html |
A lexer isn't really needed. The function body can be extracted by carefully keeping track of the quotes and braces. Equivalent code in yt-dlp: https://github.com/yt-dlp/yt-dlp/blob/b76e9cedb33d23f21060281596f7443750f67758/yt_dlp/jsinterp.py#L229-L254 But if your dependency already has a lexer, I guess why not use it. |
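The quote-and-brace tracking idea can be sketched roughly as follows. This is an illustrative Rust sketch, not a port of the linked yt-dlp code; the function name `extract_braced` is hypothetical. It deliberately ignores regex literals, comments and `${...}` interpolation in template strings, so it would still trip over the regex case discussed above.

```rust
/// Starting at the `{` at byte index `open`, return the slice up to and
/// including the matching `}`. Braces inside string literals are ignored.
/// Returns `None` if `open` is not a `{` or the block never closes.
fn extract_braced(js: &str, open: usize) -> Option<&str> {
    let bytes = js.as_bytes();
    if bytes.get(open) != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize; // current brace nesting level
    let mut quote: Option<u8> = None; // the quote byte we are inside, if any
    let mut i = open;
    while i < bytes.len() {
        let b = bytes[i];
        match quote {
            // Inside a string: only look for escapes and the closing quote.
            Some(q) => {
                if b == b'\\' {
                    i += 1; // skip the escaped byte
                } else if b == q {
                    quote = None;
                }
            }
            // Outside a string: track quotes and braces.
            None => match b {
                b'"' | b'\'' | b'`' => quote = Some(b),
                b'{' => depth += 1,
                b'}' => {
                    depth -= 1;
                    if depth == 0 {
                        return Some(&js[open..=i]);
                    }
                }
                _ => {}
            },
        }
        i += 1;
    }
    None
}
```

A caller would first locate the opening brace of the function (for example by searching for the function name followed by `=function(`) and then pass that index as `open`.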
I now have a working prototype. It is not pretty and definitely needs cleanup, so I have to do that first before I make a PR. I ended up having to copy Rhino's tokenizer class because it is private. The higher-level parser is accessible, but it only parses entire JS documents into syntax trees, which would take too much time. I also found an issue with the Rhino JS interpreter. Version 1.7.14 uses |
The problem described here will also be partially fixed with #882 (comment) |
I think that's a good approach.
It does, but as mentioned by @Theta-Dev, it is unfortunately private, and I don't think we should copy the lexer to our codebase. An alternative is to fork Rhino and make the lexer public. |
Or maybe contribute the changes to Mozilla ;) |
If they would accept it, sure. ;) |
@Theta-Dev are you still rewriting NewPipeExtractor in Rust? Is it public yet? ;-) Sorry for writing this comment here, but since you're not on IRC I didn't know how to write to you otherwise. |
@Stypox yes, RustyPipe is basically finished. You can get it here: https://code.thetadev.de/ThetaDev/rustypipe btw: how can I join you on IRC? |
Check out Contributing.md |
With player 1f7d5369, the decryption of the throttling parameter fails because the function is again not fully extracted.
Left: what is extracted by the extractor; right: the real function
The extractor still works, because this time the exception is caught properly.