noscript #40

Open
mcnesium opened this issue Jan 20, 2022 · 2 comments
Comments

@mcnesium

Trying to get a list of currently available Invidious instances, I started with

curl -s https://redirect.invidious.io | htmlq "noscript"

which gave me a list of all the noscript elements on the page, including the one I was looking for:

<noscript><div class="instances-list"><h2>Available instances</h2><ul class="list"><li><a href="https://invidious.snopyta.org">invidious.snopyta.org</a></li><li><a href="https://yewtu.be">yewtu.be</a></li><li><a href="https://invidious.kavin.rocks">invidious.kavin.rocks</a></li><li><a href="https://invidious-us.kavin.rocks">invidious-us.kavin.rocks</a></li><li><a href="https://invidious-jp.kavin.rocks">invidious-jp.kavin.rocks</a></li><li><a href="https://vid.puffyan.us">vid.puffyan.us</a></li><li><a href="https://invidious.namazso.eu">invidious.namazso.eu</a></li><li><a href="https://inv.riverside.rocks">inv.riverside.rocks</a></li><li><a href="https://vid.mint.lgbt">vid.mint.lgbt</a></li><li><a href="https://invidious.osi.kr">invidious.osi.kr</a></li><li><a href="https://invidio.xamh.de">invidio.xamh.de</a></li><li><a href="https://yt.artemislena.eu">yt.artemislena.eu</a></li></ul></div></noscript>

But when I tried to dig deeper to get just the list of URLs, I got empty results no matter which selector I tried:

$~ curl -s https://redirect.invidious.io | htmlq "noscript a"
$~ curl -s https://redirect.invidious.io | htmlq "noscript li"
$~ curl -s https://redirect.invidious.io | htmlq "noscript ul"
$~ curl -s https://redirect.invidious.io | htmlq "noscript div"

Is this an issue with noscript in general or with that specific site? Why does it find what I am looking for in the first place?

Using htmlq 0.4.0 from AUR

@mcnesium

I know I can do

curl -s https://api.invidious.io/instances.json | jq -r '.[][1].uri'

because that is where the page gets the data it shows outside the noscript element, but this might still be a valid issue.

@eporama

eporama commented May 13, 2022

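I wonder if this is because htmlq uses servo/html5ever under the hood.

It looks like that code has an option, "scripting_enabled", which defaults to true and then makes the parser treat the contents of noscript elements as raw text rather than as child elements. That would explain why "noscript" matches but "noscript a" finds nothing: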

https://github.com/servo/html5ever/blob/57eb334c0ffccc6f88d563419f0fbeef6ff5741c/html5ever/src/tree_builder/rules.rs#L118-L126

I couldn't see where to set that option to false, but as a complete hack workaround, you can do this for now:

$~ curl -s https://redirect.invidious.io | htmlq --text "noscript" | htmlq --attribute href .instances-list a
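
To try the two-pass idea without hitting the network, here is a rough sketch that runs it against a small inline sample. The sample page and the function names sample_page and noscript_links are made up for illustration, and the sample only approximates the real response. The first htmlq call prints the noscript body as text, and the second parses that text as HTML in its own right. If htmlq is not installed, the sketch falls back to plain grep/sed text matching, which is not real HTML parsing and assumes the whole noscript element sits on one line, as in the output quoted in the issue.

```shell
# Hypothetical sample page standing in for the real response from
# redirect.invidious.io (trimmed to two instances named in the issue).
sample_page() {
  cat <<'EOF'
<html><body>
<noscript><div class="instances-list"><ul class="list"><li><a href="https://yewtu.be">yewtu.be</a></li><li><a href="https://vid.puffyan.us">vid.puffyan.us</a></li></ul></div></noscript>
</body></html>
EOF
}

# noscript_links (hypothetical helper): read HTML on stdin and print the
# href of every link found inside a noscript element, one per line.
noscript_links() {
  if command -v htmlq >/dev/null 2>&1; then
    # Pass 1: emit the noscript body as plain text.
    # Pass 2: parse that text as a fresh HTML document and select the links.
    htmlq --text 'noscript' | htmlq --attribute href '.instances-list a'
  else
    # Crude fallback: text matching only, assumes a single-line noscript.
    grep -o '<noscript>.*</noscript>' \
      | grep -o 'href="[^"]*"' \
      | sed 's/^href="//; s/"$//'
  fi
}

sample_page | noscript_links
```

Swapping sample_page for curl -s https://redirect.invidious.io should give the live list, assuming the page still has the structure shown above.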
