Session related suggestions #2
Comments
Any thoughts? |
Hey again. Thanks for going through my suggestions.
This works fine and profiles are created in the new folder. I am an absolute newbie to Docker, so I still have to sort out how to mount a persistent volume. Also, when running on free hosting space, for example, volume mounting is usually not allowed, so it is not usable there.
which works but caused some timeouts. Maybe it's better to call sessions.create explicitly before request.get. It worked in your original version, though, so I have to test that a bit more. Maybe a forceCreate param would be better? Cheers, Jx- |
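To make the forceCreate idea concrete, here is a minimal sketch of a get-or-create lookup. Everything in it is hypothetical: the in-memory `sessions` map, `createSession`, and the `forceCreate` option are illustrations only and do not reflect how the project actually stores or creates sessions.

```javascript
// Hypothetical in-memory session store; the real project's session API may differ.
const sessions = new Map()

// Assumed factory: a real service would launch a browser with its own
// userDataDir here. This stand-in only records the id and a timestamp.
const createSession = id => {
  const session = { id, createdAt: Date.now() }
  sessions.set(id, session)
  return session
}

// Return the existing session, or create one only when forceCreate is set.
const getSession = (id, { forceCreate = false } = {}) => {
  const existing = sessions.get(id)
  if (existing) return existing
  if (!forceCreate) {
    throw new Error(`Session "${id}" does not exist; create it first`)
  }
  return createSession(id)
}
```

With this shape, the default behaviour stays explicit (a missing session is an error), and callers that prefer implicit creation opt in per request.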
One more thing: I would like to add a headers parameter so that we can pass custom headers to page.setExtraHTTPHeaders (https://github.com/puppeteer/puppeteer/blob/v3.3.0/docs/api.md#pagesetextrahttpheadersheaders). Some pages need correct headers to return proper responses. |
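A rough sketch of how such a headers parameter could be forwarded is below. The helper name `applyHeaders` is made up for illustration; the only Puppeteer call used is `page.setExtraHTTPHeaders`, which expects all header values to be strings, so the sketch normalizes values before passing them on.

```javascript
// Hypothetical helper: forwards caller-supplied headers to a Puppeteer page.
// `page` only needs to expose setExtraHTTPHeaders, as Puppeteer's Page does.
const applyHeaders = async (page, headers = {}) => {
  const normalized = {}
  for (const [name, value] of Object.entries(headers)) {
    // Skip unset entries; Puppeteer requires every value to be a string
    if (value === undefined || value === null) continue
    normalized[name] = String(value)
  }
  // Only touch the page when there is something to send
  if (Object.keys(normalized).length > 0) {
    await page.setExtraHTTPHeaders(normalized)
  }
  return normalized
}
```

It would be called before `page.goto`, since the extra headers apply to requests issued after they are set.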
In regards to explicit session creation, I think I'm going to keep it that way since it wouldn't work any other way with the enhancements I've added via #10. Next on my list of things to do is a variable userDataDir parent directory. |
Yeah I think this is great like it is now. |
In regards to saving sessions, I've been playing around with setting a custom parent to the I used this to test it:
const P = require('puppeteer')
// httpbin returns JSON, which Chrome renders inside a <pre> element
const showCookies = async page => {
  const pre = await page.$('pre')
  const cookies = await pre.evaluate(e => e.innerText)
  console.log(JSON.parse(cookies))
}

// Launch with the given profile directory and have httpbin set a cookie
const setCookies = async path => {
  const browser = await P.launch({
    headless: true,
    userDataDir: path
  })
  const page = await browser.newPage()
  await page.goto('https://httpbin.org/cookies/set/test123/123')
  await showCookies(page)
  await browser.close()
}

// Relaunch with the same profile directory and list the cookies it sends
const checkCookies = async path => {
  const browser = await P.launch({
    headless: true,
    userDataDir: path
  })
  const page = await browser.newPage()
  await page.goto('https://httpbin.org/cookies')
  await showCookies(page)
  await browser.close()
}

(async () => {
  const path = 'sessions/test'
  await setCookies(path)
  await checkCookies(path)
})().catch(console.error)
While I'd expect the output to be:
The output is actually:
I've done a little Googling and it doesn't seem like you can reload profiles? Maybe I'm doing something wrong? Any ideas? |
Hmmm, I haven't done any concrete tests. I only notice that the CF cookies stored in the SQLite DB in the session/profile folder are being used: while the first solve can take several retries and quite a bit of time, subsequent calls to the site open very fast, obviously without needing to solve again. So as long as the container is running, all seems to be OK with the cookies, but since many container hosts use volatile storage, the profile folder is lost once they go idle. My workaround for now is to store the cookies locally and send them back to the proxy if they have not expired yet. This seems to work more or less. I found one issue with the documentation: it says the mandatory fields for cookies are name and value, but I had to send domain, too. I can play a bit with your code. I wasn't aware of httpbin until now :)
PS: What I see already is that, using the above code, the Default profile folder is not created at all, for reasons I don't understand. This folder contains the cookies DB. But maybe that's a Windows issue?
PPS: Using the Windows temp dir as the parent folder works, with Default and the cookies DB being created. The problem is that the call to httpbin doesn't seem to store the cookies correctly; the DB file stays empty. If you call another page, such as GitHub, cookies are stored correctly. Interestingly, in Fiddler you can see that the requests from Puppeteer are working and cookies are returned as expected:
which redirects to
PPPS: I fiddled the proxy while calling a page after the challenge is solved and it definitely sends back the cookies stored in the session. Cheers, Jx- |
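The workaround described above (keep cookies on the client and resend the ones that are still valid) could look roughly like this. The function name `selectReusableCookies` is hypothetical, and the cookie shape is assumed to follow what Puppeteer's `page.cookies()` returns, where `expires` is Unix time in seconds and -1 marks a session cookie. Keeping session cookies is a choice made for this sketch, not something the thread settles.

```javascript
// Hypothetical helper: pick the stored cookies that are worth sending back.
// Assumes Puppeteer-style cookie objects (expires in Unix seconds, -1 = session cookie).
const selectReusableCookies = (cookies, nowSeconds = Date.now() / 1000) =>
  cookies
    // Drop cookies whose expiry time has already passed
    .filter(c => c.expires === -1 || c.expires > nowSeconds)
    // Send domain along with name and value, since domain turned out to be required
    .map(({ name, value, domain, path = '/' }) => ({ name, value, domain, path }))
```

The filtered list can then be attached to the next request, so a solved challenge can be reused even after the host has discarded the profile folder.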
Very interesting. Sorry it took forever to get back to you. Are you in the Discord server? https://discord.gg/pVT8rNy I'll continue to look into this but I've been busier lately. |
Hi mate, I see you converted to TS. It looks really great now and works nicely for me, especially using a custom session profile, which seems to help avoid triggering the Captcha after a couple of retries of request.get. I have a few suggestions:
When using the session param without a custom User-Agent, a random browser profile folder is created instead of the custom-named folder. Is there a reason for this? Why not always create the named folder when the session param is used? I believe that way the session and cookies would be reused even after a server or Docker app restart.
I also noticed that named profile folders get deleted after a server restart, I guess due to the non-persistent nature of the Docker image data storage. Maybe we could add a conf param for a custom temp folder so that a persistent volume could be mounted to it?
Lastly, when using request.get with a session param and the session does not exist, it returns a notice to create a session first. Why not call sessions.create like you do when no session param is provided?
Anyway, thanks for the great app!
Jx-
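As an illustration of the custom temp folder suggestion, the sketch below derives a profile directory from a configurable parent. The `SESSIONS_DIR` environment variable and both function names are assumptions made for this example; the project is not known to support them. The result would be passed as `userDataDir` to `puppeteer.launch`.

```javascript
const os = require('os')
const path = require('path')

// Hypothetical setting: SESSIONS_DIR is an assumed name, not a documented option.
// Falls back to a "sessions" folder inside the OS temp directory.
const sessionsRoot = (env = process.env) =>
  env.SESSIONS_DIR || path.join(os.tmpdir(), 'sessions')

// Map a session id to a stable profile directory under the configured parent.
const userDataDirFor = (sessionId, env = process.env) => {
  // Reject ids that could escape the parent directory (e.g. "../x")
  if (!/^[\w-]+$/.test(sessionId)) {
    throw new Error(`Invalid session id: ${sessionId}`)
  }
  return path.join(sessionsRoot(env), sessionId)
}
```

Pointing the parent at a mounted volume would let named profiles survive a container restart, while hosts without volume support keep the temp-dir default.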