@tomas-b commented Aug 13, 2021

No description provided.

Comment on lines +6 to +8
page.setUserAgent(
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36',
)
Collaborator

Why is this UA needed? Doesn't it work with the default?

Author

The default UA contains the string HeadlessChrome, so when we run headless the header has to be changed. More info: https://jsoverson.medium.com/how-to-bypass-access-denied-pages-with-headless-chrome-87ddd5f3413c
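
A possible alternative to a hard-coded string, sketched under the assumption that the HeadlessChrome token is the only thing giving the browser away: take the browser's own default UA (Puppeteer exposes it through browser.userAgent()) and rewrite just that token, so the reported version stays in sync with the Chrome actually running. The helper name stripHeadlessMarker and the sample UA value are hypothetical and not part of this PR.

```typescript
// Hypothetical helper, not part of the PR under review.
// Rewrites "HeadlessChrome/<version>" to "Chrome/<version>" and leaves the rest untouched.
function stripHeadlessMarker(userAgent: string): string {
  return userAgent.replace(/HeadlessChrome/g, 'Chrome')
}

// Sample value for illustration only; a real one would come from the running browser.
const headlessUA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.75 Safari/537.36'
const patchedUA = stripHeadlessMarker(headlessUA)
```

The result could then be passed to page.setUserAgent in place of the literal string. Whether the target site checks anything beyond the UA header is not something this thread confirms.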

Comment on lines +164 to +166
const client = await page.target().createCDPSession()
await client.send('Network.clearBrowserCookies')
await client.send('Network.clearBrowserCache')
Collaborator

What is this needed for?

Author

For some reason the site only accepts the first request and then flags the browser, so cookies and cache have to be cleared.


await page.goto(request.pageUrl)

const data = await page.evaluate(() => {
Collaborator

As a general note for future scrapers: make smaller calls to the DOM using .$eval and .$$eval, because otherwise it is harder to tell where each piece of data came from.

Author

I'll keep that in mind, thanks.

Owner

I agree. Making more atomic calls improves readability and makes the scraper easier to understand and maintain.

For example:

const title = await page.$eval('.product-title-main-header', e => e.textContent?.trim() || '')
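
To illustrate the same idea for lists, here is a minimal sketch of one small reader function per field, using both $eval (single element) and $$eval (all matches). It declares its own minimal EvalPage and TextElement types so it stands alone without Puppeteer installed; they only mirror the shape of Puppeteer's page.$eval and page.$$eval. The function names and the '.product-size' selector are made up for illustration; only '.product-title-main-header' comes from the PR.

```typescript
// Minimal structural types (assumptions) that mirror the shape of Puppeteer's
// page.$eval / page.$$eval, so the sketch compiles on its own.
interface TextElement {
  textContent: string | null
}

interface EvalPage {
  $eval<T>(selector: string, fn: (el: TextElement) => T): Promise<T>
  $$eval<T>(selector: string, fn: (els: TextElement[]) => T): Promise<T>
}

// One field, one selector: the origin of the value is obvious at a glance.
async function readTitle(page: EvalPage): Promise<string> {
  return page.$eval('.product-title-main-header', e => e.textContent?.trim() || '')
}

// '.product-size' is a hypothetical selector used only for this example.
async function readSizes(page: EvalPage): Promise<string[]> {
  return page.$$eval('.product-size', els => els.map(e => e.textContent?.trim() || ''))
}
```

Keeping each reader this small also means a selector change on the site touches a single function.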

Owner

@kerbaras left a comment

Overall this looks good. I liked several things, and you managed to get the size chart data! Just a couple of comments.


await page.goto(request.pageUrl)

const data = await page.evaluate(() => {
let title = document.querySelector('.product-title-main-header')?.textContent?.trim() || ''
Owner

Prefer const when the variable is never reassigned.

let p = new Product(`${v.idx}-${v.productid}`, data.title, v.producturl)

p.subTitle = data.subtitle
p.metadata = {}
Owner

There is no need to set metadata when it is empty; that is already the default value on Product.

Comment on lines +164 to +166
const client = await page.target().createCDPSession()
await client.send('Network.clearBrowserCookies')
await client.send('Network.clearBrowserCache')
Owner

Add a comment explaining what this solves.
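
One way to address this, sketched with the reason the author gave earlier in the thread: wrap the three lines in a named helper and put the explanation right above it. The helper name resetBrowserState and the CdpPage / CdpSession types are hypothetical; the types only mirror the part of Puppeteer's page object that the PR code already uses, so the sketch compiles on its own.

```typescript
// Minimal structural types (assumptions) covering only what the PR code calls.
interface CdpSession {
  send(method: string): Promise<unknown>
}

interface CdpPage {
  target(): { createCDPSession(): Promise<CdpSession> }
}

// The site only accepts the first request and flags the browser afterwards.
// Clearing cookies and cache makes the next request look like a first visit.
async function resetBrowserState(page: CdpPage): Promise<void> {
  const client = await page.target().createCDPSession()
  await client.send('Network.clearBrowserCookies')
  await client.send('Network.clearBrowserCache')
}
```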

Comment on lines +152 to +153
p.color = v.swatch
p.colorFamily = v.swatchColorFamily
Owner

These should be the other way around:

p.color = v.swatchColorFamily
p.colorFamily = v.swatch

p.metadata = {}
p.images = v.images
p.videos = []
p.description = v.description
Owner

The description should not contain HTML.
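
A minimal sketch of how the markup could be stripped before assigning the description, assuming a regex-based cleanup is good enough for this site's HTML. The helper name htmlToPlainText is hypothetical, only a handful of common entities are decoded, and a real HTML parser would be more robust against unusual markup.

```typescript
// Hypothetical helper, not part of the PR: naive HTML-to-text conversion.
function htmlToPlainText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop script/style blocks with their content
    .replace(/<[^>]+>/g, ' ') // replace every remaining tag with a space
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&') // decoded last so "&amp;lt;" ends up as "&lt;", not "<"
    .replace(/\s+/g, ' ') // collapse runs of whitespace left behind by removed tags
    .trim()
}

const plainDescription = htmlToPlainText(
  '<p>Soft <b>cotton</b> tee</p><ul><li>Machine wash</li></ul>',
)
// plainDescription === 'Soft cotton tee Machine wash'
```

Tags are replaced with a space rather than removed outright so that adjacent block elements do not run together; the cost is that a tag in the middle of a word would split it.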

p.sku = v.sku
p.brand = v.brand
p.size = v.sizePrimary + (v.sizeSecondary ? ` - ${v.sizeSecondary}` : '')
p.realPrice = v.offerPrice
Owner

Just in case: it is generally safer to assume this value can come back empty when there is no offer, so it could be handled like this:

p.realPrice = v.offerPrice || v.listPrice
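
The fallback above can be sketched in isolation as follows. The VariantPrices shape and the function name resolveRealPrice are assumptions made for illustration; the thread does not show the real type of v, and the prices may well be strings in the actual payload.

```typescript
// Hypothetical shape: the real variant object in the PR may differ.
interface VariantPrices {
  offerPrice?: number | null
  listPrice: number
}

// Falls back to the list price whenever the offer price is missing or falsy.
function resolveRealPrice(v: VariantPrices): number {
  return v.offerPrice || v.listPrice
}

const withOffer = resolveRealPrice({ offerPrice: 19.99, listPrice: 29.99 }) // 19.99
const withoutOffer = resolveRealPrice({ listPrice: 29.99 }) // 29.99
const zeroOffer = resolveRealPrice({ offerPrice: 0, listPrice: 29.99 }) // 29.99
```

Note that || treats 0 and an empty string as missing too. That matches the intent here (an empty offer means no offer), but ?? would be the operator to reach for if a genuine price of 0 ever had to be preserved.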

4 participants