Incorrect Content Extraction – Mercury Parser #115

agmm · 2020-02-18T21:05:40Z

Mercury Parser is having trouble extracting the content of some articles. Instead of extracting the main content, it ends up extracting other, irrelevant parts of the page.

rnegron · 2020-02-19T02:25:10Z

I've noticed it too with some recent articles. Is there anything we could do better regarding Mercury's content extraction?

agmm · 2020-02-19T14:39:34Z

I think the best approach would be to add custom extractors for each site.

Like this:

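// Assumes Mercury comes from the @postlight/mercury-parser package.
const Mercury = require('@postlight/mercury-parser');

// Custom extractor for a single site; each field lists the selectors to try.
const customExtractor = {
  domain: 'www.noticias.pr',
  title: {
    selectors: ['h1', '.ArticlePage-headline']
  },
  author: {
    selectors: ['.ArticlePage-authorInfo-bio-name']
  },
  content: {
    selectors: ['article']
  }
};

// Register the extractor so Mercury uses it for URLs on that domain.
Mercury.addExtractor(customExtractor);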
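
Since the idea is one extractor per site, the definitions could be kept in a small table and expanded into extractor objects. This is only a rough sketch: `buildExtractor` and `siteSelectors` are hypothetical names made up for illustration, and the only entry is the site from the example above. Registration is left as a comment so the snippet runs without the library installed.

```javascript
// Hypothetical helper: expand a compact selector table entry into the
// extractor shape used in the example above.
function buildExtractor(domain, { title, author, content }) {
  return {
    domain,
    title: { selectors: title },
    author: { selectors: author },
    content: { selectors: content },
  };
}

// Hypothetical table of per-site selectors. Only the site from this
// thread is listed; other sites would be added as new keys.
const siteSelectors = {
  'www.noticias.pr': {
    title: ['h1', '.ArticlePage-headline'],
    author: ['.ArticlePage-authorInfo-bio-name'],
    content: ['article'],
  },
};

// One extractor object per site in the table.
const extractors = Object.entries(siteSelectors).map(
  ([domain, selectors]) => buildExtractor(domain, selectors)
);

// With Mercury loaded, each one would be registered the same way as above:
// extractors.forEach((extractor) => Mercury.addExtractor(extractor));
```

Keeping the selectors in one table would make it easier to review and add sites without repeating the object structure each time.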

rnegron added the "question" label · Feb 19, 2020
