Companion repo for the Real-World Data Prep for LLMs talk

Zipstack/realworld-pdf-handling

Real-World Data Prep for LLMs

This is a companion repo for the Real-World Data Prep for LLMs talk available at: https://www.youtube.com/live/YfW5vVwgbyo

Agenda

We look at how to extract data from real-world PDFs, covering the following cases:

  • Native text / clean PDFs
  • Scanned PDFs
  • Handwritten text and hand-filled forms
  • Tables in PDFs
  • Smartphone-captured images

We will compare the performance of different OCR tools and techniques for each of these scenarios.
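One common way to put a number on such a comparison is character error rate (CER): the edit distance between a tool's output and a hand-checked reference transcription, divided by the reference length. The sketch below is an illustration only and is not taken from the talk or this repo; the function names, tool names, and sample strings are all hypothetical, and real evaluations usually normalize whitespace and casing first.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if they match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length. Lower is better; 0.0 is a perfect match."""
    if not reference:
        # Assumption: an empty reference scores 0.0 only against empty output.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical ground truth and hypothetical outputs from two made-up tools.
    reference = "Invoice total: 120.50"
    outputs = {
        "tool_a": "Invoice total: 120.50",
        "tool_b": "lnvoice tota1: 12O.5O",
    }
    for name, text in outputs.items():
        print(f"{name}: CER = {character_error_rate(reference, text):.3f}")
```

CER works at the character level, so it suits scanned and handwritten text where single-glyph confusions are typical. It says nothing about layout, so table extraction would need a separate, structure-aware check.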
