Seeking advice on creating a PDF to text extraction pipeline component #7549
SamEdwardes started this conversation in Help: Best practices
Replies: 0 comments
I have been playing around with the idea of making a pipeline component that can support extracting text from a PDF. There are a few reasons I would like to do this:
For reference, here is some pseudo code describing how I think you could use this:
I have a few questions as a starting point:
Doc
and return a Doc
.nlp
as parameters?
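To make the Doc-in, Doc-out question concrete, here is a rough sketch of one possible shape. It is not a working spaCy component: spaCy pipeline components are callables that receive a Doc and return a Doc, so this sketch does the PDF step before the pipeline runs and takes nlp as a parameter. The names pdf_to_doc and extract_text are hypothetical, and extract_text stands in for whatever PDF library ends up doing the extraction.

```python
from pathlib import Path
from typing import Any, Callable, Union


def pdf_to_doc(
    pdf_path: Union[str, Path],
    nlp: Callable[[str], Any],
    extract_text: Callable[[Path], str],
) -> Any:
    """Extract text from a PDF, then run it through the pipeline.

    pdf_to_doc and extract_text are hypothetical names used for
    illustration only. nlp is assumed to behave like a spaCy Language
    object, i.e. calling nlp(text) returns a Doc.
    """
    path = Path(pdf_path)
    # Hypothetical extraction step: any function mapping a path to a string.
    text = extract_text(path)
    # The pipeline itself still only ever sees plain text.
    return nlp(text)
```

With spaCy installed, nlp could be something like spacy.blank("en"), and extract_text a thin wrapper around a PDF library of your choice. Whether this belongs inside the pipeline or in front of it is exactly what I am unsure about.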