A command-line tool for extracting structured information from argument-rich and claim-rich PDF documents
Go 1.25.4+ · Python 3.11+ · Docker 28.1.1+
- Search for any topic via SearXNG
- Specify how many files to aggregate
- Process files with a spaCy Span Categorizer (SpanCat) model, trained on ~1,500 silver labels, that detects and extracts claim spans
- View the analysis for each document, returned in JSON format, which includes:
  - Sources
    - Who made the claim
  - Claim Verbs
    - The verb used to make the claim
  - Claim Modifiers
    - Modifier(s) indicating the strength or degree with which the claim is made
  - Claim Contents
    - The claim being made
  - Origin Document
    - The document the spans were extracted from
  - Origin Sentence
    - The sentence that contains a given span
  - Claim Density Score
    - A value representing how claim-heavy a given document is
  - Confidence Score
    - The model's confidence in a given predicted span
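
The search step can be sketched against SearXNG's JSON search API. This is a minimal illustration, not the tool's own implementation: it assumes a SearXNG instance at a hypothetical `http://localhost:8080` with the `json` format enabled under `search.formats` in `settings.yml`, and the helper names `build_search_url`, `pdf_urls`, and `search_pdfs` are hypothetical. It keeps only result URLs whose path ends in `.pdf` and stops once the requested number of files is reached.

```python
import json
from urllib.parse import urlencode, urlparse
from urllib.request import urlopen


def build_search_url(base_url: str, topic: str, page: int = 1) -> str:
    """Build a SearXNG /search URL that asks for JSON results (hypothetical helper)."""
    query = urlencode({"q": topic, "format": "json", "pageno": page})
    return f"{base_url.rstrip('/')}/search?{query}"


def pdf_urls(response_body: str, limit: int) -> list[str]:
    """Return up to `limit` unique result URLs whose path ends in .pdf."""
    results = json.loads(response_body).get("results", [])
    urls: list[str] = []
    for result in results:
        if len(urls) >= limit:
            break  # requested number of files reached
        url = result.get("url", "")
        # Compare on the URL path so query strings like ?dl=1 don't hide the extension.
        if urlparse(url).path.lower().endswith(".pdf") and url not in urls:
            urls.append(url)
    return urls


def search_pdfs(base_url: str, topic: str, limit: int) -> list[str]:
    """Query a SearXNG instance (assumed reachable) and collect PDF links."""
    with urlopen(build_search_url(base_url, topic), timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return pdf_urls(body, limit)
```

Separating URL building and result filtering from the network call keeps the parsing logic testable without a running SearXNG instance.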
