[BUG] - Stream: Area detection hangs on PDF page #30

kirk-marple · 2024-01-05T01:58:31Z

Describe the bug
When attempting to extract tables from this 250+ page PDF, I found that it hangs on a specific page (98), in the 'Detect' method.

To Reproduce
Using 40927R03.pdf

I've tried with 0.1.3 and 0.1.4-alpha001, and both hang in the same spot.

Using .NET 6.0, C#.

using var pdoc = PdfDocument.Open(content.Stream, new ParsingOptions { SkipMissingFonts = true, UseLenientParsing = true });
var da = new Tabula.Detectors.SimpleNurminenDetectionAlgorithm();

var area = Tabula.ObjectExtractor.ExtractPage(pdoc, 98 /* hangs on this page */);
var regions = da.Detect(area); // <-- this line hangs

Expected behavior
Detect should return for every page so that all tables can be parsed.

andyesys · 2024-03-15T08:43:49Z

I have also encountered this hang and had to stop using this library unfortunately.
