Parsing table columns #752

mpele · 2024-12-21T23:08:30Z

I want to parse a PDF document that contains a table.
I extracted the text and its coordinates with getDataTm(). I expected that defining x-coordinate limits for each column would solve all my problems.
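For readers following along, the "x limits per column" idea can be sketched as below. This is only an illustration in Python, not code from the library: the items are assumed to be already reduced to `(x, y, text)` tuples like the listings in this post, and `column_index`, `build_rows`, the limit values and the `y_tolerance` are all hypothetical names and numbers.

```python
from bisect import bisect_right


def column_index(x, limits):
    """Return the 0-based column whose x range contains x.

    `limits` is a sorted list of x values separating neighbouring columns
    (hypothetical values; they must be measured on the real document).
    """
    return bisect_right(limits, x)


def build_rows(items, limits, y_tolerance=5.0):
    """Group (x, y, text) items into rows, ordered top to bottom.

    Items whose y values differ by at most `y_tolerance` from the previous
    item are treated as the same row. PDF y grows upwards, so a larger y
    is higher on the page.
    """
    rows = []
    last_y = None
    for x, y, text in sorted(items, key=lambda item: -item[1]):
        if last_y is None or last_y - y > y_tolerance:
            rows.append({})  # start a new row
        rows[-1][column_index(x, limits)] = text
        last_y = y
    return rows


if __name__ == "__main__":
    sample = [
        (50, 331, "1"), (396, 333, "16.12.2024"),
        (50, 298, "2"), (396, 299, "16.12.2024"),
    ]
    for row in build_rows(sample, limits=[100.0, 300.0, 500.0]):
        print(row)
```

Each row comes back as a dictionary from column index to text, so a missing cell simply has no key. This only works if every text is paired with its own coordinates.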

Unfortunately, I got some confusing coordinate values. I tried to find out what is happening, but without success.

I have noticed two anomalies. The first is in the values (x, y, text) for the row numbers in the first column:

50 331 1
50 298 2
796 42 3

Visually, the numbers are one above the other. I should also mention that the page is landscape and the $details['MediaBox'] values are 842.25 and 595.5. I noticed that 796 + 50 ≈ 842 and that the approximate row height is ~35 for all other cells, so is it possible that the reference point has been changed to the bottom right of the table?
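The changed-reference-point idea can be checked with a little arithmetic on the numbers quoted above. The sketch below is only an illustration: `mirror` is a hypothetical helper, and it assumes the alternative origin would be the opposite corner of the page as given by the MediaBox.

```python
# MediaBox width and height quoted in this post (landscape page).
PAGE_WIDTH, PAGE_HEIGHT = 842.25, 595.5


def mirror(x, y, width=PAGE_WIDTH, height=PAGE_HEIGHT):
    """Re-express a point as if it were measured from the opposite corner."""
    return width - x, height - y


# (x, y) of row numbers 1, 2 and 3 as listed above.
first, second, third = (50, 331), (50, 298), (796, 42)

# Vertical distance between rows 1 and 2, used to guess where row 3 should be.
row_step = first[1] - second[1]
expected_third = (second[0], second[1] - row_step)

mirrored_third = mirror(*third)

if __name__ == "__main__":
    print("row step:", row_step)
    print("expected position of row 3:", expected_third)
    print("row 3 mirrored to the opposite corner:", mirrored_third)
```

With these numbers, the mirrored x (46.25) lands close to 50, but neither the original y (42) nor the mirrored y (553.5) is near the expected 265, so a flipped origin alone does not seem to account for the third row.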

The second mystery is the x coordinate of the last column, for which I got these values:

396 367 16.12.2024
396 333 16.12.2024
396 299 16.12.2024

The problem is that those x values fall in the middle of the table. Columns that are to the left of this one have greater x values.

My question is: is there some math that I have missed, and is it possible that the coordinates do not use the same reference system throughout the document?

mpele · 2024-12-22T00:10:27Z

I have created a scatter plot of the coordinates and everything looks as it should.

All coordinates are read correctly, and it looks like every text is read correctly too, but they are not paired correctly: some coordinates and texts are mixed up.

Definitely, it is a bug.

After investigation: the second text read is empty, and after that every text is shifted to the coordinates of the next text. The last text (the page number) is not shown, but its coordinates are used.
I am not sure whether this is relevant, but the problem starts when the text in the logo area is processed.
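Until the bug is fixed, the shift described above could in principle be undone after extraction. The sketch below is a workaround idea only, not a fix for the library: it assumes the items are `(coords, text)` pairs, that exactly one empty text marks where the shift begins, and that the shift is exactly one position. `repair_pairs` and the sample data are made up for illustration.

```python
def repair_pairs(items):
    """Re-pair coordinates and texts after a one-position shift.

    `items` is a list of (coords, text) pairs in extraction order. From the
    first empty text onwards, the text that belongs to coords[i] is assumed
    to have been emitted one slot later, at index i + 1.
    """
    coords = [c for c, _ in items]
    texts = [t for _, t in items]

    # Index of the first empty text, or None if nothing looks shifted.
    gap = next((i for i, t in enumerate(texts) if t == ""), None)
    if gap is None:
        return list(items)

    fixed = list(items[:gap])  # pairs before the gap are left as they are
    for i in range(gap, len(items) - 1):
        fixed.append((coords[i], texts[i + 1]))
    # The last coordinates (the page number here) have lost their text.
    fixed.append((coords[-1], None))
    return fixed


if __name__ == "__main__":
    # Made-up data that mimics the described pattern.
    broken = [
        ((10, 500), "Logo"),
        ((50, 331), ""),
        ((50, 298), "1"),
        ((796, 42), "2"),
    ]
    for pair in repair_pairs(broken):
        print(pair)
```

The last entry keeps its coordinates with `None` as text, because under these assumptions that text was never returned.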

k00ni added the bug label Dec 23, 2024