I want to parse a PDF document that contains a table.
I extracted the text and its coordinates with getDataTm(). I expected that defining x-coordinate limits for each column would be enough to assign every text to its column.
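A minimal sketch of that column-limit approach, for illustration only. It assumes each getDataTm() entry has the shape `[[a, b, c, d, x, y], text]`, with x and y as the last two matrix values. The sample data, the column limits, and the helper names `columnFor` and `groupIntoRows` are all hypothetical.

```php
<?php
// Hypothetical sample in the assumed getDataTm() shape: [[a, b, c, d, x, y], text].
$dataTm = [
    [['1', '0', '0', '1', '50', '331'], '1'],
    [['1', '0', '0', '1', '120.5', '331'], 'Widget'],
    [['1', '0', '0', '1', '50', '298'], '2'],
    [['1', '0', '0', '1', '121', '298'], 'Gadget'],
];

// Assumed column limits: name => [minX, maxX), in PDF user-space units.
$columns = [
    'no'   => [40.0, 100.0],
    'name' => [100.0, 300.0],
];

// Return the name of the column whose x range contains $x, or null if none does.
function columnFor(float $x, array $columns): ?string
{
    foreach ($columns as $name => [$min, $max]) {
        if ($x >= $min && $x < $max) {
            return $name;
        }
    }
    return null;
}

// Group texts into rows keyed by rounded y, then order rows from top to bottom.
function groupIntoRows(array $dataTm, array $columns): array
{
    $rows = [];
    foreach ($dataTm as [$tm, $text]) {
        $column = columnFor((float) $tm[4], $columns);
        if ($column === null) {
            continue; // text lies outside every declared column
        }
        $rowKey = (string) round((float) $tm[5]);
        $rows[$rowKey][$column] = $text;
    }
    krsort($rows, SORT_NUMERIC); // larger y is higher on the page
    return array_values($rows);
}

print_r(groupIntoRows($dataTm, $columns));
```

This only works if every text is paired with its own coordinates, which is exactly what the anomalies described next call into question.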
Unfortunately, some of the coordinate values are confusing, and I have not been able to work out why.
I have noticed two anomalies. The first is in the values (x, y, text) for the row numbers in the first column:
50 331 1
50 298 2
796 42 3
Visually, the numbers are stacked one above the other. Note also that the page is landscape and $details['MediaBox'] gives 842.25 and 595.5. I noticed that 796 + 50 ≈ 842 and that the row height is about 35 for all other cells, so is it possible that the reference point has moved to the bottom right of the table?
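The moved-origin idea can be checked with a little arithmetic. The sketch below uses only the numbers quoted above; the helper `mirrorPoint` is hypothetical, and "expected" simply extrapolates the row step observed between rows 1 and 2.

```php
<?php
// MediaBox size reported above (landscape page).
$pageWidth = 842.25;
$pageHeight = 595.5;

// Reflect a point as if its origin were the opposite corner of the page.
function mirrorPoint(float $x, float $y, float $width, float $height): array
{
    return ['x' => $width - $x, 'y' => $height - $y];
}

// Row step observed between rows 1 and 2, extrapolated to row 3.
$rowStep = 331.0 - 298.0;
$expected = ['x' => 50.0, 'y' => 298.0 - $rowStep];

// Coordinates actually reported for row 3.
$observed = ['x' => 796.0, 'y' => 42.0];
$mirrored = mirrorPoint($observed['x'], $observed['y'], $pageWidth, $pageHeight);

printf("expected x=%.2f y=%.2f\n", $expected['x'], $expected['y']);
printf("observed x=%.2f y=%.2f\n", $observed['x'], $observed['y']);
printf("mirrored x=%.2f y=%.2f\n", $mirrored['x'], $mirrored['y']);
```

Mirroring brings x to 46.25, close to the expected 50, but y is far from the expected 265 whether it is mirrored (553.5) or not (42). So a flipped origin alone does not seem to account for this row.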
The second mystery is the x coordinate of the last column, for which I got these values:
The problem is that those x values fall in the middle of the table. There are columns with greater x values that sit to the left of this column.
My question is: is there some math I have missed, and is it possible that the coordinates do not share one reference system across the whole document?
I created a scatter plot of the coordinates, and everything looks as it should:
All coordinates are read correctly, and every text appears to be read correctly, but they are not paired correctly: some coordinates and texts are mixed up.
It is definitely a bug.
After investigation: the second text read is empty, and from that point on every text is paired with the coordinates of the next text. The last text (the page number) is not shown, but its coordinates are used.
I am not sure whether it is relevant, but the problem starts when the text in the logo area is processed.
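Until the bug is fixed, the pairing could be repaired after the fact. This is a rough sketch under the assumptions described above: exactly one spurious empty text, and every later text shifted by one slot. It is a workaround, not a fix in the library. The sample data and the function name `realignAfterEmptyText` are hypothetical, and the entry shape `[[a, b, c, d, x, y], text]` is assumed.

```php
<?php
// Hypothetical output showing the symptom: an empty second text, later texts shifted.
$dataTm = [
    [['1', '0', '0', '1', '50', '500'], 'Logo'],
    [['1', '0', '0', '1', '50', '331'], ''],   // spurious empty text
    [['1', '0', '0', '1', '50', '298'], '1'],  // text really belongs at y=331
    [['1', '0', '0', '1', '796', '42'], '2'],  // text really belongs at y=298
];

// Drop the first empty text so each later text moves back onto its own matrix.
function realignAfterEmptyText(array $dataTm): array
{
    $matrices = array_column($dataTm, 0);
    $texts = array_column($dataTm, 1);

    $emptyAt = array_search('', $texts, true);
    if ($emptyAt === false) {
        return $dataTm; // nothing to repair
    }
    array_splice($texts, $emptyAt, 1);

    $fixed = [];
    foreach ($matrices as $i => $tm) {
        // The last matrix gets null: its text (the page number) was never emitted.
        $fixed[] = [$tm, $texts[$i] ?? null];
    }
    return $fixed;
}

print_r(realignAfterEmptyText($dataTm));
```

If more than one empty text appears, or the shift starts somewhere else, this simple realignment would pair the texts wrongly, so the result should be checked against the rendered page.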