allow for text child nodes when building tag index
dwarring committed Nov 17, 2023
1 parent 75a2f88 commit 3ce2214
Showing 2 changed files with 7 additions and 2 deletions.
4 changes: 3 additions & 1 deletion docs/index.md
@@ -40,7 +40,7 @@ Scripts in this Distribution

### `pdf-tag-dump.raku`

-pdf-tag-dump.raku --select=<xpath-expr> --omit=tag --password=Xxxx --max-depth=n --marks --/atts --/style --debug t/pdf/tagged.pdf
+pdf-tag-dump.raku --select=<xpath-expr> --omit=tag --password=Xxxx --max-depth=n --marks --artifacts --/atts --/style --debug t/pdf/tagged.pdf

Options:

@@ -54,6 +54,8 @@ Options:

* `--marks` - descend into marked content

+* `--artifacts` - include artifact structure and content

* `--strict` - warn about unknown tags, etc

* `--/style` - omit stylesheet
5 changes: 4 additions & 1 deletion lib/PDF/Tags/Reader.rakumod
@@ -31,7 +31,7 @@ method read(PDF::Class:D :$pdf!, Bool :$create, |c --> PDF::Tags:D) {
constant Tags = Hash[PDF::Content::Tag];
has Tags %!canvas-tags{PDF::Content::Canvas};

-sub build-tag-index(%tags, PDF::Content::Tag $tag) {
+multi sub build-tag-index(%tags, PDF::Content::Tag $tag) {
    with $tag.mcid {
        %tags{$_} = $tag;
    }
@@ -41,6 +41,8 @@ sub build-tag-index(%tags, PDF::Content::Tag $tag) {
    }
}

+multi sub build-tag-index(%, Str) { }

multi sub tag-text(PDF::Content::Tag:D $tag) {
    with $tag.attributes<ActualText> {
        PDF::COS::TextString.COERCE: $_
@@ -49,6 +51,7 @@ multi sub tag-text(PDF::Content::Tag:D $tag) {
        $tag.kids.map(&tag-text).join
    }
}

multi sub tag-text(Str:D $text) { $text }

method canvas-tags($canvas --> Hash) {