Merge subpattern references #18

Open: wants to merge 134 commits into base: master
134 commits
90ebc6b
Add RECURSIVE-SUBPATTERN class.
nbtrap Dec 27, 2013
bf19675
Add lexer tokens for (?R) and (?&NAME) sequences.
nbtrap Dec 27, 2013
9bf3bf0
Use CASE option instead of IF expression.
nbtrap Dec 27, 2013
35c05dd
Optionally make PARSE-REGISTER-NAME-AUX parse from within (?&name) co…
nbtrap Dec 27, 2013
5adc50c
Make (?<digit>+) and (?&NAME) constructs <legal-token>s.
nbtrap Dec 27, 2013
ca8f723
Parse subpattern references correctly.
nbtrap Dec 27, 2013
da2bba6
Make note to support (?0) and (?R).
nbtrap Dec 27, 2013
6a493f8
Rename RECURSIVE-SUBPATTERN -> SUBPATTERN-REFERENCE.
nbtrap Dec 27, 2013
95c400a
Beginnings of code to check for invalid subpattern references.
nbtrap Dec 27, 2013
b4697b6
Add to doc comment of CONVERT-AUX.
nbtrap Dec 27, 2013
ceecaf8
Add shell of method for handling subpattern reference parse-tree nodes.
nbtrap Dec 27, 2013
561228d
Give flesh to CONVERT-COMPOUND-PARSE-TREE.
nbtrap Dec 27, 2013
82fdd76
Rename variable NAMED-SUBPATTERN-REFS-SEEN -> NAMED-SUBPATTERN-REFS.
nbtrap Dec 27, 2013
514cb45
Convert named subpattern refs to numbered subpattern refs inside CONV…
nbtrap Dec 28, 2013
a2a9632
Return a fifth value from CONVERT, namely, the list of numbers of sub…
nbtrap Dec 28, 2013
8910783
Create binding of special variable REFERENCED-REGISTER-MATCHERS-PLIST.
nbtrap Dec 28, 2013
eb3d9ec
Define the closure that matches subpattern references, and modify the…
nbtrap Dec 29, 2013
ce1b0fe
Define REGEX-LENGTH on SUBPATTERN-REFERENCE.
nbtrap Dec 29, 2013
972dba5
Define COPY-REGEX and COMPUTE-OFFSETS on SUBPATTERN-REFERENCE.
nbtrap Dec 29, 2013
b7ab328
Be sure not to touch register offsets when matching a register indire…
nbtrap Dec 29, 2013
69f0d7c
Don't backtrack through subpattern references ad infinitum.
nbtrap Dec 29, 2013
55a48e0
Revert "Don't backtrack through subpattern references ad infinitum."
nbtrap Dec 29, 2013
38986bd
Use COND instead of multi-branch IF.
nbtrap Jan 2, 2014
d79af78
Merge branch 'master' into subpattern-references
nbtrap Jan 6, 2014
5de7565
Add tests for subpattern references.
nbtrap Jan 6, 2014
fa60aff
Remove unused variable declaration.
nbtrap Jan 6, 2014
09625d5
Skip end of string optimizations with subpattern references.
nbtrap Jan 6, 2014
a4e1eaa
Actually, only skip the "skip" optimization when optimizing ends of s…
nbtrap Jan 6, 2014
e100f55
Add forgotten type to ETYPECASE expression.
nbtrap Jan 6, 2014
ada178a
Add more tests for subpattern references.
nbtrap Jan 6, 2014
e7d4eff
Use SETF instead of SETQ.
nbtrap Jan 6, 2014
0309c76
Make sure INSIDE-SUBPATTERN-REFERENCE gets set to NIL when NEXT-FN is…
nbtrap Jan 7, 2014
806857a
Number tests correctly.
nbtrap Jan 7, 2014
047c17e
Create a temporary set of registers for each pass through a subpatter…
nbtrap Jan 7, 2014
22c26a8
Detect whether perl test regexes contain named registers.
nbtrap Jan 7, 2014
3bbe139
Add some additional (trivial) cases to ETYPECASE in CONVERT-NAMED-SUB…
nbtrap Jan 7, 2014
b05a808
Add some tests for named subpattern references.
nbtrap Jan 7, 2014
c70985f
Get rid of unused INSIDE-SUBPATTERN-REFERENCE variable.
nbtrap Jan 8, 2014
cdb513c
Only compute INNER-MATCHER-WITHOUT-NEXT-FN when it's needed.
nbtrap Jan 8, 2014
4fe74d0
Remove FIXME comment from closures.lisp, and rename one of the variab…
nbtrap Jan 12, 2014
56243a1
Remove FIXME comment from CREATE-MATCHER-AUX for SUBPATTERN-REFERENCE.
nbtrap Jan 12, 2014
8591bd4
Remove more FIXME comments.
nbtrap Jan 12, 2014
8572c7a
Make named subpattern references refer to the first subpattern with t…
nbtrap Jan 12, 2014
4d1c609
Don't declare type of variable that may be either FUNCTION or NIL.
nbtrap Jan 12, 2014
e4abae6
Add two tests (1652 and 1675) for testing forward subpattern referenc…
nbtrap Jan 12, 2014
36acbdd
Reorder the subpattern reference tests.
nbtrap Jan 12, 2014
87c9afc
Don't use START-OF-END-STRING-P optimization when subpattern referenc…
nbtrap Jan 12, 2014
6620354
Add tests for patterns containing illegal whitespace in subpattern re…
nbtrap Jan 12, 2014
09ad0d4
Bind *ALLOW-NAMED-REGISTERS* to NIL before running simple tests.
nbtrap Jan 12, 2014
68d4215
Add two more tests for dealing with START-OF-END-STRING-P and subpatt…
nbtrap Jan 12, 2014
ffca226
Add two more tests.
nbtrap Jan 12, 2014
e9c94db
Remove FIXME comment about disambiguating named subpattern references.
nbtrap Jan 13, 2014
d7d8941
Change FIXME comment in COMPUTE-OFFSETS method on SUBPATTERN-REFERENCE.
nbtrap Jan 14, 2014
c5e06f7
Remove another FIXME comment.
nbtrap Jan 14, 2014
e04ccda
Remove the FIXME comment from the REGEX-LENGTH method for SUBPATTERN-…
nbtrap Jan 14, 2014
1f1bba8
Remove unused sub from perltest.pl eval.
nbtrap Jan 14, 2014
0b70e23
Make perltest.pl handle arbitrarily large and variable numbers of reg…
nbtrap Jan 15, 2014
7bd7c22
Add two tests for double- and triple-digit register number subpattern…
nbtrap Jan 15, 2014
f4f8137
Remove comment about possibly supporting (?0) and (?R).
nbtrap Jan 15, 2014
a37bf58
Remove unneeded variable NAMED-REG-SEEN.
nbtrap Jan 15, 2014
8678c2b
Add some failing tests taken from the pcre distribution.
nbtrap Jan 16, 2014
41cbc80
Add HAS-SUBPATTERN-REF-P function to convert.lisp.
nbtrap Jan 16, 2014
8cdc21b
Don't needlessly stop accumulating for string-beginning optimization.
nbtrap Jan 16, 2014
7c501f0
Remove specific test references from comment.
nbtrap Jan 16, 2014
e4dfb25
Fix indentation.
nbtrap Jan 16, 2014
49878b0
Don't create a separate matcher for matching registers from subpatter…
nbtrap Jan 16, 2014
da8f474
Use IF instead of COND for simple condition.
nbtrap Jan 17, 2014
2805231
Add some more tests that check for correct backtracking through subpa…
nbtrap Jan 17, 2014
5c0f673
Backtrack correctly into subpattern references.
nbtrap Jan 19, 2014
987c5b4
Clean up CREATE-MATCHER-AUX method for REGISTER.
nbtrap Jan 19, 2014
bab5e70
Clarify and remove some comments.
nbtrap Jan 19, 2014
e32cf51
Add some more subpattern reference tests.
nbtrap Jan 20, 2014
6f0ad13
Add more tests for subpattern references.
nbtrap Jan 24, 2014
1b51a27
Add some more tests for subpattern references.
nbtrap Jan 25, 2014
82414b7
Remove FIXME comment from CONVERT-COMPOUND-PARSE-TREE method on :SUBP…
nbtrap Jan 25, 2014
da706f6
Rename REFERENCED-REGISTER-MATCHERS -> REGISTER-MATCHERS.
nbtrap Jan 25, 2014
5f7af85
Reformat comments in the style of other comments in the package.
nbtrap Jan 25, 2014
6d1bee4
Reformat comments added in changes adding subpattern references to 70…
nbtrap Jan 25, 2014
2852e76
Add FILTER and WORD-BOUNDARY to the default ETYPECASE clause in CONVE…
nbtrap Feb 8, 2014
7d1ed2b
Convert ETYPECASE -> TYPECASE, since all possibilities are accounted …
nbtrap Feb 8, 2014
eed3e27
Add a test for handling back-references within subpattern references …
nbtrap Feb 8, 2014
10a6af6
Add some more tests verifying correct behavior of subpattern- and bac…
nbtrap Feb 16, 2014
c3c5e06
Begin transitioning to the new register offsets storage model.
nbtrap Feb 16, 2014
3891ee5
Continue transitioning to the new register offsets storage model.
nbtrap Feb 16, 2014
2b40341
Add CONTAINING-REGISTERS slot to REGISTER class.
nbtrap Feb 16, 2014
6c6771a
Compute CONTAINING-REGISTERS slot of REGISTER instances.
nbtrap Feb 16, 2014
474b91a
Continue transition to new register offsets storage model.
nbtrap Feb 16, 2014
d938c89
Add some test cases that illumine one of the current register offsets…
nbtrap Feb 16, 2014
ffeff74
Don't store possible register offset of register entered via subpatte…
nbtrap Feb 16, 2014
c3a7c2a
Tidy up the comments in CREATE-MATCHER-AUX method specialized on REGI…
nbtrap Feb 16, 2014
1627892
Rename CONTAINING-REGISTERS -> SUBREGISTERS.
nbtrap Feb 16, 2014
9d63ac7
Fix test 1798.
nbtrap Feb 16, 2014
d517e82
Remove some redundant code in CREATE-MATCHER-AUX specialized on REGIS…
nbtrap Feb 17, 2014
bfe5c92
Inline POP-OFFSETS and PUSH-OFFSETS.
nbtrap Feb 18, 2014
eb26c38
Use LOOP instead of MAPCAR/MAPC for pushing/popping register offsets.
nbtrap Feb 18, 2014
1da422d
Disable some tests in test/perltestdata, but add them to test/simple.
nbtrap Feb 18, 2014
0b1b87e
Record SUBREGISTER-COUNT instead of a list of SUBREGISTERS.
nbtrap Feb 19, 2014
d3a1dee
Merge branch 'nested-refs' into subpattern-references
nbtrap Feb 22, 2014
8db37a9
Add two more tests that currently fail.
nbtrap Feb 21, 2014
f08f4e2
Add two more tests that fail.
nbtrap Feb 21, 2014
f064a0c
Document subpattern references in the html documentation.
nbtrap Feb 21, 2014
0de65d3
Add subpattern reference commentary to docs on *ALLOW-NAMED-REGISTERS*
nbtrap Feb 21, 2014
adb9156
Restore the original docstrings to *REG-STARTS*, etc.
nbtrap Feb 21, 2014
07c8d92
Add three new special variables for holding the register offsets stacks.
nbtrap Feb 21, 2014
b1d2ec7
Bind new special variables when scanning.
nbtrap Feb 21, 2014
ad4d39f
Make *REG-STARTS*, etc., have NIL as initial value instead of (list n…
nbtrap Feb 21, 2014
00c2ca5
Access offset values directly in *REG-STARTS*, etc., instead of thoru…
nbtrap Feb 22, 2014
e64e5ba
Finish implementing the newest register offsets storage model.
nbtrap Feb 22, 2014
9aaa4da
Declare type once instead of using THE repeatedly.
nbtrap Feb 22, 2014
92b459e
Merge branch 'nested-refs2' into subpattern-references
nbtrap Feb 22, 2014
f77686c
Add more tests to *TESTS-TO-SKIP*
nbtrap Feb 22, 2014
f50d7a6
Move more tests (1809-1812) into test/simple.
nbtrap Feb 22, 2014
53cdefe
Add url reference to comment about Perl mishandling certain construct…
nbtrap Feb 22, 2014
2f4c042
Add FIXME comment to come back to later.
nbtrap Feb 22, 2014
8f838eb
Add more tests for subpattern-/back-reference cooperation.
nbtrap Feb 24, 2014
2fc37a6
Move more tests from test/perltestdata into test/simple.
nbtrap Feb 24, 2014
9b97c91
Create new bindings for the referenced register upon entry to subpatt…
nbtrap Feb 26, 2014
46a078c
Remove FIXME comment from convert.lisp.
nbtrap Feb 27, 2014
9b0a9c0
Use "recurse" instead of "refer" to describe the action associated wi…
nbtrap Feb 27, 2014
81be1e4
Convert calls to PUSH-OFFSETS and POP-OFFSETS to fewer calls to more …
nbtrap Feb 28, 2014
a941400
Update comment.
nbtrap Feb 28, 2014
846c22e
Clarify comments in CREATE-MATCHER-AUX method specialized over regist…
nbtrap Feb 28, 2014
b5b406c
Get rid of useless declaration.
nbtrap Feb 28, 2014
72d020e
Wrap docstrings to 70 columns.
nbtrap Mar 1, 2014
498e3f0
Fix lexical/special binding bug.
nbtrap Mar 1, 2014
544fcd1
Merge tag 'v2.0.7' into subpattern-references
nbtrap Mar 1, 2014
22082fc
Fix indentation of PROG1.
nbtrap Mar 1, 2014
f0bc9f3
Get rid of extra LET.
nbtrap Mar 1, 2014
2ce2ff7
Use COND instead of IF and PROGN.
nbtrap Mar 1, 2014
dc6eaa4
Rename OTHER-FN -> CONT.
nbtrap Mar 1, 2014
7d5c4d4
Replace another IF/PROGN with COND.
nbtrap Mar 1, 2014
a833ffb
Use LOCALLY instead of LET with no bindings.
nbtrap Mar 1, 2014
bef1a6a
Rename SUBREGISTER-COUNT -> INNER-REGISTER-COUNT.
nbtrap Mar 1, 2014
8a288eb
Fix declaration on SUBPATTERN-REFS.
nbtrap Mar 2, 2014
api.lisp: 11 changes (9 additions, 2 deletions)
@@ -103,8 +103,9 @@ modify its first argument \(but only if it's a parse tree)."))
     (when flags
       (setq parse-tree (list :group (cons :flags flags) parse-tree))))
   (let ((*syntax-error-string* nil))
-    (multiple-value-bind (regex reg-num starts-with reg-names)
+    (multiple-value-bind (regex reg-num starts-with reg-names subpattern-refs)
         (convert parse-tree)
+      (declare (special subpattern-refs))
       ;; simplify REGEX by flattening nested SEQ and ALTERNATION
       ;; constructs and gathering STR objects
       (let ((regex (gather-strings (flatten regex))))
@@ -132,6 +133,9 @@ modify its first argument \(but only if it's a parse tree)."))
                ;; initialize the counters for CREATE-MATCHER-AUX
                (*rep-num* 0)
                (*zero-length-num* 0)
+               ;; keep track of the matcher functions of registers
+               ;; referenced by subpattern references
+               (register-matchers (cons nil nil))
                ;; create the actual matcher function (which does all the
                ;; work of matching the regular expression) corresponding
                ;; to REGEX and at the same time set the special
@@ -148,7 +152,10 @@ modify its first argument \(but only if it's a parse tree)."))
                                         (create-bmh-matcher
                                          (str starts-with)
                                          (case-insensitive-p starts-with))))))
-          (declare (special end-string-offset end-anchored-p end-string))
+          (declare (special end-string-offset
+                            end-anchored-p
+                            end-string
+                            register-matchers))
           ;; now create the scanner and return it
           (values (create-scanner-aux match-fn
                                       (regex-min-length regex)
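In the api.lisp hunk above, REGISTER-MATCHERS is bound to a fresh `(cons nil nil)`, and each register matcher in closures.lisp is stored with `(setf (getf (car register-matchers) num) ...)`. One likely reason for the extra cons, sketched below in standalone Common Lisp: a subpattern-reference closure can capture the cons before the matcher it refers to has been stored, and still find that matcher at match time, whereas capturing the plist itself would freeze an empty list. All function names here are hypothetical and not part of CL-PPCRE.

```lisp
;;; Sketch: why a shared cons cell works as a table of matchers.
;;; MAKE-MATCHER-TABLE, MAKE-REFERENCE-MATCHER, STORE-MATCHER and
;;; MAKE-STALE-LOOKUP are hypothetical names used for illustration.

(defun make-matcher-table ()
  "Return a fresh cons whose CAR will hold a plist NUM -> matcher."
  (cons nil nil))

(defun make-reference-matcher (table num)
  "Return a closure that looks up matcher NUM when it is called, not
when it is created."
  (lambda (start-pos)
    (funcall (the function (getf (car table) num)) start-pos)))

(defun store-matcher (table num matcher)
  "Destructively record MATCHER under NUM.  Every closure that captured
TABLE sees the update, because they all share the same cons."
  (setf (getf (car table) num) matcher))

(defun make-stale-lookup (plist num)
  "Counter-example: capture the plist value itself.  A later SETF of
GETF on the caller's variable assigns a new list to that variable, so
this closure keeps looking at the old (here empty) list."
  (lambda () (getf plist num)))

;; The reference is created before register 0 has a matcher, as in a
;; pattern like (?1)(a), where the reference precedes group 1 (whose
;; 0-based register number is 0).
(defparameter *table* (make-matcher-table))
(defparameter *ref* (make-reference-matcher *table* 0))
(store-matcher *table* 0 (lambda (start-pos) (+ start-pos 3)))
;; (funcall *ref* 10) => 13
```

Because only the CAR of the shared cons is mutated, every closure that captured the cons sees later entries; a closure holding the plist value, as in MAKE-STALE-LOOKUP, does not.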
closures.lisp: 142 changes (111 additions, 31 deletions)
@@ -86,42 +86,108 @@ such that the call to NEXT-FN after the match would succeed."))

 (defmethod create-matcher-aux ((register register) next-fn)
   (declare #.*standard-optimize-settings*)
-  ;; the position of this REGISTER within the whole regex; we start to
-  ;; count at 0
-  (let ((num (num register)))
-    (declare (fixnum num))
-    ;; STORE-END-OF-REG is a thin wrapper around NEXT-FN which will
-    ;; update the corresponding values of *REGS-START* and *REGS-END*
-    ;; after the inner matcher has succeeded
-    (flet ((store-end-of-reg (start-pos)
-             (declare (fixnum start-pos)
-                      (function next-fn))
-             (setf (svref *reg-starts* num) (svref *regs-maybe-start* num)
-                   (svref *reg-ends* num) start-pos)
-             (funcall next-fn start-pos)))
+  (declare (special register-matchers))
+  (let ((num (num register))
+        (inner-register-count (inner-register-count register))
+        ;; a place to store the next function to call when we arrive
+        ;; here via a subpattern reference
+        subpattern-ref-continuations)
+    (declare (fixnum num inner-register-count)
+             (list subpattern-ref-continuations))
+    (labels
+        ((push-registers-state (new-starts new-maybe-starts new-ends)
+           (declare (list new-starts new-maybe-starts new-ends))
+           ;; only push the register states for this register and registers
+           ;; local to it
+           (loop for idx from num upto (+ num inner-register-count) do
+             (locally (declare (fixnum idx))
+               (push (svref *reg-ends* idx) (svref *reg-ends-stacks* idx))
+               (setf (svref *reg-ends* idx) (pop new-ends))
+               (push (svref *regs-maybe-start* idx) (svref *regs-maybe-start-stacks* idx))
+               (setf (svref *regs-maybe-start* idx) (pop new-maybe-starts))
+               (push (svref *reg-starts* idx) (svref *reg-starts-stacks* idx))
+               (setf (svref *reg-starts* idx) (pop new-starts)))))
+         (pop-registers-state ()
+           ;; return the state that was destroyed by this restore
+           (let (old-starts old-maybe-starts old-ends)
+             (declare (list old-starts old-maybe-starts old-ends))
+             (loop for idx from (+ num inner-register-count) downto num do
+               (locally (declare (fixnum idx))
+                 (push (svref *reg-ends* idx) old-ends)
+                 (setf (svref *reg-ends* idx) (pop (svref *reg-ends-stacks* idx)))
+                 (push (svref *regs-maybe-start* idx) old-maybe-starts)
+                 (setf (svref *regs-maybe-start* idx) (pop (svref *regs-maybe-start-stacks* idx)))
+                 (push (svref *reg-starts* idx) old-starts)
+                 (setf (svref *reg-starts* idx) (pop (svref *reg-starts-stacks* idx)))))
+             (values old-starts old-maybe-starts old-ends)))
+         ;; STORE-END-OF-REG is a thin wrapper around NEXT-FN which
+         ;; will update register offsets after the inner matcher has
+         ;; succeeded
+         (store-end-of-reg (start-pos)
+           (declare (fixnum start-pos)
+                    (function next-fn))
+           (cond
+             (subpattern-ref-continuations
+              ;; we're returning from a register that was entered
+              ;; through a subpattern reference; restore the registers
+              ;; state as it was upon entering the subpattern
+              ;; reference, but save the intermediary state for when
+              ;; we have to backtrack or unwind
+              (multiple-value-bind (saved-starts saved-maybe-starts saved-ends)
+                  (pop-registers-state)
+                (let ((next-fn (pop subpattern-ref-continuations)))
+                  (prog1 (funcall (the function next-fn) start-pos)
+                    ;; un-restore the registers state so we
+                    ;; backtrack/unwind cleanly
+                    (push-registers-state saved-starts saved-maybe-starts saved-ends)
+                    (push next-fn subpattern-ref-continuations)))))
+             (t
+              ;; we're returning from a register that was entered
+              ;; directly; save the start and end positions, and match
+              ;; the rest of the pattern
+              (setf (svref *reg-starts* num) (svref *regs-maybe-start* num)
+                    (svref *reg-ends* num) start-pos)
+              (funcall next-fn start-pos)))))
+      (declare (inline push-registers-state pop-registers-state))
       ;; the inner matcher is a closure corresponding to the regex
       ;; wrapped by this REGISTER
       (let ((inner-matcher (create-matcher-aux (regex register)
                                                #'store-end-of-reg)))
         (declare (function inner-matcher))
-        ;; here comes the actual closure for REGISTER
-        (lambda (start-pos)
-          (declare (fixnum start-pos))
-          ;; remember the old values of *REGS-START* and friends in
-          ;; case we cannot match
-          (let ((old-*reg-starts* (svref *reg-starts* num))
-                (old-*regs-maybe-start* (svref *regs-maybe-start* num))
-                (old-*reg-ends* (svref *reg-ends* num)))
-            ;; we cannot use *REGS-START* here because Perl allows
-            ;; regular expressions like /(a|\1x)*/
-            (setf (svref *regs-maybe-start* num) start-pos)
-            (let ((next-pos (funcall inner-matcher start-pos)))
-              (unless next-pos
-                ;; restore old values on failure
-                (setf (svref *reg-starts* num) old-*reg-starts*
-                      (svref *regs-maybe-start* num) old-*regs-maybe-start*
-                      (svref *reg-ends* num) old-*reg-ends*))
-              next-pos)))))))
+        ;; here comes the actual closure for REGISTER; save it in a
+        ;; special variable so it can be called by subpattern
+        ;; references
+        (setf (getf (car register-matchers) num)
+              (lambda (start-pos &optional cont)
+                (declare (fixnum start-pos))
+                (cond
+                  (cont
+                   ;; the presence of CONT indicates that this
+                   ;; register has been entered via a subpattern
+                   ;; reference closure; save the registers state,
+                   ;; creating fresh new "bindings" for the local
+                   ;; register offsets; restore the state before
+                   ;; returning to the caller
+                   (push cont subpattern-ref-continuations)
+                   (push-registers-state nil nil nil)
+                   (prog1
+                       (funcall inner-matcher start-pos)
+                     (pop-registers-state)
+                     (pop subpattern-ref-continuations)))
+                  (t
+                   (let ((old-*reg-starts* (svref *reg-starts* num))
+                         (old-*regs-maybe-start* (svref *regs-maybe-start* num))
+                         (old-*reg-ends* (svref *reg-ends* num)))
+                     ;; we cannot use *REG-STARTS* here because Perl
+                     ;; allows regular expressions like /(a|\1x)*/
+                     (setf (svref *regs-maybe-start* num) start-pos)
+                     (let ((next-pos (funcall inner-matcher start-pos)))
+                       (unless next-pos
+                         ;; restore old values on failure
+                         (setf (svref *reg-starts* num) old-*reg-starts*
+                               (svref *regs-maybe-start* num) old-*regs-maybe-start*
+                               (svref *reg-ends* num) old-*reg-ends*))
+                       next-pos))))))))))

 (defmethod create-matcher-aux ((lookahead lookahead) next-fn)
   (declare #.*standard-optimize-settings*)
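PUSH-REGISTERS-STATE and POP-REGISTERS-STATE in the hunk above keep one stack per register index and touch only indices NUM through NUM + INNER-REGISTER-COUNT. The following minimal sketch reproduces that save, install, and restore discipline for a single offsets vector, whereas the diff applies it to *REG-STARTS*, *REGS-MAYBE-START* and *REG-ENDS* in parallel. PUSH-OFFSETS and POP-OFFSETS are hypothetical names.

```lisp
;;; Sketch of per-register offset stacks.  Only one offsets vector is
;;; modelled; PUSH-OFFSETS and POP-OFFSETS are hypothetical names.

(defun push-offsets (offsets stacks low high new-values)
  "Save OFFSETS[LOW..HIGH] on the per-index STACKS and install
NEW-VALUES in their place (NIL wherever NEW-VALUES runs out)."
  (loop for idx from low upto high
        do (push (svref offsets idx) (svref stacks idx))
           (setf (svref offsets idx) (pop new-values))))

(defun pop-offsets (offsets stacks low high)
  "Restore OFFSETS[LOW..HIGH] from STACKS and return the values that
were just overwritten, in ascending index order."
  (let ((overwritten '()))
    (loop for idx from high downto low
          do (push (svref offsets idx) overwritten)
             (setf (svref offsets idx) (pop (svref stacks idx))))
    overwritten))

;; Register 0 with one inner register (index 1) entered recursively:
;; (let ((offsets (vector 0 2 nil))
;;       (stacks (make-array 3 :initial-element nil)))
;;   (push-offsets offsets stacks 0 1 nil)           ; fresh "bindings"
;;   (setf (svref offsets 0) 5 (svref offsets 1) 6)  ; inner pass
;;   (list (pop-offsets offsets stacks 0 1) offsets))
;; => ((5 6) #(0 2 NIL))
```

Returning the overwritten values lets the caller re-install them with PUSH-OFFSETS before backtracking, which is the role SAVED-STARTS, SAVED-MAYBE-STARTS and SAVED-ENDS play in STORE-END-OF-REG.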
@@ -427,6 +493,20 @@ against CHR-EXPR."
                                 reg-start reg-end)
                  (funcall next-fn next-pos)))))))))

+(defmethod create-matcher-aux ((subpattern-reference subpattern-reference) next-fn)
+  (declare #.*standard-optimize-settings*)
+  (declare (special register-matchers)
+           (function next-fn))
+  ;; close over the special variable REGISTER-MATCHERS in order to
+  ;; reference it during the match phase
+  (let ((num (num subpattern-reference))
+        (register-matchers register-matchers))
+    (declare (fixnum num) (function next-fn))
+    (lambda (start-pos)
+      (funcall (the function (getf (car register-matchers) (1- num)))
+               start-pos
+               next-fn))))
+
 (defmethod create-matcher-aux ((branch branch) next-fn)
   (declare #.*standard-optimize-settings*)
   (let* ((test (test branch))
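The new SUBPATTERN-REFERENCE method does little more than call the referenced register's matcher with its own NEXT-FN as the optional CONT argument; the register pushes CONT onto SUBPATTERN-REF-CONTINUATIONS, and STORE-END-OF-REG pops it to decide where matching resumes. The toy below follows the same continuation-passing protocol, as read from the diff, to match the recursive pattern (a(?1)?b) against a whole string, which succeeds exactly on a^n b^n with n >= 1. It is a sketch under stated assumptions: all names are hypothetical, register offsets (and hence their stacks) are left out, and only literal characters, a greedy optional group, one register and one subpattern reference are supported.

```lisp
;;; Toy continuation-passing matcher for (a(?1)?b), anchored at both
;;; ends.  Each matcher takes a start position and returns the end of
;;; the overall match or NIL.  All names are hypothetical.

(defvar *input* "")

(defun make-char-matcher (char next-fn)
  (lambda (pos)
    (and (< pos (length *input*))
         (char= (char *input* pos) char)
         (funcall next-fn (1+ pos)))))

(defun make-optional-matcher (build-inner next-fn)
  "Greedy ?: try the inner matcher first, then skip it."
  (let ((inner (funcall build-inner next-fn)))
    (lambda (pos)
      (or (funcall inner pos) (funcall next-fn pos)))))

(defun make-register-matcher (num table build-inner next-fn)
  "Store in TABLE, and return, a matcher for register NUM.  When it is
entered with CONT (via a subpattern reference), CONT rather than
NEXT-FN runs once the register's body has matched."
  (let ((continuations '()))
    (flet ((store-end-of-reg (pos)
             (if continuations
                 ;; leaving a recursive entry: resume the caller, then
                 ;; put its continuation back in case we backtrack
                 (let ((cont (pop continuations)))
                   (prog1 (funcall cont pos)
                     (push cont continuations)))
                 (funcall next-fn pos))))
      (let ((inner (funcall build-inner #'store-end-of-reg)))
        (setf (getf (car table) num)
              (lambda (pos &optional cont)
                (cond (cont
                       (push cont continuations)
                       (prog1 (funcall inner pos)
                         (pop continuations)))
                      (t (funcall inner pos)))))))))

(defun make-subpattern-reference (num table next-fn)
  "NUM is 1-based, as in (?1); registers are numbered from 0."
  (lambda (pos)
    (funcall (getf (car table) (1- num)) pos next-fn)))

(defun compile-anbn ()
  "Build a matcher for (a(?1)?b) that must consume the whole input."
  (let ((table (cons nil nil))
        (at-end (lambda (pos) (and (= pos (length *input*)) pos))))
    (make-register-matcher
     0 table
     (lambda (store-end)
       ;; the register body a(?1)?b, built from right to left
       (make-char-matcher
        #\a
        (make-optional-matcher
         (lambda (k) (make-subpattern-reference 1 table k))
         (make-char-matcher #\b store-end))))
     at-end)))

(defun anbn-match (string)
  (let ((*input* string))
    (funcall (compile-anbn) 0)))

;; (anbn-match "aaabbb") => 6, (anbn-match "aab") => NIL
```

Putting CONT back after the continuation returns (the PROG1 in STORE-END-OF-REG) matters when matching backtracks into the register: a later attempt must again find the caller's continuation on the stack.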