13 Commits

Author SHA1 Message Date
Justin Viiret
997c0c9efd ComponentRepeat: wire R{0,N} as (R{1,N})?
Change the way that we wire up the edges in a bounded repeat to avoid
large fan-out from predecessors.
2016-03-01 11:22:45 +11:00
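
The rewrite in this commit relies on R{0,N} and (R{1,N})? accepting the same strings. A minimal sketch of that equivalence follows, using Python's `re` module rather than Hyperscan; the sub-pattern `ab` and the bound N = 3 are illustrative assumptions, and the sketch says nothing about how the edges are wired in the compiled graph.

```python
import re
from itertools import product

N = 3  # illustrative upper bound, not taken from the commit

# R{0,N} written directly, and the same repeat expressed as (R{1,N})?
direct = re.compile(r"(?:ab){0,%d}" % N)
rewired = re.compile(r"(?:(?:ab){1,%d})?" % N)


def same_language(max_len=8):
    """Compare both forms on every string over {a, b} up to max_len chars."""
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            s = "".join(chars)
            if bool(direct.fullmatch(s)) != bool(rewired.fullmatch(s)):
                return False
    return True


print(same_language())
```

Both forms accept the empty string and one to N copies of the sub-pattern, so the exhaustive comparison finds no disagreement.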
Justin Viiret
205bc1af7f PCRE includes U+180E in /[:print:]/8W 2015-12-07 09:10:12 +11:00
Justin Viiret
f9b7e806b1 Update defn of class [:punct:] for PCRE 8.38 2015-12-07 09:08:46 +11:00
Justin Viiret
25a01e1c3c Unify handling of caseless flag in class parser
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separate ucp and non-ucp working data).

Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
2015-12-07 09:07:37 +11:00
Justin Viiret
bdb7a10034 Fix defn of POSIX graph, print, punct classes
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
2015-12-07 09:06:23 +11:00
Justin Viiret
fd19168025 Restore \Q..\E support in character classes 2015-11-18 15:27:05 +11:00
Justin Viiret
9cffa7666f Refine ComponentClass::class_empty
ComponentClass::class_empty should only be used on finalized classes to
determine whether a given class contains any elements; it should not
take the cr_ucp or cps_ucp into account, as they have been folded in by
the finalize call.

Fixes our failure to identify that the pattern /[^\D\d]/8W can never
match.
2015-11-10 14:36:39 +11:00
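
Why /[^\D\d]/8W can never match: every character is either a digit or a non-digit, so the union \D plus \d covers everything and its negation is empty. The sketch below checks this by brute force with Python's `re` module, whose `str` patterns are Unicode-aware by default. That is only loosely comparable to the 8W flags and is an assumption of this example, not Hyperscan's own analysis.

```python
import re
import sys

# Negation of (non-digits plus digits): no character can be in this class.
empty_class = re.compile(r"[^\D\d]")


def first_match():
    """Return the first code point the class matches, or None if there is none."""
    for cp in range(sys.maxunicode + 1):
        if empty_class.match(chr(cp)):
            return cp
    return None


print(first_match())
```

A compiler that folds the class at finalize time can reach the same conclusion without scanning any input.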
Justin Viiret
c68bfe05d8 Don't use class_empty in early class parsing
Instead, explicitly track whether we're still in the early class parsing
machine.
2015-11-10 14:36:39 +11:00
Justin Viiret
b1f6a539c7 Remove dead ComponentClass::{get,set}FirstChar 2015-11-10 14:36:39 +11:00
Justin Viiret
9a7b912a5d Rework parser rejection for POSIX collating elems
Implement rejection of POSIX collating elements ("[.ch.]" and "[=ch=]")
entirely in the Ragel parser, using the same approach both inside and
outside character classes.

Fix buggy rejection of [^.ch.], which we should accept as a character
class.
2015-11-10 14:36:39 +11:00
Justin Viiret
4c53bd4641 parser: use 'override' keyword in subclasses 2015-11-10 14:36:38 +11:00
Justin Viiret
1afc591c30 Check for (and throw on) large min repeat
We were only checking for large maximum bounds, which meant that we
would attempt to compile A{N,} where N is huge.
2015-10-30 11:28:37 +11:00
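
The gap described in this commit can be sketched as a bounds check that covers both ends of a repeat. Everything named here is hypothetical: the limit value, the exception type and the function are illustrative and are not Hyperscan's actual code or limits.

```python
# Hypothetical limit for illustration only; not Hyperscan's real value.
HYPOTHETICAL_MAX_REPEAT = 32767


class RepeatTooLarge(ValueError):
    """Raised when a repeat bound exceeds the supported limit."""


def check_repeat_bounds(min_count, max_count=None, limit=HYPOTHETICAL_MAX_REPEAT):
    """Validate A{min,max}; max_count=None models the unbounded form A{min,}."""
    # Checking only max_count would let A{N,} through when N is huge,
    # because the unbounded form has no maximum to test.
    if min_count > limit:
        raise RepeatTooLarge("minimum repeat bound %d is too large" % min_count)
    if max_count is not None and max_count > limit:
        raise RepeatTooLarge("maximum repeat bound %d is too large" % max_count)


check_repeat_bounds(2, 5)       # accepted
try:
    check_repeat_bounds(10**9)  # A{1000000000,} is rejected on its minimum
except RepeatTooLarge as err:
    print("rejected:", err)
```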
Matthew Barr
904e436f11 Initial commit of Hyperscan 2015-10-20 09:13:35 +11:00