17 Commits

Author SHA1 Message Date
Chang, Harry
1f4c10a58d Logical combination: support EOD match from purely negative case. 2019-08-13 14:50:07 +08:00
Chang, Harry
8a1c497f44 Logical Combination of patterns. 2018-06-27 14:04:57 +08:00
Matthew Barr
1891f14755 Add support for Hamming distance approx matching 2018-01-19 06:11:43 -05:00
Alex Coyte
d317d75615 character classes: handle \Q\E and utf8 2017-06-21 08:43:44 +10:00
Justin Viiret
5edecbf539 ng: check can_never_match before validate_fuzzy 2017-04-26 15:16:03 +10:00
Anatoly Burakov
9f72dede5c Add support for approximate matching in NFA matcher unit tests 2017-04-26 15:11:54 +10:00
Justin Viiret
1245156f44 parser: handle "control verbs" without close paren 2017-04-26 14:59:02 +10:00
Justin Viiret
bef6889844 parser: use control_verb parser inline 2017-04-26 14:58:43 +10:00
Justin Viiret
1875d55cf1 parser: add initial parser for control verbs
This more reliably handles control verbs like (*UTF8) that can only
happen at the start of the pattern, and allows them in any ordering.
2017-04-26 14:57:46 +10:00
Alex Coyte
fbaa0a1b25 make expected too large patterns even larger 2017-04-26 14:44:49 +10:00
Justin Viiret
67e450115a parser: ignore \E that is not preceded by \Q
This conforms to PCRE's behaviour, where an isolated \E that is not
preceded by \Q is ignored.
2016-08-10 15:08:01 +10:00
Justin Viiret
98eff64edf ng_prefilter: turn large max bound into inf
During prefilter region replacement, turn regions with very large max
bounds into repeats with inf max bound. This improves compile time and
the likelihood that we will actually be able to build an implementation
for such patterns.
2016-03-01 11:22:45 +11:00
Justin Viiret
fd19168025 Restore \Q..\E support in character classes 2015-11-18 15:27:05 +11:00
Justin Viiret
9cffa7666f Refine ComponentClass::class_empty
ComponentClass::class_empty should only be used on finalized classes to
determine whether a given class contains any elements; it should not
take the cr_ucp or cps_ucp into account, as they have been folden in by
the finalize call.

Fixes our failure to identify that the pattern /[^\D\d]/8W can never
match.
2015-11-10 14:36:39 +11:00
Justin Viiret
9a7b912a5d Rework parser rejection for POSIX collating elems
Implement rejection of POSIX collating elements ("[.ch.]" and "[=ch=]"
entirely in the Ragel parser, using the same approach both inside and
ouside character classes.

Fix buggy rejection of [^.ch.], which we should accept as a character
class.
2015-11-10 14:36:39 +11:00
Justin Viiret
1afc591c30 Check for (and throw on) large min repeat
We were only checking for large maximum bounds, which meant that we
would attempt to compile A{N,} where N is huge.
2015-10-30 11:28:37 +11:00
Matthew Barr
904e436f11 Initial commit of Hyperscan 2015-10-20 09:13:35 +11:00