Konstantinos Margaritis
e35b88f2c8
use STL make_unique, remove wrapper header, breaks C++17 compilation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
556206f138
replace push_back by emplace_back where possible
2021-10-12 11:51:33 +03:00
Chang, Harry
001b7824d2
Logical Combination: use hs_misc_free instead of free.
...
fixes github issue #284
2021-01-25 14:13:13 +02:00
Hong, Yang A
23e5f06594
add new Literal API for pure literal expressions:
...
Add the compile-time APIs hs_compile_lit() and hs_compile_lit_multi()
to handle pure literal pattern sets. A corresponding --literal-on option
is added to the Hyperscan test suites. Extended parameters and some
flags are not supported by this API.
2019-08-13 14:51:38 +08:00
Chang, Harry
1f4c10a58d
Logical combination: support EOD match from purely negative case.
2019-08-13 14:50:07 +08:00
Hong, Yang A
f68723a606
literal matching: separate path for pure literal patterns
2019-01-21 09:59:22 +08:00
Chang, Harry
8a1c497f44
Logical Combination of patterns.
2018-06-27 14:04:57 +08:00
Wang, Xiang W
08b00f6149
hscollider: fix input length for UTF8 check
2018-06-27 14:04:53 +08:00
Matthew Barr
1891f14755
Add support for Hamming distance approx matching
2018-01-19 06:11:43 -05:00
Alex Coyte
d9e2c3daca
make ComponentRepeat::vacuous_everywhere() more accurate
2017-08-21 11:18:54 +10:00
Justin Viiret
33823d60d1
tidy: "ue2::flat_set/map" -> "flat_set/map"
2017-08-21 11:14:59 +10:00
Justin Viiret
9cf66b6ac9
util: switch from Boost to std::unordered set/map
...
This commit replaces the ue2::unordered_{set,map} types with their STL
versions, with some new hashing utilities in util/hash.h. The new types
ue2_unordered_set<T> and ue2_unordered_map<Key, T> default to using the
ue2_hasher.
The header util/ue2_containers.h has been removed, and the flat_set/map
containers moved to util/flat_containers.h.
2017-08-21 11:14:55 +10:00
Alex Coyte
d317d75615
character classes: handle \Q\E and utf8
2017-06-21 08:43:44 +10:00
Alex Coyte
a185be5a4f
Treat characters between \Q \E as codepoints in UTF8 mode.
...
fixes github issue #57
2017-06-21 08:43:44 +10:00
Justin Viiret
1ef87c43ee
noncopyable: switch over from boost
2017-04-26 15:18:26 +10:00
Justin Viiret
5dfae12a62
ng: split NGWrapper into NGHolder, ExpressionInfo
...
We now use NGHolder for all graph information, while other expression
properties (report, flag information, etc) go in new class
ExpressionInfo.
2017-04-26 15:18:09 +10:00
Anatoly Burakov
2de6706df2
Adding support for compiling approximate matching patterns
...
Adds new "edit_distance" extparam
2017-04-26 15:11:39 +10:00
Justin Viiret
1245156f44
parser: handle "control verbs" without close paren
2017-04-26 14:59:02 +10:00
Justin Viiret
084596bb5e
parser: check for std::out_of_range from stoul
2017-04-26 14:58:46 +10:00
Justin Viiret
bef6889844
parser: use control_verb parser inline
2017-04-26 14:58:43 +10:00
Justin Viiret
bfc8be5675
parser: use stoul(), not strtol()
2017-04-26 14:57:53 +10:00
Justin Viiret
4def0c8a52
parser: switch to using char* pointers
2017-04-26 14:57:53 +10:00
Justin Viiret
1875d55cf1
parser: add initial parser for control verbs
...
This more reliably handles control verbs like (*UTF8) that can only
happen at the start of the pattern, and allows them in any ordering.
2017-04-26 14:57:46 +10:00
Justin Viiret
cacf07fe9b
prefilter: workaround for \b in UCP and !UTF8 mode
...
For now, just drop the assertion (which will still return a superset of
matches, as per prefiltering semantics).
2017-01-20 09:19:51 +11:00
Justin Viiret
67e450115a
parser: ignore \E that is not preceded by \Q
...
This conforms to PCRE's behaviour, where an isolated \E that is not
preceded by \Q is ignored.
2016-08-10 15:08:01 +10:00
Matthew Barr
cbd115f7fe
Don't shadow names
2016-08-10 15:06:57 +10:00
Alex Coyte
5c5ec905cc
violet: initial implementation
2016-08-10 15:01:08 +10:00
Justin Viiret
97eaea043e
ucp_table: clean up make_caseless
2016-05-18 16:28:22 +10:00
Justin Viiret
f48b8c937b
ucp_table: don't always deref rv of lower_bound
...
Fixes a warning from asan.
2016-05-18 16:28:17 +10:00
Justin Viiret
1bc12139a2
ComponentCondReference: mark ctors explicit
2016-04-20 13:34:53 +10:00
Justin Viiret
4e80d22d79
Use using directives to silence hiding warning
2016-04-20 13:34:53 +10:00
Justin Viiret
e92a20e5fa
ComponentRepeat: remove firsts_cache, precalc code
...
Firsts are easy to compute in ComponentRepeat::first() now.
2016-03-01 11:22:45 +11:00
Justin Viiret
3d049d6de3
ComponentRepeat: wire X{0,N} and (X?){N} the same
2016-03-01 11:22:45 +11:00
Justin Viiret
997c0c9efd
ComponentRepeat: wire R{0,N} as (R{1,N})?
...
Change the way that we wire up the edges in a bounded repeat to avoid
large fan-out from predecessors.
2016-03-01 11:22:45 +11:00
Justin Viiret
205bc1af7f
PCRE includes U+180E in /[:print:]/8W
2015-12-07 09:10:12 +11:00
Justin Viiret
f9b7e806b1
Update defn of class [:punct:] for PCRE 8.38
2015-12-07 09:08:46 +11:00
Justin Viiret
25a01e1c3c
Unify handling of caseless flag in class parser
...
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separate ucp and non-ucp working data).
Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
2015-12-07 09:07:37 +11:00
Justin Viiret
bdb7a10034
Fix defn of POSIX graph, print, punct classes
...
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
2015-12-07 09:06:23 +11:00
Justin Viiret
fd19168025
Restore \Q..\E support in character classes
2015-11-18 15:27:05 +11:00
Justin Viiret
9cffa7666f
Refine ComponentClass::class_empty
...
ComponentClass::class_empty should only be used on finalized classes to
determine whether a given class contains any elements; it should not
take the cr_ucp or cps_ucp into account, as they have been folded in by
the finalize call.
Fixes our failure to identify that the pattern /[^\D\d]/8W can never
match.
2015-11-10 14:36:39 +11:00
Justin Viiret
c68bfe05d8
Don't use class_empty in early class parsing
...
Instead, explicitly track whether we're still in the early class parsing
machine.
2015-11-10 14:36:39 +11:00
Justin Viiret
b1f6a539c7
Remove dead ComponentClass::{get,set}FirstChar
2015-11-10 14:36:39 +11:00
Justin Viiret
9a7b912a5d
Rework parser rejection for POSIX collating elems
...
Implement rejection of POSIX collating elements ("[.ch.]" and "[=ch=]")
entirely in the Ragel parser, using the same approach both inside and
outside character classes.
Fix buggy rejection of [^.ch.], which we should accept as a character
class.
2015-11-10 14:36:39 +11:00
Justin Viiret
4c53bd4641
parser: use 'override' keyword in subclasses
2015-11-10 14:36:38 +11:00
Justin Viiret
1afc591c30
Check for (and throw on) large min repeat
...
We were only checking for large maximum bounds, which meant that we
would attempt to compile A{N,} where N is huge.
2015-10-30 11:28:37 +11:00
Matthew Barr
904e436f11
Initial commit of Hyperscan
2015-10-20 09:13:35 +11:00