233 Commits

Author SHA1 Message Date
Justin Viiret
eb14792a63 rose: group final ids by fragment 2017-04-26 14:41:29 +10:00
Justin Viiret
07a6b6510c rose/hwlm: limit literals to eight bytes
Rework HWLM to work over literals of eight bytes ("medium length"),
doing confirm in the Rose interpreter.
2017-04-26 14:41:29 +10:00
Justin Viiret
5c9c540424 rose: fix up comments referring to CHECK_LITERAL
This instruction is now called CHECK_LONG_LIT.
2017-04-26 14:41:29 +10:00
Justin Viiret
5061b76901 rose: mark RoseInstrCheckLongLit ctor explit 2017-04-26 14:41:29 +10:00
Justin Viiret
8f1b3c89fa rose: remove no-longer-used convertBadLeaves pass 2017-04-26 14:41:29 +10:00
Matthew Barr
bc2f336d9d Work around for deficiency in C++11/14/17 standard
As explained to us by STL at Microsoft (the author of their
vector), there is a hole in the standard wrt the vector copy
constructor, which always exists even if it won't compile.
2017-04-26 14:41:29 +10:00
Justin Viiret
68bdc800fc dump: render literals as regexes (with comments) 2017-04-26 14:41:29 +10:00
Alex Coyte
f9324febde Ensure the queue structure is initialised in roseEnginesEod(). 2017-03-01 13:05:10 +11:00
Alex Coyte
734eb2ce62 we can only trim lookarounds based on information common to all literals 2017-01-17 11:39:17 +11:00
Matthew Barr
2214296b7f Convert compile-time code to not require SIMD 2016-12-14 15:29:01 +11:00
Justin Viiret
e271781d95 multibit, fatbit: make _size build-time only
This commit makes mmbit_size() and fatbit_size compile-time only, and
adds a resource limit for very large multibits.
2016-12-14 15:28:54 +11:00
Alex Coyte
e51b6d23b9 introduce Sheng-McClellan hybrid 2016-12-14 15:27:18 +11:00
Justin Viiret
ef99ae108f rose_build_merge: correctly merge NFA outfixes
We were not doing our bookkeeping properly for merges where the number
of NFAs was greater than the batch size of 200.
2016-12-02 11:34:52 +11:00
Matthew Barr
99e14df117 Fix combine2x128 2016-12-02 11:33:48 +11:00
Alex Coyte
8ff7a3cdbb correct dump filenames of som rev engines 2016-12-02 11:33:44 +11:00
Alex Coyte
32c826e9c6 have single dump function per engine 2016-12-02 11:32:36 +11:00
Alex Coyte
71ff480b77 nfa_api: remove subtype from dispatch 2016-12-02 11:32:28 +11:00
Alex Coyte
530d84c6f3 allow edge_descriptors to be created from pair<edge_descriptor, bool> 2016-12-02 11:32:20 +11:00
Alex Coyte
e1e9010cac Introduce custom adjacency-list based graph 2016-12-02 11:31:33 +11:00
Justin Viiret
29472c7b71 rose_dump: remove stray newline 2016-12-02 11:28:20 +11:00
Justin Viiret
91a7ce1cda getData256(): data needs to be 32-byte aligned 2016-12-02 11:28:16 +11:00
Alex Coyte
47f53f63a7 simple pass to pick up paths redundant with those from cyclic's succs 2016-12-02 11:24:30 +11:00
Justin Viiret
8cadba0bdd rose: call loadLongLiteralState() earlier
The ll_buf, ll_buf_nocase buffers must be initialised before anyh path
that could lead to storeLongLiteralState().
2016-12-02 11:23:14 +11:00
Justin Viiret
b9650d4fd0 rose: don't unconditionally init ll_buf etc
This is only necessary (and already always done) if there is a long
literal table.
2016-12-02 11:22:27 +11:00
Alex Coyte
592ce06eeb Create combo tops for trigger limexes 2016-12-02 11:22:23 +11:00
Alex Coyte
445cf987a8 remove unused includes 2016-10-28 14:52:56 +11:00
Justin Viiret
8869dee643 rose: simplify long lit table, add bloom filter
Replaces the original long lit hash table (used in streaming mode) with a
smaller, simpler linear probing approach. Adds a bloom filter in front
of it to reduce time spent on false positives.

Sizing of both the hash table and bloom filter are done based on max
load.
2016-10-28 14:52:45 +11:00
Justin Viiret
68bf473e2e fdr: move long literal handling into Rose
Move the hash table used for long literal support in streaming mode from
FDR to Rose, and introduce new instructions CHECK_LONG_LIT and
CHECK_LONG_LIT_NOCASE for doing literal confirm for long literals.

This simplifies FDR confirm, and guarantees that HWLM matchers will only
be used for literals < 256 bytes long.
2016-10-28 14:52:26 +11:00
Alex Coyte
648a3c4824 UE-3025: There is no need to prune tops from non-triggered graphs 2016-10-28 14:52:01 +11:00
Alex Coyte
c94899dd44 allow sets of tops on edges 2016-10-28 14:51:46 +11:00
Matthew Barr
707fe675ea Operator precedence matters 2016-10-28 14:48:46 +11:00
Matthew Barr
7849b9d611 MSVC prefers the attrib at the beginning 2016-10-28 14:48:31 +11:00
Justin Viiret
6e533589bb rose: move END instruction to start of enum
Stop overloading END as the last Rose interpreter instruction, use new
sentinel LAST_ROSE_INSTRUCTION for that.

This change will also make it easier to add new instructions without
renumbering END and thus changing all generated bytecodes.
2016-10-28 14:47:07 +11:00
Xu, Chi
997787bd4b rose: add CHECK_SINGLE_LOOKAROUND instruction
This specialisation is cheaper than the shufti-based variants, so we
prefer it for single character class tests.
2016-10-28 14:47:04 +11:00
Justin Viiret
385f71b44e rose: enable generation of shufti32x16 case 2016-10-28 14:46:37 +11:00
Xu, Chi
04d79629de rose: add shufti-based lookaround instructions
More lookaround specialisations that use the shufti approach.
2016-10-28 14:46:27 +11:00
Justin Viiret
9139123642 rose: move sparse iter cache to RoseEngineBlob
This enables its use for iterators written by instructions.
2016-10-28 14:45:32 +11:00
Justin Viiret
13b6023a18 hash: add hash_all variadic tpl func, use in rose 2016-10-28 14:45:28 +11:00
Justin Viiret
13af3bfb74 rose: decouple build-time program representation
This commit replaces the build-time representation of the Rose
interpreter programs, from a class containing a discriminated union of
the bytecode structures to a class hierarchy of build-time prototypes.

This makes it easier to reason about and manipulate Rose programs during
compilation.
2016-10-28 14:45:15 +11:00
Justin Viiret
f4fa6cd4dd rose: tighten up requirements for catch up
We only need to catch up when there is an actual anchored table, not
merely when there are successors of anchored_root in the Rose graph.
2016-10-28 14:44:20 +11:00
Justin Viiret
3cf4199879 debug: always use %zu in format string for size_t 2016-10-28 14:43:34 +11:00
Justin Viiret
c8868fb9c7 rose: remove CHECK_LIT_MASK instruction 2016-10-28 14:43:33 +11:00
Justin Viiret
4ce306864e rose: use lookarounds to implement benefits masks
This replaces the CHECK_LIT_MASK instruction.
2016-10-28 14:43:33 +11:00
Xu, Chi
b96d5c23d1 rose: add new instruction CHECK_MASK_32
This is a specialisation of the "lookaround" code.
2016-10-28 14:43:33 +11:00
Justin Viiret
8be6c8b2ca rose: don't merge large acyclic suffixes
Check earlier on in mergeSuffixes that we're not proposing to merge
suffixes above our limit from the acyclic merge path.
2016-10-28 14:43:33 +11:00
Alex Coyte
e6c05d5a55 set an appropriate default value for RoleInfo::score
Coverity CID 131843
2016-08-22 16:03:51 +10:00
Justin Viiret
ae14187462 rose: use min of max_offset in left merges
Be more careful with max_offset, since we rely on it ofr ANCH history
cases. Also adds tighter assertions.
2016-08-10 15:12:12 +10:00
Justin Viiret
cec57d7e90 rose: ensure anch small block literals have bounds 2016-08-10 15:12:04 +10:00
Justin Viiret
e03375b644 program_runtime: remove commented-out code 2016-08-10 15:11:15 +10:00
Anatoly Burakov
6331da4e29 dfa: adding new Sheng engine
A new shuffle-based DFA engine, complete with acceleration and smallwrite.
2016-08-10 15:10:46 +10:00