Justin Viiret
07a6b6510c
rose/hwlm: limit literals to eight bytes
...
Rework HWLM to work over literals of eight bytes ("medium length"),
doing confirm in the Rose interpreter.
2017-04-26 14:41:29 +10:00
Justin Viiret
5c9c540424
rose: fix up comments referring to CHECK_LITERAL
...
This instruction is now called CHECK_LONG_LIT.
2017-04-26 14:41:29 +10:00
Matthew Barr
bc2f336d9d
Workaround for a deficiency in the C++11/14/17 standards
...
As explained to us by STL at Microsoft (the author of their
vector), there is a hole in the standard with respect to the vector copy
constructor, which is always declared even if it won't compile.
2017-04-26 14:41:29 +10:00
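
The hole described in the commit above can be seen with a type trait. The following is a minimal sketch, not the change the commit actually made: `MoveOnlyVec` and `MoveOnlyHolder` are hypothetical names, and the first assertion reflects the behaviour of the common standard library implementations rather than a guarantee in the standard's wording.

```cpp
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical alias for illustration: a vector of move-only elements.
using MoveOnlyVec = std::vector<std::unique_ptr<int>>;

// The element type is honest about not being copyable...
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr is move-only");

// ...but vector always declares its copy constructor, so the trait (which
// only inspects declarations) reports true even though instantiating that
// constructor would fail to compile.
static_assert(std::is_copy_constructible<MoveOnlyVec>::value,
              "vector's copy constructor is declared unconditionally");

// One possible workaround (an assumption, not necessarily the commit's):
// delete the copy operations explicitly on the type that owns the vector,
// so traits and overload resolution see it as move-only.
struct MoveOnlyHolder {
    MoveOnlyHolder() = default;
    MoveOnlyHolder(MoveOnlyHolder &&) = default;
    MoveOnlyHolder &operator=(MoveOnlyHolder &&) = default;
    MoveOnlyHolder(const MoveOnlyHolder &) = delete;
    MoveOnlyHolder &operator=(const MoveOnlyHolder &) = delete;

    MoveOnlyVec items;
};

static_assert(!std::is_copy_constructible<MoveOnlyHolder>::value,
              "explicitly deleted copy is visible to the trait");
static_assert(std::is_move_constructible<MoveOnlyHolder>::value,
              "the holder can still be moved");
```

Without the explicit deletion, generic code that chooses between copy and move based on the trait may select a copy that then fails deep inside the vector's implementation.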
Matthew Barr
2214296b7f
Convert compile-time code to not require SIMD
2016-12-14 15:29:01 +11:00
Justin Viiret
e271781d95
multibit, fatbit: make _size build-time only
...
This commit makes mmbit_size() and fatbit_size() compile-time only, and
adds a resource limit for very large multibits.
2016-12-14 15:28:54 +11:00
Alex Coyte
e51b6d23b9
introduce Sheng-McClellan hybrid
2016-12-14 15:27:18 +11:00
Matthew Barr
99e14df117
Fix combine2x128
2016-12-02 11:33:48 +11:00
Alex Coyte
e1e9010cac
Introduce custom adjacency-list based graph
2016-12-02 11:31:33 +11:00
Justin Viiret
8869dee643
rose: simplify long lit table, add bloom filter
...
Replaces the original long lit hash table (used in streaming mode) with a
smaller, simpler linear probing approach. Adds a bloom filter in front
of it to reduce time spent on false positives.
Sizing of both the hash table and bloom filter is done based on max
load.
2016-10-28 14:52:45 +11:00
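
The general shape of a linear probing table fronted by a bloom filter, as described in the commit above, might look like the sketch below. Everything here is an illustrative assumption: the class name `LongLitTable`, the two hash functions, the sizing factors and the use of whole strings as slot contents are not taken from Hyperscan's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: names, hashes and sizing factors are assumptions.
class LongLitTable {
public:
    // maxLoad must be in (0, 1]; both structures are sized from it.
    explicit LongLitTable(const std::vector<std::string> &lits,
                          double maxLoad = 0.5)
        : slots_(roundUpPow2(static_cast<size_t>(lits.size() / maxLoad) + 1)),
          bloom_(roundUpPow2(static_cast<size_t>(lits.size() * 8 / maxLoad) + 1),
                 false) {
        for (const auto &lit : lits) {
            insert(lit);
        }
    }

    bool contains(const std::string &s) const {
        if (s.empty()) {
            return false; // the empty string marks a free slot
        }
        // Bloom filter first: a clear bit proves s was never inserted.
        if (!bloom_[bloomIdx1(s)] || !bloom_[bloomIdx2(s)]) {
            return false;
        }
        // Linear probing: walk from the home slot until s or a free slot.
        for (size_t i = home(s);; i = (i + 1) & (slots_.size() - 1)) {
            if (slots_[i].empty()) {
                return false;
            }
            if (slots_[i] == s) {
                return true;
            }
        }
    }

private:
    static size_t roundUpPow2(size_t n) {
        size_t p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    // FNV-1a, used as the second, independent bloom hash.
    static uint64_t fnv1a(const std::string &s) {
        uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return h;
    }

    size_t home(const std::string &s) const {
        return std::hash<std::string>{}(s) & (slots_.size() - 1);
    }
    size_t bloomIdx1(const std::string &s) const {
        return std::hash<std::string>{}(s) & (bloom_.size() - 1);
    }
    size_t bloomIdx2(const std::string &s) const {
        return static_cast<size_t>(fnv1a(s)) & (bloom_.size() - 1);
    }

    void insert(const std::string &s) {
        if (s.empty()) {
            return;
        }
        bloom_[bloomIdx1(s)] = true;
        bloom_[bloomIdx2(s)] = true;
        size_t i = home(s);
        // The table always has more slots than entries, so this terminates.
        while (!slots_[i].empty() && slots_[i] != s) {
            i = (i + 1) & (slots_.size() - 1);
        }
        slots_[i] = s;
    }

    std::vector<std::string> slots_; // power-of-two sized, "" means free
    std::vector<bool> bloom_;        // power-of-two sized bit array
};
```

Most lookups for absent keys stop at the bloom filter and never touch the table, which is where the saving on false positives comes from. Lowering `maxLoad` trades memory for shorter probe sequences and fewer bloom collisions.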
Justin Viiret
68bf473e2e
fdr: move long literal handling into Rose
...
Move the hash table used for long literal support in streaming mode from
FDR to Rose, and introduce new instructions CHECK_LONG_LIT and
CHECK_LONG_LIT_NOCASE for doing literal confirm for long literals.
This simplifies FDR confirm, and guarantees that HWLM matchers will only
be used for literals < 256 bytes long.
2016-10-28 14:52:26 +11:00
Alex Coyte
c94899dd44
allow sets of tops on edges
2016-10-28 14:51:46 +11:00
Xu, Chi
997787bd4b
rose: add CHECK_SINGLE_LOOKAROUND instruction
...
This specialisation is cheaper than the shufti-based variants, so we
prefer it for single character class tests.
2016-10-28 14:47:04 +11:00
Justin Viiret
385f71b44e
rose: enable generation of shufti32x16 case
2016-10-28 14:46:37 +11:00
Xu, Chi
04d79629de
rose: add shufti-based lookaround instructions
...
More lookaround specialisations that use the shufti approach.
2016-10-28 14:46:27 +11:00
Justin Viiret
9139123642
rose: move sparse iter cache to RoseEngineBlob
...
This enables its use for iterators written by instructions.
2016-10-28 14:45:32 +11:00
Justin Viiret
13af3bfb74
rose: decouple build-time program representation
...
This commit replaces the build-time representation of the Rose
interpreter programs, from a class containing a discriminated union of
the bytecode structures to a class hierarchy of build-time prototypes.
This makes it easier to reason about and manipulate Rose programs during
compilation.
2016-10-28 14:45:15 +11:00
Justin Viiret
f4fa6cd4dd
rose: tighten up requirements for catch up
...
We only need to catch up when there is an actual anchored table, not
merely when there are successors of anchored_root in the Rose graph.
2016-10-28 14:44:20 +11:00
Justin Viiret
3cf4199879
debug: always use %zu in format string for size_t
2016-10-28 14:43:34 +11:00
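
As a small illustration of the convention in the commit above, `%zu` is the printf-family conversion for `size_t`, which avoids width mismatches on platforms where `size_t` and `unsigned long` differ. The helper name `describeSize` and the message text are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t with the matching %zu conversion.
std::string describeSize(size_t n) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "table size: %zu bytes", n);
    return std::string(buf);
}
```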
Justin Viiret
c8868fb9c7
rose: remove CHECK_LIT_MASK instruction
2016-10-28 14:43:33 +11:00
Justin Viiret
4ce306864e
rose: use lookarounds to implement benefits masks
...
This replaces the CHECK_LIT_MASK instruction.
2016-10-28 14:43:33 +11:00
Xu, Chi
b96d5c23d1
rose: add new instruction CHECK_MASK_32
...
This is a specialisation of the "lookaround" code.
2016-10-28 14:43:33 +11:00
Justin Viiret
ae14187462
rose: use min of max_offset in left merges
...
Be more careful with max_offset, since we rely on it for ANCH history
cases. Also adds tighter assertions.
2016-08-10 15:12:12 +10:00
Anatoly Burakov
6331da4e29
dfa: adding new Sheng engine
...
A new shuffle-based DFA engine, complete with acceleration and smallwrite.
2016-08-10 15:10:46 +10:00
Matthew Barr
cbd115f7fe
Don't shadow names
2016-08-10 15:06:57 +10:00
Justin Viiret
7f49958824
rose: only write out report programs if in use
...
These programs are only used by output-exposed engines.
2016-08-10 15:05:53 +10:00
Alex Coyte
d574557200
take mask overhang into account for hwlm accel, float min dist
2016-08-10 15:05:19 +10:00
Justin Viiret
9eb349a343
rose: expose smwr builder, tidy up engine build
2016-08-10 14:59:10 +10:00
Justin Viiret
8754cbbd24
rose: use program offset, not final_id, in atable
...
This removes the need to look up the program offset in a table when
handling an anchored literal match.
2016-08-10 14:59:10 +10:00
Justin Viiret
4dbbc4eaa5
rose: add RECORD_ANCHORED instruction to program
...
Moves recordAnchoredLiteralMatch from an unconditional call in the
anchored callback to being driven by a program instruction.
2016-08-10 14:59:10 +10:00
Alex Coyte
981b59fd05
minor eager prefixes improvements
...
- count eager prefixes as always-run engines when comparing with smwr
- only check if a prefix is vacuous after adding back literal fragments
2016-08-10 14:59:10 +10:00
Xu, Chi
4d7469392d
rose: add CHECK_BYTE/CHECK_MASK instructions
...
These instructions are specialisations of the "lookaround" code for
performance.
2016-08-10 14:57:48 +10:00
Justin Viiret
3e96cd48ef
rose: sanity check CHECK_BOUNDS instruction
2016-08-10 14:57:36 +10:00
Alex Coyte
3a1429a621
group_weak_end is no longer used
2016-08-10 14:52:56 +10:00
Justin Viiret
cf9e40ae1c
nfa: unify NfaCallback and SomNfaCallback
...
Use just one callback type, with both start and end offsets.
2016-07-08 11:01:56 +10:00
Xiang Wang
9087d59be5
tamarama: add container engine for exclusive nfas
...
Add the new Tamarama engine that acts as a container for infix/suffix
engines that can be proven to run exclusively of one another.
This reduces stream state for pattern sets with many exclusive engines.
2016-07-08 11:01:34 +10:00
Alex Coyte
f166bc5658
allow some prefixes that may squash the literal match to run eagerly
2016-07-08 11:01:34 +10:00
Alex Coyte
575e8c06dc
only show floating groups to the floating table
2016-07-08 10:59:40 +10:00
Justin Viiret
6239805561
rose: don't build empty sparse iter subprograms
2016-07-08 10:59:40 +10:00
Justin Viiret
cdaf705a87
rose: pick up more prefix->lookaround conversions
2016-07-08 10:57:29 +10:00
Justin Viiret
d3c56b532b
rose build: dedupe hasLastByteHistorySucc func
2016-07-08 10:57:00 +10:00
Justin Viiret
426bfc9cfb
rose_build_bytecode: clean up
2016-07-08 10:55:36 +10:00
Justin Viiret
78e4332a8b
move eod iter program into general eod program
2016-07-08 10:55:36 +10:00
Justin Viiret
39461cc806
eod: move hwlm execution into MATCHER_EOD instr
2016-07-08 10:55:36 +10:00
Justin Viiret
b8f771e824
rose_build_bytecode: tidy up addPredBlocks
2016-07-08 10:55:36 +10:00
Justin Viiret
2761e0105d
eod: move suffix iteration into program
2016-07-08 10:54:07 +10:00
Justin Viiret
9669e0fe94
eod: remove forced sparse iter optimization
2016-07-08 10:54:07 +10:00
Justin Viiret
7a7dff5b70
eod: don't force sparse iter for general prog
2016-07-08 10:54:07 +10:00
Justin Viiret
02595cda1f
eod: consolidate eod anchor programs
2016-07-08 10:54:07 +10:00
Justin Viiret
7a6a476723
eod: move engine checks into ENGINES_EOD instr
2016-07-08 10:54:07 +10:00
Justin Viiret
8e4c68e9df
rose: eagerly report EOD literal matches
...
Where possible, eagerly report a match when a literal that matches at
EOD occurs, rather than setting a state bit and waiting for EOD
processing.
2016-07-08 10:47:33 +10:00