Justin Viiret
7bdb327203
rose: use final_ids less in program construction
2017-04-26 14:56:48 +10:00
Justin Viiret
a83b7cb348
move final_id_to_literal into build_context
2017-04-26 14:56:48 +10:00
Justin Viiret
a0260c0362
rose: do fragment group assignment earlier
2017-04-26 14:56:48 +10:00
Justin Viiret
6bf35cb637
rose: make groupByFragment local
2017-04-26 14:49:51 +10:00
Justin Viiret
a5b3bc814f
rose: delete RoseEngine::literalCount
2017-04-26 14:49:51 +10:00
Justin Viiret
9550058e75
remove lit program tables from bytecode
2017-04-26 14:49:51 +10:00
Justin Viiret
c2cac5009a
tidy up args to builders
2017-04-26 14:46:49 +10:00
Justin Viiret
3ae2fb417e
move final_to_frag_map into RoseBuildImpl (for dump code)
2017-04-26 14:46:49 +10:00
Justin Viiret
76f72b6ab4
rose: use program offsets directly in lit tables
2017-04-26 14:46:48 +10:00
Justin Viiret
ac858cd47c
rose: build a separate delay rebuild matcher
2017-04-26 14:46:48 +10:00
Alex Coyte
bbd64f98ae
allow streams to be marked as exhausted in more cases
...
At stream boundaries, we can mark streams as exhausted if there are no
groups active and there are no other ways to report matches. This allows us
to stop maintaining the history buffer on subsequent stream writes.
Previously, streams were only marked as exhausted if a pure highlander case
reported all patterns or the outfix in a sole outfix case died.
2017-04-26 14:44:53 +10:00
Justin Viiret
c6b2563df6
rose: delete literal_info requires_explode flag
2017-04-26 14:43:28 +10:00
Justin Viiret
f307956584
rose: do not combine fragments which squash groups
2017-04-26 14:41:30 +10:00
Justin Viiret
eb14792a63
rose: group final ids by fragment
2017-04-26 14:41:29 +10:00
Justin Viiret
07a6b6510c
rose/hwlm: limit literals to eight bytes
...
Rework HWLM to work over literals of eight bytes ("medium length"),
doing confirm in the Rose interpreter.
2017-04-26 14:41:29 +10:00
Justin Viiret
5c9c540424
rose: fix up comments referring to CHECK_LITERAL
...
This instruction is now called CHECK_LONG_LIT.
2017-04-26 14:41:29 +10:00
Matthew Barr
bc2f336d9d
Work around for deficiency in C++11/14/17 standard
...
As explained to us by STL at Microsoft (the author of their
vector), there is a hole in the standard wrt the vector copy
constructor, which always exists even if it won't compile.
2017-04-26 14:41:29 +10:00
Matthew Barr
2214296b7f
Convert compile-time code to not require SIMD
2016-12-14 15:29:01 +11:00
Justin Viiret
e271781d95
multibit, fatbit: make _size build-time only
...
This commit makes mmbit_size() and fatbit_size() build-time only, and
adds a resource limit for very large multibits.
2016-12-14 15:28:54 +11:00
Alex Coyte
e51b6d23b9
introduce Sheng-McClellan hybrid
2016-12-14 15:27:18 +11:00
Matthew Barr
99e14df117
Fix combine2x128
2016-12-02 11:33:48 +11:00
Alex Coyte
e1e9010cac
Introduce custom adjacency-list based graph
2016-12-02 11:31:33 +11:00
Justin Viiret
8869dee643
rose: simplify long lit table, add bloom filter
...
Replaces the original long lit hash table (used in streaming mode) with a
smaller, simpler linear probing approach. Adds a bloom filter in front
of it to reduce time spent on false positives.
Sizing of both the hash table and bloom filter is done based on max
load.
2016-10-28 14:52:45 +11:00
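The approach described in this commit (a bloom filter in front of a linear-probing hash table, both sized from a maximum load) can be illustrated with a minimal sketch. This is not the Hyperscan implementation: the class name LongLitTable, the use of std::hash, the two bloom probes, and the factor of eight bloom bits per slot are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: every name here is illustrative, not taken from Rose.
class LongLitTable {
public:
    // max_load is the largest fraction of slots allowed to be occupied.
    explicit LongLitTable(const std::vector<std::string> &lits,
                          double max_load = 0.5) {
        assert(max_load > 0.0 && max_load < 1.0);
        // Size from max load; this always leaves at least one empty slot,
        // which guarantees that linear probing terminates.
        size_t want = static_cast<size_t>(lits.size() / max_load) + 1;
        slots_.resize(roundUpPow2(want));
        bloom_.assign(roundUpPow2(want * 8), false); // assumed 8 bits per slot
        for (const auto &lit : lits) {
            insert(lit);
        }
    }

    bool contains(const std::string &s) const {
        size_t h = std::hash<std::string>()(s);
        // Cheap rejection: if either bloom bit is clear, s was never inserted.
        if (!bloom_[bit1(h)] || !bloom_[bit2(h)]) {
            return false;
        }
        // The bloom filter can give false positives, so confirm exactly.
        return slots_[probe(s, h)].used;
    }

private:
    struct Slot {
        bool used = false;
        std::string lit;
    };

    static size_t roundUpPow2(size_t n) {
        size_t p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    size_t bit1(size_t h) const { return h & (bloom_.size() - 1); }
    size_t bit2(size_t h) const {
        return static_cast<size_t>((h * 0x9E3779B97F4A7C15ULL) >> 17) &
               (bloom_.size() - 1);
    }

    // Returns the slot holding s, or the first empty slot on its probe path.
    size_t probe(const std::string &s, size_t h) const {
        size_t mask = slots_.size() - 1;
        size_t i = h & mask;
        while (slots_[i].used && slots_[i].lit != s) {
            i = (i + 1) & mask;
        }
        return i;
    }

    void insert(const std::string &s) {
        size_t h = std::hash<std::string>()(s);
        bloom_[bit1(h)] = true;
        bloom_[bit2(h)] = true;
        Slot &slot = slots_[probe(s, h)];
        slot.used = true;
        slot.lit = s;
    }

    std::vector<Slot> slots_;
    std::vector<bool> bloom_;
};
```

A lookup for a string that was never inserted usually stops at the bloom check; only bloom false positives pay for the probe and the string comparison.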
Justin Viiret
68bf473e2e
fdr: move long literal handling into Rose
...
Move the hash table used for long literal support in streaming mode from
FDR to Rose, and introduce new instructions CHECK_LONG_LIT and
CHECK_LONG_LIT_NOCASE for doing literal confirm for long literals.
This simplifies FDR confirm, and guarantees that HWLM matchers will only
be used for literals < 256 bytes long.
2016-10-28 14:52:26 +11:00
Alex Coyte
c94899dd44
allow sets of tops on edges
2016-10-28 14:51:46 +11:00
Xu, Chi
997787bd4b
rose: add CHECK_SINGLE_LOOKAROUND instruction
...
This specialisation is cheaper than the shufti-based variants, so we
prefer it for single character class tests.
2016-10-28 14:47:04 +11:00
Justin Viiret
385f71b44e
rose: enable generation of shufti32x16 case
2016-10-28 14:46:37 +11:00
Xu, Chi
04d79629de
rose: add shufti-based lookaround instructions
...
More lookaround specialisations that use the shufti approach.
2016-10-28 14:46:27 +11:00
Justin Viiret
9139123642
rose: move sparse iter cache to RoseEngineBlob
...
This enables its use for iterators written by instructions.
2016-10-28 14:45:32 +11:00
Justin Viiret
13af3bfb74
rose: decouple build-time program representation
...
This commit replaces the build-time representation of the Rose
interpreter programs, from a class containing a discriminated union of
the bytecode structures to a class hierarchy of build-time prototypes.
This makes it easier to reason about and manipulate Rose programs during
compilation.
2016-10-28 14:45:15 +11:00
Justin Viiret
f4fa6cd4dd
rose: tighten up requirements for catch up
...
We only need to catch up when there is an actual anchored table, not
merely when there are successors of anchored_root in the Rose graph.
2016-10-28 14:44:20 +11:00
Justin Viiret
3cf4199879
debug: always use %zu in format string for size_t
2016-10-28 14:43:34 +11:00
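For reference, a small sketch of the convention this commit enforces: %zu is the standard printf length modifier and conversion for size_t, so no cast is needed and the format is correct on both 32-bit and 64-bit targets. The helper name describeSize is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t with %zu rather than casting it
// to unsigned or unsigned long and using %u or %lu.
std::string describeSize(size_t n) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "size=%zu", n);
    return std::string(buf);
}
```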
Justin Viiret
c8868fb9c7
rose: remove CHECK_LIT_MASK instruction
2016-10-28 14:43:33 +11:00
Justin Viiret
4ce306864e
rose: use lookarounds to implement benefits masks
...
This replaces the CHECK_LIT_MASK instruction.
2016-10-28 14:43:33 +11:00
Xu, Chi
b96d5c23d1
rose: add new instruction CHECK_MASK_32
...
This is a specialisation of the "lookaround" code.
2016-10-28 14:43:33 +11:00
Justin Viiret
ae14187462
rose: use min of max_offset in left merges
...
Be more careful with max_offset, since we rely on it for ANCH history
cases. Also adds tighter assertions.
2016-08-10 15:12:12 +10:00
Anatoly Burakov
6331da4e29
dfa: adding new Sheng engine
...
A new shuffle-based DFA engine, complete with acceleration and smallwrite.
2016-08-10 15:10:46 +10:00
Matthew Barr
cbd115f7fe
Don't shadow names
2016-08-10 15:06:57 +10:00
Justin Viiret
7f49958824
rose: only write out report programs if in use
...
These programs are only used by output-exposed engines.
2016-08-10 15:05:53 +10:00
Alex Coyte
d574557200
take mask overhang into account for hwlm accel, float min dist
2016-08-10 15:05:19 +10:00
Justin Viiret
9eb349a343
rose: expose smwr builder, tidy up engine build
2016-08-10 14:59:10 +10:00
Justin Viiret
8754cbbd24
rose: use program offset, not final_id, in atable
...
This removes the need to look up the program offset in a table when
handling an anchored literal match.
2016-08-10 14:59:10 +10:00
Justin Viiret
4dbbc4eaa5
rose: add RECORD_ANCHORED instruction to program
...
Moves recordAnchoredLiteralMatch from an unconditional call in the
anchored callback to being driven by a program instruction.
2016-08-10 14:59:10 +10:00
Alex Coyte
981b59fd05
minor eager prefixes improvements
...
- count eager prefixes as always run engine when comparing with smwr
- only check if a prefix is vacuous after adding back literal fragments
2016-08-10 14:59:10 +10:00
Xu, Chi
4d7469392d
rose: add CHECK_BYTE/CHECK_MASK instructions
...
These instructions are specialisations of the "lookaround" code for
performance.
2016-08-10 14:57:48 +10:00
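As a rough illustration of why a mask-based specialisation can be cheaper than a general lookaround (which tests each byte against a character class in turn), the sketch below checks one byte, or eight bytes at once, with a single AND and compare. The function names, the parameter layout (and_mask, cmp_mask, negate) and the pack8 helper are assumptions for illustration and are not taken from the Rose instruction definitions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical single-byte check: passes if (c & and_mask) == cmp_mask,
// with the result inverted when negate is set.
bool checkByte(uint8_t c, uint8_t and_mask, uint8_t cmp_mask, bool negate) {
    return ((c & and_mask) == cmp_mask) != negate;
}

// Packs eight bytes into a 64-bit value in memory order, so that masks
// built this way line up with data loaded by checkMask8 on any endianness.
uint64_t pack8(const uint8_t (&bytes)[8]) {
    uint64_t v;
    std::memcpy(&v, bytes, sizeof(v));
    return v;
}

// Hypothetical eight-byte check: one load, one AND, one compare.
// A zero byte in and_mask (with zero in cmp_mask) means "don't care".
bool checkMask8(const uint8_t *buf, uint64_t and_mask, uint64_t cmp_mask) {
    uint64_t v;
    std::memcpy(&v, buf, sizeof(v));
    return (v & and_mask) == cmp_mask;
}
```

For example, an and_mask byte of 0xdf with a cmp_mask byte of 'A' accepts both 'a' and 'A' at that position, which gives a caseless single-character test.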
Justin Viiret
3e96cd48ef
rose: sanity check CHECK_BOUNDS instruction
2016-08-10 14:57:36 +10:00
Alex Coyte
3a1429a621
group_weak_end is no longer used
2016-08-10 14:52:56 +10:00
Justin Viiret
cf9e40ae1c
nfa: unify NfaCallback and SomNfaCallback
...
Use just one callback type, with both start and end offsets.
2016-07-08 11:01:56 +10:00
Xiang Wang
9087d59be5
tamarama: add container engine for exclusive nfas
...
Add the new Tamarama engine that acts as a container for infix/suffix
engines that can be proven to run exclusively of one another.
This reduces stream state for pattern sets with many exclusive engines.
2016-07-08 11:01:34 +10:00
Alex Coyte
f166bc5658
allow some prefixes that may squash the literal match to run eagerly
2016-07-08 11:01:34 +10:00