131 Commits

Author SHA1 Message Date
Justin Viiret
176c61aeaa rose_build_bytecode: clean up findEdgesByLiteral() 2017-04-26 15:04:31 +10:00
Justin Viiret
6a0dc261a2 rose_build_bytecode: less final_id 2017-04-26 15:04:31 +10:00
Justin Viiret
24ffb156e9 rose: eliminate global final to fragment map 2017-04-26 15:04:31 +10:00
Justin Viiret
454fbf33d5 rose: tidy 2017-04-26 15:04:31 +10:00
Justin Viiret
dc50ab291b container: allow sort_and_unique to have a comparator 2017-04-26 15:04:31 +10:00
Justin Viiret
cea8f452f2 rose: reorganise delay program generation 2017-04-26 15:04:31 +10:00
Justin Viiret
a2d2f7cb95 rose: dedupe anch programs and RECORD_ANCHOREDs 2017-04-26 15:04:31 +10:00
Justin Viiret
75c7f42314 rose: don't emit RECORD_ANCHORED in anchored progs 2017-04-26 15:04:31 +10:00
Justin Viiret
f5dd20e461 rose: rearrange anchored program generation 2017-04-26 15:04:31 +10:00
Justin Viiret
6a945e27fb rose: reduce delay program dep on final_id 2017-04-26 15:04:31 +10:00
Justin Viiret
dc8220648c rose: remove now-unused anchored_base_id 2017-04-26 15:04:30 +10:00
Justin Viiret
c426d2dc7d rose: reduce anchored program dep on final_id
We only need to build anchored programs for cases where a
RECORD_ANCHORED instruction has been generated, and we can key those
directly rather than using final_id.
2017-04-26 15:04:30 +10:00
Justin Viiret
ea8d0bcb1c rose: build fragments directly 2017-04-26 15:04:30 +10:00
Justin Viiret
79512bd5c3 rose: use fragment ids earlier for anchored dfas 2017-04-26 15:04:30 +10:00
Justin Viiret
8b25d83415 rose: write fragment ids into literal_info 2017-04-26 15:04:30 +10:00
Justin Viiret
7bdb327203 rose: use final_ids less in program construction 2017-04-26 14:56:48 +10:00
Justin Viiret
a83b7cb348 move final_id_to_literal into build_context 2017-04-26 14:56:48 +10:00
Justin Viiret
a0260c0362 rose: do fragment group assignment earlier 2017-04-26 14:56:48 +10:00
Justin Viiret
6bf35cb637 rose: make groupByFragment local 2017-04-26 14:49:51 +10:00
Justin Viiret
a5b3bc814f rose: delete RoseEngine::literalCount 2017-04-26 14:49:51 +10:00
Justin Viiret
9550058e75 remove lit program tables from bytecode 2017-04-26 14:49:51 +10:00
Justin Viiret
c2cac5009a tidy up args to builders 2017-04-26 14:46:49 +10:00
Justin Viiret
3ae2fb417e move final_to_frag_map into RoseBuildImpl (for dump code) 2017-04-26 14:46:49 +10:00
Justin Viiret
76f72b6ab4 rose: use program offsets directly in lit tables 2017-04-26 14:46:48 +10:00
Justin Viiret
ac858cd47c rose: build a separate delay rebuild matcher 2017-04-26 14:46:48 +10:00
Alex Coyte
bbd64f98ae allow streams to be marked as exhausted in more cases
At stream boundaries, we can mark streams as exhausted if there are no
groups active and there are no other ways to report matches. This allows us
to stop maintaining the history buffer on subsequent stream writes.
Previously, streams were only marked as exhausted if a pure highlander case
reported all patterns or the outfix in a sole outfix case died.
2017-04-26 14:44:53 +10:00
Justin Viiret
c6b2563df6 rose: delete literal_info requires_explode flag 2017-04-26 14:43:28 +10:00
Justin Viiret
f307956584 rose: do not combine fragments which squash groups 2017-04-26 14:41:30 +10:00
Justin Viiret
eb14792a63 rose: group final ids by fragment 2017-04-26 14:41:29 +10:00
Justin Viiret
07a6b6510c rose/hwlm: limit literals to eight bytes
Rework HWLM to work over literals of eight bytes ("medium length"),
doing confirm in the Rose interpreter.
2017-04-26 14:41:29 +10:00
Justin Viiret
5c9c540424 rose: fix up comments referring to CHECK_LITERAL
This instruction is now called CHECK_LONG_LIT.
2017-04-26 14:41:29 +10:00
Matthew Barr
bc2f336d9d Work around for deficiency in C++11/14/17 standard
As explained to us by STL at Microsoft (the author of their
vector), there is a hole in the standard wrt the vector copy
constructor, which always exists even if it won't compile.
2017-04-26 14:41:29 +10:00
Matthew Barr
2214296b7f Convert compile-time code to not require SIMD 2016-12-14 15:29:01 +11:00
Justin Viiret
e271781d95 multibit, fatbit: make _size build-time only
This commit makes mmbit_size() and fatbit_size() compile-time only, and
adds a resource limit for very large multibits.
2016-12-14 15:28:54 +11:00
Alex Coyte
e51b6d23b9 introduce Sheng-McClellan hybrid 2016-12-14 15:27:18 +11:00
Matthew Barr
99e14df117 Fix combine2x128 2016-12-02 11:33:48 +11:00
Alex Coyte
e1e9010cac Introduce custom adjacency-list based graph 2016-12-02 11:31:33 +11:00
Justin Viiret
8869dee643 rose: simplify long lit table, add bloom filter
Replaces the original long lit hash table (used in streaming mode) with a
smaller, simpler linear probing approach. Adds a bloom filter in front
of it to reduce time spent on false positives.

Sizing of both the hash table and bloom filter is done based on max
load.
2016-10-28 14:52:45 +11:00
Justin Viiret
68bf473e2e fdr: move long literal handling into Rose
Move the hash table used for long literal support in streaming mode from
FDR to Rose, and introduce new instructions CHECK_LONG_LIT and
CHECK_LONG_LIT_NOCASE for doing literal confirm for long literals.

This simplifies FDR confirm, and guarantees that HWLM matchers will only
be used for literals < 256 bytes long.
2016-10-28 14:52:26 +11:00
Alex Coyte
c94899dd44 allow sets of tops on edges 2016-10-28 14:51:46 +11:00
Xu, Chi
997787bd4b rose: add CHECK_SINGLE_LOOKAROUND instruction
This specialisation is cheaper than the shufti-based variants, so we
prefer it for single character class tests.
2016-10-28 14:47:04 +11:00
Justin Viiret
385f71b44e rose: enable generation of shufti32x16 case 2016-10-28 14:46:37 +11:00
Xu, Chi
04d79629de rose: add shufti-based lookaround instructions
More lookaround specialisations that use the shufti approach.
2016-10-28 14:46:27 +11:00
Justin Viiret
9139123642 rose: move sparse iter cache to RoseEngineBlob
This enables its use for iterators written by instructions.
2016-10-28 14:45:32 +11:00
Justin Viiret
13af3bfb74 rose: decouple build-time program representation
This commit replaces the build-time representation of the Rose
interpreter programs, from a class containing a discriminated union of
the bytecode structures to a class hierarchy of build-time prototypes.

This makes it easier to reason about and manipulate Rose programs during
compilation.
2016-10-28 14:45:15 +11:00
Justin Viiret
f4fa6cd4dd rose: tighten up requirements for catch up
We only need to catch up when there is an actual anchored table, not
merely when there are successors of anchored_root in the Rose graph.
2016-10-28 14:44:20 +11:00
Justin Viiret
3cf4199879 debug: always use %zu in format string for size_t 2016-10-28 14:43:34 +11:00
Justin Viiret
c8868fb9c7 rose: remove CHECK_LIT_MASK instruction 2016-10-28 14:43:33 +11:00
Justin Viiret
4ce306864e rose: use lookarounds to implement benefits masks
This replaces the CHECK_LIT_MASK instruction.
2016-10-28 14:43:33 +11:00
Xu, Chi
b96d5c23d1 rose: add new instruction CHECK_MASK_32
This is a specialisation of the "lookaround" code.
2016-10-28 14:43:33 +11:00