55 Commits

Author SHA1 Message Date
Hong, Yang A
23e5f06594 add new Literal API for pure literal expressions:
Design compile-time API hs_compile_lit() and hs_compile_lit_multi()
to handle pure literal pattern sets. The corresponding option --literal-on
is added for the Hyperscan testing suites. Extended parameters and some
flags are not supported by this API.
2019-08-13 14:51:38 +08:00
Hong, Yang A
f68723a606 literal matching: separate path for pure literal patterns 2019-01-21 09:59:22 +08:00
Alex Coyte
a1fdc3afcf dedupeLeftfixesVariableLag: refactor, more blockmode deduping 2017-09-18 13:29:34 +10:00
Justin Viiret
9cf66b6ac9 util: switch from Boost to std::unordered set/map
This commit replaces the ue2::unordered_{set,map} types with their STL
versions, with some new hashing utilities in util/hash.h. The new types
ue2_unordered_set<T> and ue2_unordered_map<Key, T> default to using the
ue2_hasher.

The header util/ue2_containers.h has been removed, and the flat_set/map
containers moved to util/flat_containers.h.
2017-08-21 11:14:55 +10:00
Justin Viiret
8b9328fe9e rose: replace RoseLiteralMap use of bimap
This approach is simpler and more efficient for cases with large
numbers of literals.
2017-05-30 13:58:59 +10:00
Justin Viiret
a75b2ba2e5 rose: remove hasLiteral() 2017-05-30 13:58:59 +10:00
Alex Coyte
bb29aeb298 rose: shift program construction functions to rose_build_program 2017-05-30 13:58:32 +10:00
Justin Viiret
813f1e3fb9 rose: use bytecode_ptr<RoseEngine> 2017-04-26 15:19:36 +10:00
Justin Viiret
cf82924a39 depth: make constructor explicit 2017-04-26 15:19:19 +10:00
Alex Coyte
37cb93e60f rose_build: reduce size/scope of context objects 2017-04-26 15:19:01 +10:00
Justin Viiret
bf93c993cb rose: remove final_id 2017-04-26 15:04:31 +10:00
Justin Viiret
24ffb156e9 rose: eliminate global final to fragment map 2017-04-26 15:04:31 +10:00
Justin Viiret
6a945e27fb rose: reduce delay program dep on final_id 2017-04-26 15:04:31 +10:00
Justin Viiret
dc8220648c rose: remove now-unused anchored_base_id 2017-04-26 15:04:30 +10:00
Justin Viiret
79512bd5c3 rose: use fragment ids earlier for anchored dfas 2017-04-26 15:04:30 +10:00
Justin Viiret
8b25d83415 rose: write fragment ids into literal_info 2017-04-26 15:04:30 +10:00
Justin Viiret
1eae677d73 rose_build_impl: fix header guard 2017-04-26 15:04:30 +10:00
Justin Viiret
d43e9d838f rose: delete dead code for cloneVertex 2017-04-26 14:56:49 +10:00
Justin Viiret
a4af801dd1 rose: define invalid value for program offset 2017-04-26 14:56:49 +10:00
Justin Viiret
a83b7cb348 move final_id_to_literal into build_context 2017-04-26 14:56:48 +10:00
Justin Viiret
a0260c0362 rose: do fragment group assignment earlier 2017-04-26 14:56:48 +10:00
Justin Viiret
6bf35cb637 rose: make groupByFragment local 2017-04-26 14:49:51 +10:00
Justin Viiret
3ae2fb417e move final_to_frag_map into RoseBuildImpl (for dump code) 2017-04-26 14:46:49 +10:00
Justin Viiret
76f72b6ab4 rose: use program offsets directly in lit tables 2017-04-26 14:46:48 +10:00
Alex Coyte
bbd64f98ae allow streams to be marked as exhausted in more cases
At stream boundaries, we can mark streams as exhausted if there are no
groups active and there are no other ways to report matches. This allows us
to stop maintaining the history buffer on subsequent stream writes.
Previously, streams were only marked as exhausted if a pure highlander case
reported all patterns or the outfix in a sole outfix case died.
2017-04-26 14:44:53 +10:00
Alex Coyte
7767651b59 shift all early_dfa creation logic to ng_violet/ng_rose 2017-04-26 14:44:29 +10:00
Alex Coyte
512c049493 shift early_dfa construction earlier 2017-04-26 14:44:03 +10:00
Justin Viiret
c6b2563df6 rose: delete literal_info requires_explode flag 2017-04-26 14:43:28 +10:00
Justin Viiret
eb14792a63 rose: group final ids by fragment 2017-04-26 14:41:29 +10:00
Justin Viiret
07a6b6510c rose/hwlm: limit literals to eight bytes
Rework HWLM to work over literals of eight bytes ("medium length"),
doing confirm in the Rose interpreter.
2017-04-26 14:41:29 +10:00
Alex Coyte
e1e9010cac Introduce custom adjacency-list based graph 2016-12-02 11:31:33 +11:00
Justin Viiret
68bf473e2e fdr: move long literal handling into Rose
Move the hash table used for long literal support in streaming mode from
FDR to Rose, and introduce new instructions CHECK_LONG_LIT and
CHECK_LONG_LIT_NOCASE for doing literal confirm for long literals.

This simplifies FDR confirm, and guarantees that HWLM matchers will only
be used for literals < 256 bytes long.
2016-10-28 14:52:26 +11:00
Alex Coyte
c94899dd44 allow sets of tops on edges 2016-10-28 14:51:46 +11:00
Matthew Barr
151810b4fc Older gcc doesn't like shadowing the function 2016-08-10 15:07:11 +10:00
Alex Coyte
d574557200 take mask overhang into account for hwlm accel, float min dist 2016-08-10 15:05:19 +10:00
Justin Viiret
9eb349a343 rose: expose smwr builder, tidy up engine build 2016-08-10 14:59:10 +10:00
Alex Coyte
3a1429a621 group_weak_end is no longer used 2016-08-10 14:52:56 +10:00
Xiang Wang
9087d59be5 tamarama: add container engine for exclusive nfas
Add the new Tamarama engine that acts as a container for infix/suffix
engines that can be proven to run exclusively of one another.

This reduces stream state for pattern sets with many exclusive engines.
2016-07-08 11:01:34 +10:00
Alex Coyte
f166bc5658 allow some prefixes that may squash the literal match to run eagerly 2016-07-08 11:01:34 +10:00
Justin Viiret
7690881f85 rose: make assignGroupsToLiterals a free function 2016-07-08 10:47:08 +10:00
Justin Viiret
89dbbe6c53 rose: make assignGroupsToRoles a free function 2016-07-08 10:47:08 +10:00
Justin Viiret
9b7eca5400 rose: dump leftfix/suffix queue indices 2016-07-08 10:44:56 +10:00
Justin Viiret
319d47ae4f Remove OutfixInfo::chained (which meant "is MPV") 2016-04-20 13:34:57 +10:00
Justin Viiret
32c866a8f9 OutfixInfo: use boost::variant for engines 2016-04-20 13:34:57 +10:00
Justin Viiret
fa27025bcb Wrap MPV puffettes in a struct 2016-04-20 13:34:57 +10:00
Justin Viiret
b093616aff Rose build: move HWLM build code to own file
To reduce the size of rose_build_bytecode.cpp a little, move the code
that deals with HWLM literal tables into its own new file.
2016-04-20 13:34:54 +10:00
Justin Viiret
67b9784dae Rose: use program for all literal matches
Unifies all literal match paths so that the Rose program is used for all
of them. This removes the previous specialised "direct report" and
"multi direct report" paths. Some additional REPORT instruction work was
necessary for this.

Reworked literal construction path at compile time in prep for using
program offsets as literal IDs.

Completely removed the anchored log runtime, which is no longer worth
the extra complexity.
2016-04-20 13:34:54 +10:00
Justin Viiret
cc5db61686 Rose: allow DR literals to share vertices 2016-03-01 11:36:09 +11:00
Justin Viiret
10cda4cc33 Rose: Move all literal operations into program
Replace the RoseLiteral structure with more program instructions; now,
instead of each literal ID leading to a RoseLiteral, it simply has a
program to run (and a delay rebuild program).

This commit also makes some other improvements:

 * CHECK_STATE instruction, for use instead of a sparse iterator over a
   single element.
 * Elide some checks (CHECK_LIT_EARLY, ANCHORED_DELAY, etc) when not
   needed.
 * Flatten PUSH_DELAYED behaviour to one instruction per delayed
   literal, rather than the mask/index-list approach used before.
 * Simple program cache at compile time for deduplication.
2016-03-01 11:23:56 +11:00
Justin Viiret
48c9d7c381 Remove use of depth from Rose entirely 2016-03-01 11:23:11 +11:00