Previously, we disallowed the merging of all Rose prefixes in block mode
where the literal sets are not identical.
This change allows merging if the prefix graphs to be merged are very
small, as a small performance improvement for cases with lots of tiny
prefixes.
This check is deliberately conservative: graphs must have some common
vertices, and the result of the merge must not give up any
accelerability.
Move report preconditions (bounds, exhaustion, etc) into program
instructions and use a more direct path to the user match callback than
the adaptor functions.
Report handling has been moved to new file src/report.h. Reporting from
EOD now uses the same instructions as normal report handling, rather
than its own.
Jump target tracking in rose_build_bytecode.cpp has been cleaned up.
Make nfaCheckFinalState return MO_HALT_MATCHING when the user instructs
us (via the callback return value) to halt matching. In the caller,
check this value and stop matching if told.
Specifically, we must set build_context::floatingMinLiteralMatchOffset
to 1 when thew anchored table contains multiple DFAs, as they can
produce unordered matches.
This check was already been done, but too late to affect the generation
of ANCHORED_DELAY instructions.
Remove the RoseEngine and stream state pointers frose RoseContext, as
they are also present in core_info.
Unify stream state handing in Rose to always use a char * (we were often
a u8 * for no particularly good reason) and tidy up.
Replace the RoseLiteral structure with more program instructions; now,
instead of each literal ID leading to a RoseLiteral, it simply has a
program to run (and a delay rebuild program).
This commit also makes some other improvements:
* CHECK_STATE instruction, for use instead of a sparse iterator over a
single element.
* Elide some checks (CHECK_LIT_EARLY, ANCHORED_DELAY, etc) when not
needed.
* Flatten PUSH_DELAYED behaviour to one instruction per delayed
literal, rather than the mask/index-list approach used before.
* Simple program cache at compile time for deduplication.
Note that we have to be careful to leave the first lookaround entry in
place, if it's a dot. This should eventually be done with a program
instruction.
During prefilter region replacement, turn regions with very large max
bounds into repeats with inf max bound. This improves compile time and
the likelihood that we will actually be able to build an implementation
for such patterns.