- Change the compile of literal matchers to two passes.
- Reverse the bucket assignment in FDR, bucket with longer literals has
smaller bucket id.
- Squash the buckets of included literals and jump to the the program of
included literals directly from parent literal program without going
through FDR confirm for included iterals.
We only need to build anchored programs for cases where a
RECORD_ANCHORED instruction has been generated, and we can key those
directly rather than using final_id.
Move the hash table used for long literal support in streaming mode from
FDR to Rose, and introduce new instructions CHECK_LONG_LIT and
CHECK_LONG_LIT_NOCASE for doing literal confirm for long literals.
This simplifies FDR confirm, and guarantees that HWLM matchers will only
be used for literals < 256 bytes long.
Stop overloading END as the last Rose interpreter instruction, use new
sentinel LAST_ROSE_INSTRUCTION for that.
This change will also make it easier to add new instructions without
renumbering END and thus changing all generated bytecodes.
This commit replaces the build-time representation of the Rose
interpreter programs, from a class containing a discriminated union of
the bytecode structures to a class hierarchy of build-time prototypes.
This makes it easier to reason about and manipulate Rose programs during
compilation.
The roseRunProgram function had gotten very large for the number of
sites it was being inlined into, with negative effects on performance in
large cases. This change moves it into its own translation unit.
Replace the use of the internal_report structure (for reports from
engines, MPV etc) with the Rose program interpreter.
SOM processing was reworked to use a new som_operation structure that is
embedded in the appropriate instructions.
Previously, the exhaustion vector was a standard bitvector, which
required an expensive memset() call at init for databases with a large
number of exhaustion keys.
Unifies all literal match paths so that the Rose program is used for all
of them. This removes the previous specialised "direct report" and
"multi direct report" paths. Some additional REPORT instruction work was
necessary for this.
Reworked literal construction path at compile time in prep for using
program offsets as literal IDs.
Completely removed the anchored log runtime, which is no longer worth
the extra complexity.