Literal API: update dev-reference

This commit is contained in:
Hong, Yang A 2019-07-17 23:45:59 +08:00 committed by Chang, Harry
parent 23e5f06594
commit 435cd23823

View File

@ -54,6 +54,75 @@ version of Hyperscan used to scan with it.
Hyperscan provides support for targeting a database at a particular CPU
platform; see :ref:`instr_specialization` for details.
=====================
Compile Pure Literals
=====================
Pure literal is a special case of regular expression. A character sequence is
regarded as a pure literal if and only if each character is read and
interpreted independently. No syntax association happens between any adjacent
characters.
For example, given an expression written as :regexp:`/bc?/`. We could say it is
a regluar expression, with the meaning that character ``b`` followed by nothing
or by one character ``c``. On the other view, we could also say it is a pure
literal expression, with the meaning that this is a character sequence of 3-byte
length, containing characters ``b``, ``c`` and ``?``. In regular case, the
question mark character ``?`` has a particular syntax role called 0-1 quantifier,
which has an syntax association with the character ahead of it. Similar
characters exist in regular grammer like ``[``, ``]``, ``(``, ``)``, ``{``,
``}``, ``-``, ``*``, ``+``, ``\``, ``|``, ``/``, ``:``, ``^``, ``.``, ``$``.
While in pure literal case, all these meta characters lost extra meanings
expect for that they are just common ASCII codes.
Hyperscan is initially designed to process common regualr expressions. It is
hence embedded with a complex parser to do comprehensive regular grammer
interpretion. Particularly, the identification of above meta characters is the
basic step for the interpretion of far more complex regular grammers.
However in real cases, patterns may not always be regualr expressions. They
could just be pure literals. Problem will come if the pure literals contain
regular meta characters. Supposing fed directly into traditional Hyperscan
compile API, all these meta characters will be interpreted in predefined ways,
which is unnecessary and the result is totally out of expectation. To avoid
such misunderstanding by traditional API, users have to preprocess these
literal patterns by converting the meta characters into some other formats:
either by adding a backslash ``\`` before certain meta characters, or by
converting all the characters into a hexadecimal representation.
In ``v5.2.0``, Hyperscan introduces 2 new compile APIs for pure literal patterns:
#. :c:func:`hs_compile_lit`: compiles a single pure literal into a pattern
database.
#. :c:func:`hs_compile_lit_multi`: compiles an array of pure literals into a
pattern database. All of the supplied patterns will be scanned for
concurrently at scan time, with user-supplied identifiers returned when they
match.
These 2 APIs are designed for use cases where all patterns contained in the
target rule set are pure literals. Users can pass the initial pure literal
content directly into these APIs without worrying about writing regular meta
characters in their patterns. No preprocessing work is needed any more.
For new APIs, the ``length`` of each literal pattern is a newly added parameter.
Hyperscan needs to locate the end position of the input expression via clearly
knowing each literal's length, not by simply identifying character ``\0`` of a
string.
Supported flags: :c:member:`HS_FLAG_CASELESS`, :c:member:`HS_FLAG_MULTILINE`,
:c:member:`HS_FLAG_SINGLEMATCH`, :c:member:`HS_FLAG_SOM_LEFTMOST`.
.. note:: We don't support literal compilation API with :ref:`extparam`. And
for runtime implementation, traditional runtime APIs can still be
used to match pure literal patterns.
.. note:: If the target rule set contains at least one regular expression,
please use traditional compile APIs :c:func:`hs_compile`,
:c:func:`hs_compile_multi` and :c:func:`hs_compile_ext_multi`.
The new literal APIs introduced here are designed for rule sets
containing only pure literal expressions.
***************
Pattern Support
***************