mirror of
https://github.com/VectorCamp/vectorscan.git
synced 2025-06-28 16:41:01 +03:00
Literal API: update dev-reference
This commit is contained in:
parent
23e5f06594
commit
435cd23823
@ -54,6 +54,75 @@ version of Hyperscan used to scan with it.
Hyperscan provides support for targeting a database at a particular CPU
platform; see :ref:`instr_specialization` for details.

=====================
Compile Pure Literals
=====================

A pure literal is a special case of a regular expression. A character sequence
is regarded as a pure literal if and only if each character is read and
interpreted independently: no syntactic association exists between any adjacent
characters.

For example, consider the expression :regexp:`/bc?/`. Read as a regular
expression, it means the character ``b`` followed by either nothing or a single
character ``c``. Read as a pure literal, it is a 3-byte character sequence
containing the characters ``b``, ``c`` and ``?``. In a regular expression, the
question mark ``?`` plays a particular syntactic role, the 0-1 quantifier,
which associates it with the character that precedes it. Other characters with
special roles in regular expression grammar include ``[``, ``]``, ``(``, ``)``, ``{``,
``}``, ``-``, ``*``, ``+``, ``\``, ``|``, ``/``, ``:``, ``^``, ``.``, ``$``.
In a pure literal, all of these meta characters lose their special meanings
and are treated as ordinary ASCII characters.

Hyperscan was initially designed to process common regular expressions. It
therefore embeds a complex parser that performs comprehensive interpretation of
regular expression grammar. In particular, identifying the meta characters
above is the basic step in interpreting far more complex regular expression
constructs.

In practice, however, patterns are not always regular expressions; they may
simply be pure literals. Problems arise when such pure literals contain regular
expression meta characters. If they are fed directly into the traditional
Hyperscan compile API, all of these meta characters are interpreted in their
predefined ways, which is unnecessary and produces results the user does not
expect. To avoid this misinterpretation by the traditional API, users have to
preprocess these literal patterns by converting the meta characters into
another form: either by adding a backslash ``\`` before certain meta
characters, or by converting all of the characters into a hexadecimal
representation.

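The hexadecimal preprocessing described above could be sketched roughly as
follows. The helper name ``hex_escape_literal`` is hypothetical and not part of
Hyperscan; it assumes the ``\xHH`` escape form is the hexadecimal
representation meant here, and it does not guard against overflow of
``len * 4`` for extremely long inputs.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Hyperscan): returns a newly allocated,
 * NUL-terminated string in which every byte of `lit` is written as \xHH,
 * or NULL on allocation failure. The caller frees the result. */
char *hex_escape_literal(const char *lit, size_t len) {
    char *out = malloc(len * 4 + 1); /* 4 output chars per input byte */
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        /* Cast through unsigned char so bytes >= 0x80 are not sign-extended.
         * snprintf writes 4 characters plus a NUL into the 5-byte window. */
        snprintf(out + i * 4, 5, "\\x%02x", (unsigned char)lit[i]);
    }
    out[len * 4] = '\0';
    return out;
}
```

Under these assumptions, calling ``hex_escape_literal("bc?", 3)`` yields the
string ``\x62\x63\x3f``, which a regular expression parser reads as three
literal bytes rather than as ``b`` followed by an optional ``c``.
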
In ``v5.2.0``, Hyperscan introduced two new compile APIs for pure literal
patterns:

#. :c:func:`hs_compile_lit`: compiles a single pure literal into a pattern
   database.

#. :c:func:`hs_compile_lit_multi`: compiles an array of pure literals into a
   pattern database. All of the supplied patterns will be scanned for
   concurrently at scan time, with user-supplied identifiers returned when they
   match.

These two APIs are designed for use cases where all patterns in the target
rule set are pure literals. Users can pass the original pure literal content
directly to these APIs without worrying about regular expression meta
characters in their patterns. No preprocessing is needed.

For the new APIs, the ``length`` of each literal pattern is a newly added
parameter. Hyperscan locates the end of the input expression from each
literal's explicitly supplied length, not by searching for the terminating
``\0`` character of a string.

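To see why an explicit length differs from a NUL-terminated scan, consider a
buffer that contains a zero byte in the middle. The small sketch below uses
only the C standard library; the names ``lit_with_nul``, ``literal_len`` and
``cstring_len`` are illustrative, and the example makes no claim about how
Hyperscan handles such a literal internally.

```c
#include <stddef.h>
#include <string.h>

/* A 5-byte literal with a NUL in the middle: 'a', 'b', 0x00, 'c', 'd'.
 * sizeof also counts the terminating NUL the compiler appends, hence -1. */
static const char lit_with_nul[] = "ab\0cd";

/* Length a caller would supply explicitly: all 5 bytes. */
size_t literal_len(void) {
    return sizeof(lit_with_nul) - 1;
}

/* Length a NUL-terminated scan sees: only the 2 bytes before 0x00. */
size_t cstring_len(void) {
    return strlen(lit_with_nul);
}
```

Here ``literal_len()`` is 5 while ``cstring_len()`` is 2, so an API that relied
on ``\0`` alone would stop after ``ab``.
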
Supported flags: :c:member:`HS_FLAG_CASELESS`, :c:member:`HS_FLAG_MULTILINE`,
:c:member:`HS_FLAG_SINGLEMATCH`, :c:member:`HS_FLAG_SOM_LEFTMOST`.

.. note:: The literal compilation APIs do not support :ref:`extparam`. At
          runtime, the traditional runtime APIs can still be used to match
          pure literal patterns.

.. note:: If the target rule set contains at least one regular expression,
          please use the traditional compile APIs :c:func:`hs_compile`,
          :c:func:`hs_compile_multi` and :c:func:`hs_compile_ext_multi`.
          The new literal APIs introduced here are designed for rule sets
          containing only pure literal expressions.

***************
Pattern Support
***************