mirror of
https://github.com/VectorCamp/vectorscan.git
synced 2025-06-28 16:41:01 +03:00
The generated documentation continues to refer to Hyperscan despite the project now being VectorScan. Lets replace many of the Hyperscan references with Vectorscan. At the same time, lets resync the documentation here with the vectorscan readme. This updates the supported platforms/compilers and build options. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
243 lines
9.9 KiB
ReStructuredText
243 lines
9.9 KiB
ReStructuredText
.. _runtime:
|
|
|
|
#####################
|
|
Scanning for Patterns
|
|
#####################
|
|
|
|
Vectorscan provides three different scanning modes, each with its own scan
|
|
function beginning with ``hs_scan``. In addition, streaming mode has a number
|
|
of other API functions for managing stream state.
|
|
|
|
****************
|
|
Handling Matches
|
|
****************
|
|
|
|
All of these functions will call a user-supplied callback function when a match
|
|
is found. This function has the following signature:
|
|
|
|
.. doxygentypedef:: match_event_handler
|
|
:outline:
|
|
:no-link:
|
|
|
|
The *id* argument will be set to the identifier for the matching expression
|
|
provided at compile time, and the *to* argument will be set to the end-offset
|
|
of the match. If SOM was requested for the pattern (see :ref:`som`), the
|
|
*from* argument will be set to the leftmost possible start-offset for the match.
|
|
|
|
The match callback function has the capability to halt scanning
|
|
by returning a non-zero value.
|
|
|
|
See :c:type:`match_event_handler` for more information.
|
|
|
|
**************
|
|
Streaming Mode
|
|
**************
|
|
|
|
The core of the Vectorscan streaming runtime API consists of functions to open,
|
|
scan, and close Vectorscan data streams:
|
|
|
|
* :c:func:`hs_open_stream`: allocates and initializes a new stream for scanning.
|
|
|
|
* :c:func:`hs_scan_stream`: scans a block of data in a given stream, raising
|
|
matches as they are detected.
|
|
|
|
* :c:func:`hs_close_stream`: completes scanning of a given stream (raising any
|
|
matches that occur at the end of the stream) and frees the stream state. After
|
|
a call to :c:func:`hs_close_stream`, the stream handle is invalid and should
|
|
not be used again for any purpose.
|
|
|
|
Any matches detected in the data as it is scanned are returned to the calling
|
|
application via a function pointer callback.
|
|
|
|
The match callback function has the capability to halt scanning of the current
|
|
data stream by returning a non-zero value. In streaming mode, the result of
|
|
this is that the stream is then left in a state where no more data can be
|
|
scanned, and any subsequent calls to :c:func:`hs_scan_stream` for that stream
|
|
will return immediately with :c:member:`HS_SCAN_TERMINATED`. The caller must
|
|
still call :c:func:`hs_close_stream` to complete the clean-up process for that
|
|
stream.
|
|
|
|
Streams exist in the Vectorscan library so that pattern matching state can be
|
|
maintained across multiple blocks of target data -- without maintaining this
|
|
state, it would not be possible to detect patterns that span these blocks of
|
|
data. This, however, does come at the cost of requiring an amount of storage
|
|
per-stream (the size of this storage is fixed at compile time), and a slight
|
|
performance penalty in some cases to manage the state.
|
|
|
|
While Vectorscan does always support a strict ordering of multiple matches,
|
|
streaming matches will not be delivered at offsets before the current stream
|
|
write, with the exception of zero-width asserts, where constructs such as
|
|
:regexp:`\\b` and :regexp:`$` can cause a match on the final character of a
|
|
stream write to be delayed until the next stream write or stream close
|
|
operation.
|
|
|
|
=================
|
|
Stream Management
|
|
=================
|
|
|
|
In addition to :c:func:`hs_open_stream`, :c:func:`hs_scan_stream`, and
|
|
:c:func:`hs_close_stream`, the Vectorscan API provides a number of other
|
|
functions for the management of streams:
|
|
|
|
* :c:func:`hs_reset_stream`: resets a stream to its initial state; this is
|
|
equivalent to calling :c:func:`hs_close_stream` but will not free the memory
|
|
used for stream state.
|
|
|
|
* :c:func:`hs_copy_stream`: constructs a (newly allocated) duplicate of a
|
|
stream.
|
|
|
|
* :c:func:`hs_reset_and_copy_stream`: constructs a duplicate of a stream into
|
|
another, resetting the destination stream first. This call avoids the
|
|
allocation done by :c:func:`hs_copy_stream`.
|
|
|
|
==================
|
|
Stream Compression
|
|
==================
|
|
|
|
A stream object is allocated as a fixed size region of memory which has been
|
|
sized to ensure that no memory allocations are required during scan
|
|
operations. When the system is under memory pressure, it may be useful to reduce
|
|
the memory consumed by streams that are not expected to be used soon. The
|
|
Vectorscan API provides calls for translating a stream to and from a compressed
|
|
representation for this purpose. The compressed representation differs from the
|
|
full stream object as it does not reserve space for components which are not
|
|
required given the current stream state. The Vectorscan API functions for this
|
|
functionality are:
|
|
|
|
* :c:func:`hs_compress_stream`: fills the provided buffer with a compressed
|
|
representation of the stream and returns the number of bytes consumed by the
|
|
compressed representation. If the buffer is not large enough to hold the
|
|
compressed representation, :c:member:`HS_INSUFFICIENT_SPACE` is returned along
|
|
with the required size. This call does not modify the original stream in any
|
|
way: it may still be written to with :c:func:`hs_scan_stream`, used as part of
|
|
the various reset calls to reinitialise its state, or
|
|
:c:func:`hs_close_stream` may be called to free its resources.
|
|
|
|
* :c:func:`hs_expand_stream`: creates a new stream based on a buffer containing
|
|
a compressed representation.
|
|
|
|
* :c:func:`hs_reset_and_expand_stream`: constructs a stream based on a buffer
|
|
containing a compressed representation on top of an existing stream, resetting
|
|
the existing stream first. This call avoids the allocation done by
|
|
:c:func:`hs_expand_stream`.
|
|
|
|
Note: it is not recommended to use stream compression between every call to scan
|
|
for performance reasons as it takes time to convert between the compressed
|
|
representation and a standard stream.
|
|
|
|
|
|
**********
|
|
Block Mode
|
|
**********
|
|
|
|
The block mode runtime API consists of a single function: :c:func:`hs_scan`. Using
|
|
the compiled patterns this function identifies matches in the target data,
|
|
using a function pointer callback to communicate with the application.
|
|
|
|
This single :c:func:`hs_scan` function is essentially equivalent to calling
|
|
:c:func:`hs_open_stream`, making a single call to :c:func:`hs_scan_stream`, and
|
|
then :c:func:`hs_close_stream`, except that block mode operation does not
|
|
incur all the stream related overhead.
|
|
|
|
*************
|
|
Vectored Mode
|
|
*************
|
|
|
|
The vectored mode runtime API, like the block mode API, consists of a single
|
|
function: :c:func:`hs_scan_vector`. This function accepts an array of data
|
|
pointers and lengths, facilitating the scanning in sequence of a set of data
|
|
blocks that are not contiguous in memory.
|
|
|
|
From the caller's perspective, this mode will produce the same matches as if
|
|
the set of data blocks were (a) scanned in sequence with a series of streaming
|
|
mode scans, or (b) copied in sequence into a single block of memory and then
|
|
scanned in block mode.
|
|
|
|
*************
|
|
Scratch Space
|
|
*************
|
|
|
|
While scanning data, Vectorscan needs a small amount of temporary memory to store
|
|
on-the-fly internal data. This amount is unfortunately too large to fit on the
|
|
stack, particularly for embedded applications, and allocating memory dynamically
|
|
is too expensive, so a pre-allocated "scratch" space must be provided to the
|
|
scanning functions.
|
|
|
|
The function :c:func:`hs_alloc_scratch` allocates a large enough region of
|
|
scratch space to support a given database. If the application uses multiple
|
|
databases, only a single scratch region is necessary: in this case, calling
|
|
:c:func:`hs_alloc_scratch` on each database (with the same ``scratch`` pointer)
|
|
will ensure that the scratch space is large enough to support scanning against
|
|
any of the given databases.
|
|
|
|
While the Vectorscan library is re-entrant, the use of scratch spaces is not.
|
|
For example, if by design it is deemed necessary to run recursive or nested
|
|
scanning (say, from the match callback function), then an additional scratch
|
|
space is required for that context.
|
|
|
|
In the absence of recursive scanning, only one such space is required per thread
|
|
and can (and indeed should) be allocated before data scanning is to commence.
|
|
|
|
In a scenario where a set of expressions are compiled by a single "main"
|
|
thread and data will be scanned by multiple "worker" threads, the convenience
|
|
function :c:func:`hs_clone_scratch` allows multiple copies of an existing
|
|
scratch space to be made for each thread (rather than forcing the caller to pass
|
|
all the compiled databases through :c:func:`hs_alloc_scratch` multiple times).
|
|
|
|
For example:
|
|
|
|
.. code-block:: c
|
|
|
|
hs_error_t err;
|
|
hs_scratch_t *scratch_prototype = NULL;
|
|
err = hs_alloc_scratch(db, &scratch_prototype);
|
|
if (err != HS_SUCCESS) {
|
|
printf("hs_alloc_scratch failed!");
|
|
exit(1);
|
|
}
|
|
|
|
hs_scratch_t *scratch_thread1 = NULL;
|
|
hs_scratch_t *scratch_thread2 = NULL;
|
|
|
|
err = hs_clone_scratch(scratch_prototype, &scratch_thread1);
|
|
if (err != HS_SUCCESS) {
|
|
printf("hs_clone_scratch failed!");
|
|
exit(1);
|
|
}
|
|
err = hs_clone_scratch(scratch_prototype, &scratch_thread2);
|
|
if (err != HS_SUCCESS) {
|
|
printf("hs_clone_scratch failed!");
|
|
exit(1);
|
|
}
|
|
|
|
hs_free_scratch(scratch_prototype);
|
|
|
|
/* Now two threads can both scan against database db,
|
|
each with its own scratch space. */
|
|
|
|
*****************
|
|
Custom Allocators
|
|
*****************
|
|
|
|
By default, structures used by Vectorscan at runtime (scratch space, stream
|
|
state, etc) are allocated with the default system allocators, usually
|
|
``malloc()`` and ``free()``.
|
|
|
|
The Vectorscan API provides a facility for changing this behaviour to support
|
|
applications that use custom memory allocators.
|
|
|
|
These functions are:
|
|
|
|
- :c:func:`hs_set_database_allocator`, which sets the allocate and free functions
|
|
used for compiled pattern databases.
|
|
- :c:func:`hs_set_scratch_allocator`, which sets the allocate and free
|
|
functions used for scratch space.
|
|
- :c:func:`hs_set_stream_allocator`, which sets the allocate and free functions
|
|
used for stream state in streaming mode.
|
|
- :c:func:`hs_set_misc_allocator`, which sets the allocate and free functions
|
|
used for miscellaneous data, such as compile error structures and
|
|
informational strings.
|
|
|
|
The :c:func:`hs_set_allocator` function can be used to set all of the custom
|
|
allocators to the same allocate/free pair.
|