chimera: update dev-reference

2025-11-16 01:12:15 +03:00 · 2018-06-27 10:21:50 -04:00
parent c8ec0d0ec2
commit 746d1eafe5
4 changed files with 337 additions and 1 deletions
--- a/doc/dev-reference/chimera.rst
+++ b/doc/dev-reference/chimera.rst
@@ -0,0 +1,333 @@
 .. _chimera:
 #######
 Chimera
 #######
 This section describes the Chimera library.
 ************
 Introduction
 ************
 Chimera is a software regular expression matching engine that is a hybrid of
 Hyperscan and PCRE. The design goals of Chimera are to fully support PCRE
 syntax as well as to take advantage of the high performance nature of Hyperscan.
 Chimera inherits the design guideline of Hyperscan with C APIs for compilation
 and scanning.
 The Chimera API itself is composed of two major components:
 ===========
 Compilation
 ===========
 These functions take a group of regular expressions, along with identifiers and
 option flags, and compile them into an immutable database that can be used by
 the Chimera scanning API. This compilation process performs considerable
 analysis and optimization work in order to build a database that will match
 the given expressions efficiently.
 See :ref:`chcompile` for more details.
 ========
 Scanning
 ========
 Once a Chimera database has been created, it can be used to scan data in memory.
 Chimera supports only block mode, in which a single contiguous block of memory
 is scanned.
 Matches are delivered to the application via a user-supplied callback function
 that is called synchronously for each match.
 For a given database, Chimera provides several guarantees:
 * No memory allocations occur at runtime, with the exception of scratch space
  allocation, which should be done ahead of time in performance-critical
  applications:
  - **Scratch space**: temporary memory used for internal data at scan time.
    Structures in scratch space do not persist beyond the end of a single scan
    call.
 * The size of the scratch space required for a given database is fixed and
  determined at database compile time. This means that the memory requirements
  of the application are known ahead of time, and the scratch space can be
  pre-allocated if required for performance reasons.
 * Any pattern that has successfully been compiled by the Chimera compiler can
  be scanned against any input. However, internal resource limits or other
  limitations imposed by PCRE at runtime can still cause a scan call to
  return an error.
 .. note:: Chimera is designed to have the same matching behavior as PCRE,
   including greedy/ungreedy, capturing, etc. Like PCRE, Chimera reports both
   the **start offset** and the **end offset** of each match. Unlike Hyperscan,
   which reports all matches, Chimera reports only non-overlapping matches.
   For example, the pattern :regexp:`/foofoo/` will
   match ``foofoofoofoo`` at offsets (0, 6) and (6, 12).
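 The non-overlapping reporting described in the note above can be illustrated
 without Chimera at all. The following is a minimal plain C sketch for a literal
 pattern only; ``span_t`` and ``find_non_overlapping`` are hypothetical names
 invented for this illustration and are not part of the Chimera API.
 .. code-block:: c
    #include <stddef.h>
    #include <string.h>
    /* Hypothetical type for this sketch: one (from, to) match span. */
    typedef struct {
        size_t from;
        size_t to;
    } span_t;
    /* Records successive non-overlapping occurrences of a literal in data.
     * The search resumes at the end offset of the previous match, so an
     * occurrence that starts inside an earlier match is never reported.
     * Returns the number of spans written to out (at most max_out). */
    static size_t find_non_overlapping(const char *data, const char *literal,
                                       span_t *out, size_t max_out) {
        size_t n = 0;
        size_t len = strlen(literal);
        const char *p = data;
        if (len == 0) {
            return 0; /* an empty literal would never advance */
        }
        while (n < max_out && (p = strstr(p, literal)) != NULL) {
            out[n].from = (size_t)(p - data);
            out[n].to = out[n].from + len;
            n++;
            p += len; /* skip past this match before searching again */
        }
        return n;
    }
 Calling ``find_non_overlapping("foofoofoofoo", "foofoo", spans, 4)`` fills in
 the spans (0, 6) and (6, 12). The overlapping occurrence at (3, 9) is skipped,
 whereas an engine that reports all matches would also report it.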
 .. note:: Because Chimera combines Hyperscan and PCRE in order to support
   full PCRE syntax, it incurs extra performance overhead compared to a
   Hyperscan-only solution. Prefer Hyperscan for better performance unless
   you need full PCRE syntax support.
 See :ref:`chruntime` for more details.
 ************
 Requirements
 ************
 The PCRE library (http://pcre.org/) version 8.41 is required for Chimera.
 .. note:: Since Chimera needs to reference internal PCRE functions, place the
   PCRE source directory under the Hyperscan root directory in order to build
   Chimera.
 Besides this, the hardware and software requirements of Chimera are the same
 as those of Hyperscan.
 See :ref:`hardware` and :ref:`software` for more details.
 .. note:: Building Hyperscan automatically builds the Chimera library.
   Currently only a static library is supported for Chimera, so use a
   static build type when configuring the CMake build options.
 .. _chcompile:
 ******************
 Compiling Patterns
 ******************
 ===================
 Building a Database
 ===================
 The Chimera compiler API accepts regular expressions and converts them into a
 compiled pattern database that can then be used to scan data.
 The API provides three functions that compile regular expressions into
 databases:
 #. :c:func:`ch_compile`: compiles a single expression into a pattern database.
 #. :c:func:`ch_compile_multi`: compiles an array of expressions into a pattern
   database. All of the supplied patterns will be scanned for concurrently at
   scan time, with user-supplied identifiers returned when they match.
 #. :c:func:`ch_compile_ext_multi`: compiles an array of expressions as above,
   but allows PCRE match limits to be specified for each expression.
 Compilation allows the Chimera library to analyze the given pattern(s) and
 pre-determine how to scan for these patterns in an optimized fashion using
 Hyperscan and PCRE.
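 As a rough sketch of the multi-pattern case, the helper below compiles two
 expressions into a single database. The helper name ``build_database``, the
 patterns and the identifiers are invented for illustration, and the code
 assumes the :c:func:`ch_compile_multi` signature declared in ``ch_compile.h``,
 with a ``NULL`` platform argument taken to mean the current host.
 .. code-block:: c
    #include <stdio.h>
    #include <ch.h>
    /* Hypothetical helper: compiles two illustrative patterns into one
     * Chimera database. Returns NULL if compilation fails. */
    static ch_database_t *build_database(void) {
        /* The second pattern uses a lookbehind, which is PCRE-only syntax. */
        const char *const patterns[] = { "foo(\\d+)", "(?<=bar)baz" };
        const unsigned int flags[] = { CH_FLAG_CASELESS, 0 };
        const unsigned int ids[] = { 1001, 1002 }; /* returned on match */
        ch_database_t *db = NULL;
        ch_compile_error_t *compile_err = NULL;
        ch_error_t err = ch_compile_multi(patterns, flags, ids, 2,
                                          CH_MODE_GROUPS, NULL, &db,
                                          &compile_err);
        if (err != CH_SUCCESS) {
            fprintf(stderr, "ch_compile_multi failed: %s\n",
                    compile_err ? compile_err->message : "unknown error");
            ch_free_compile_error(compile_err);
            return NULL;
        }
        return db; /* release with ch_free_database() when done */
    }
 The caller owns the returned database and is expected to release it with
 ``ch_free_database`` once no more scans will use it.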
 ===============
 Pattern Support
 ===============
 Chimera fully supports the pattern syntax used by the PCRE library ("libpcre"),
 described at <http://www.pcre.org/>. The version of PCRE used to validate
 Chimera's interpretation of this syntax is 8.41.
 =========
 Semantics
 =========
 Chimera supports exactly the same semantics as the PCRE library. In addition,
 like Hyperscan, it can match multiple patterns simultaneously; when several
 patterns match, the matches are reported in order of end offset.
 .. _chruntime:
 *********************
 Scanning for Patterns
 *********************
 Chimera provides a single scan function, ``ch_scan``.
 ================
 Handling Matches
 ================
 ``ch_scan`` will call a user-supplied callback function when a match
 is found. This function has the following signature:
  .. doxygentypedef:: ch_match_event_handler
       :outline:
       :no-link:
 The *id* argument will be set to the identifier for the matching expression
 provided at compile time. The *from* argument will be set to the start offset
 of the match, and the *to* argument will be set to the end offset of the
 match. The *captured* array stores the offsets of the entire pattern match as
 well as of any captured subexpressions, and *size* will be set to the number
 of valid entries in *captured*.
 The match callback function has the capability to continue or halt scanning
 by returning different values.
 See :c:type:`ch_match_event_handler` for more information.
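 The following sketch shows one way a match callback might consume these
 arguments. It assumes the handler and :c:func:`ch_scan` signatures declared in
 ``ch_runtime.h``, and it passes ``NULL`` as the error callback on the
 assumption that this is permitted. The names ``on_match`` and
 ``count_matches`` are hypothetical.
 .. code-block:: c
    #include <stdio.h>
    #include <string.h>
    #include <ch.h>
    /* Match callback: prints the overall match span and each active capture
     * group, counts the match through ctx, then lets scanning continue. */
    static ch_callback_t on_match(unsigned int id, unsigned long long from,
                                  unsigned long long to, unsigned int flags,
                                  unsigned int size,
                                  const ch_capture_t *captured, void *ctx) {
        unsigned int *count = (unsigned int *)ctx;
        (void)flags; /* not used in this sketch */
        printf("id %u matched at (%llu, %llu)\n", id, from, to);
        for (unsigned int i = 0; i < size; i++) {
            if (captured[i].flags == CH_CAPTURE_FLAG_ACTIVE) {
                printf("  group %u: (%llu, %llu)\n", i, captured[i].from,
                       captured[i].to);
            }
        }
        (*count)++;
        return CH_CALLBACK_CONTINUE;
    }
    /* Hypothetical helper: compiles one pattern, scans data once and returns
     * the number of matches reported, or -1 on any error. */
    static int count_matches(const char *pattern, const char *data) {
        ch_database_t *db = NULL;
        ch_compile_error_t *compile_err = NULL;
        ch_scratch_t *scratch = NULL;
        unsigned int count = 0;
        int result = -1;
        if (ch_compile(pattern, 0, CH_MODE_GROUPS, NULL, &db, &compile_err)
            != CH_SUCCESS) {
            ch_free_compile_error(compile_err);
            return -1;
        }
        if (ch_alloc_scratch(db, &scratch) == CH_SUCCESS) {
            if (ch_scan(db, data, (unsigned int)strlen(data), 0, scratch,
                        on_match, NULL, &count) == CH_SUCCESS) {
                result = (int)count;
            }
            ch_free_scratch(scratch);
        }
        ch_free_database(db);
        return result;
    }
 Returning ``CH_CALLBACK_TERMINATE`` from ``on_match`` instead would halt the
 scan after the first reported match.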
 =======================
 Handling Runtime Errors
 =======================
 ``ch_scan`` will call a user-supplied callback function when a runtime error
 occurs in libpcre. This function has the following signature:
  .. doxygentypedef:: ch_error_event_handler
       :outline:
       :no-link:
 The *id* argument will be set to the identifier for the matching expression
 provided at compile time.
 The error callback function has the capability to either halt scanning or
 continue scanning for the next pattern.
 See :c:type:`ch_error_event_handler` for more information.
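 A minimal error callback might look like the sketch below. It assumes the
 ``ch_error_event_handler`` signature and the ``CH_ERROR_MATCHLIMIT`` and
 ``CH_ERROR_RECURSIONLIMIT`` event types declared in ``ch_runtime.h``. The name
 ``on_error`` and the use of *ctx* as an error counter are choices made for
 this illustration only.
 .. code-block:: c
    #include <stdio.h>
    #include <ch.h>
    /* Error callback: logs which pattern hit a PCRE runtime limit, counts the
     * error through ctx (if provided) and skips only the failing pattern. */
    static ch_callback_t on_error(ch_error_event_t error_type,
                                  unsigned int id, void *info, void *ctx) {
        unsigned int *errors = (unsigned int *)ctx;
        const char *reason = "unknown PCRE runtime error";
        (void)info; /* not used in this sketch */
        if (error_type == CH_ERROR_MATCHLIMIT) {
            reason = "PCRE match limit exceeded";
        } else if (error_type == CH_ERROR_RECURSIONLIMIT) {
            reason = "PCRE recursion limit exceeded";
        }
        fprintf(stderr, "pattern %u: %s\n", id, reason);
        if (errors != NULL) {
            (*errors)++;
        }
        /* Keep scanning for the remaining patterns. */
        return CH_CALLBACK_SKIP_PATTERN;
    }
 Such a function would be passed as the error callback argument of ``ch_scan``.
 Returning ``CH_CALLBACK_TERMINATE`` instead would stop the scan for all
 patterns.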
 =============
 Scratch Space
 =============
 While scanning data, Chimera needs a small amount of temporary memory to store
 on-the-fly internal data. This amount is unfortunately too large to fit on the
 stack, particularly for embedded applications, and allocating memory dynamically
 is too expensive, so a pre-allocated "scratch" space must be provided to the
 scanning functions.
 The function :c:func:`ch_alloc_scratch` allocates a large enough region of
 scratch space to support a given database. If the application uses multiple
 databases, only a single scratch region is necessary: in this case, calling
 :c:func:`ch_alloc_scratch` on each database (with the same ``scratch`` pointer)
 will ensure that the scratch space is large enough to support scanning against
 any of the given databases.
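 The multiple-database case described above could be sketched as follows. The
 helper ``alloc_shared_scratch`` is hypothetical; it only relies on
 :c:func:`ch_alloc_scratch` growing an existing scratch region when it is
 called again with the same pointer, as the paragraph above states.
 .. code-block:: c
    #include <stddef.h>
    #include <ch.h>
    /* Hypothetical helper: builds one scratch region that is large enough
     * for every database in dbs. Returns NULL on failure or when n is 0. */
    static ch_scratch_t *alloc_shared_scratch(const ch_database_t *const *dbs,
                                              size_t n) {
        ch_scratch_t *scratch = NULL;
        for (size_t i = 0; i < n; i++) {
            /* Same scratch pointer each time, so the region only grows. */
            if (ch_alloc_scratch(dbs[i], &scratch) != CH_SUCCESS) {
                if (scratch != NULL) {
                    ch_free_scratch(scratch);
                }
                return NULL;
            }
        }
        return scratch; /* release with ch_free_scratch() when done */
    }
 The resulting region can then be passed to ``ch_scan`` together with any of
 the databases it was sized for, provided that only one scan uses it at a time.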
 While the Chimera library is re-entrant, the use of scratch spaces is not.
 For example, if by design it is deemed necessary to run recursive or nested
 scanning (say, from the match callback function), then an additional scratch
 space is required for that context.
 In the absence of recursive scanning, only one such space is required per thread
 and can (and indeed should) be allocated before data scanning is to commence.
 In a scenario where a set of expressions are compiled by a single "master"
 thread and data will be scanned by multiple "worker" threads, the convenience
 function :c:func:`ch_clone_scratch` allows multiple copies of an existing
 scratch space to be made for each thread (rather than forcing the caller to pass
 all the compiled databases through :c:func:`ch_alloc_scratch` multiple times).
 For example:
 .. code-block:: c
    ch_error_t err;
    ch_scratch_t *scratch_prototype = NULL;
    err = ch_alloc_scratch(db, &scratch_prototype);
    if (err != CH_SUCCESS) {
        printf("ch_alloc_scratch failed!");
        exit(1);
    }
    ch_scratch_t *scratch_thread1 = NULL;
    ch_scratch_t *scratch_thread2 = NULL;
    err = ch_clone_scratch(scratch_prototype, &scratch_thread1);
    if (err != CH_SUCCESS) {
        printf("ch_clone_scratch failed!");
        exit(1);
    }
    err = ch_clone_scratch(scratch_prototype, &scratch_thread2);
    if (err != CH_SUCCESS) {
        printf("ch_clone_scratch failed!");
        exit(1);
    }
    ch_free_scratch(scratch_prototype);
    /* Now two threads can both scan against database db,
       each with its own scratch space. */
 =================
 Custom Allocators
 =================
 By default, structures used by Chimera at runtime (scratch space, etc.) are
 allocated with the default system allocators, usually
 ``malloc()`` and ``free()``.
 The Chimera API provides a facility for changing this behaviour to support
 applications that use custom memory allocators.
 These functions are:
 - :c:func:`ch_set_database_allocator`, which sets the allocate and free functions
  used for compiled pattern databases.
 - :c:func:`ch_set_scratch_allocator`, which sets the allocate and free
  functions used for scratch space.
 - :c:func:`ch_set_misc_allocator`, which sets the allocate and free functions
  used for miscellaneous data, such as compile error structures and
  informational strings.
 The :c:func:`ch_set_allocator` function can be used to set all of the custom
 allocators to the same allocate/free pair.
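 As an illustration, the sketch below installs a pair of wrappers around
 ``malloc()`` and ``free()`` that track the number of live allocations. It
 assumes that :c:func:`ch_set_allocator` takes an allocate function of the form
 ``void *(*)(size_t)`` and a free function of the form ``void (*)(void *)``.
 The wrapper names and the counter are hypothetical, and the counter is not
 thread-safe.
 .. code-block:: c
    #include <stdlib.h>
    #include <ch.h>
    /* Number of blocks currently allocated through the wrappers below. */
    static size_t live_allocations = 0;
    /* Allocate wrapper: delegates to malloc() and counts successes. */
    static void *counting_alloc(size_t size) {
        void *ptr = malloc(size);
        if (ptr != NULL) {
            live_allocations++;
        }
        return ptr;
    }
    /* Free wrapper: delegates to free() and updates the counter. */
    static void counting_free(void *ptr) {
        if (ptr != NULL) {
            live_allocations--;
            free(ptr);
        }
    }
    /* Hypothetical helper: routes database, scratch and miscellaneous
     * allocations through the counting wrappers. */
    static ch_error_t install_counting_allocator(void) {
        return ch_set_allocator(counting_alloc, counting_free);
    }
 Installing the allocator before any database is compiled or any scratch space
 is allocated keeps each block paired with the free function that matches it.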
 ************************
 API Reference: Constants
 ************************
 ===========
 Error Codes
 ===========
 .. doxygengroup:: CH_ERROR
   :content-only:
   :no-link:
 =============
 Pattern flags
 =============
 .. doxygengroup:: CH_PATTERN_FLAG
   :content-only:
   :no-link:
 ==================
 Compile mode flags
 ==================
 .. doxygengroup:: CH_MODE_FLAG
   :content-only:
   :no-link:
 ********************
 API Reference: Files
 ********************
 ==========
 File: ch.h
 ==========
 .. doxygenfile:: ch.h
 =================
 File: ch_common.h
 =================
 .. doxygenfile:: ch_common.h
 ==================
 File: ch_compile.h
 ==================
 .. doxygenfile:: ch_compile.h
 ==================
 File: ch_runtime.h
 ==================
 .. doxygenfile:: ch_runtime.h
--- a/doc/dev-reference/getting_started.rst
+++ b/doc/dev-reference/getting_started.rst
@@ -50,6 +50,8 @@ Very Quick Start
 Requirements
 ************
 .. _hardware:
 Hardware
 ========
--- a/doc/dev-reference/hyperscan.doxyfile.in
+++ b/doc/dev-reference/hyperscan.doxyfile.in
@@ -758,7 +758,7 @@ WARN_LOGFILE           =
 # spaces.
 # Note: If this tag is empty the current directory is searched.
-INPUT                  = @CMAKE_SOURCE_DIR@/src/hs.h @CMAKE_SOURCE_DIR@/src/hs_common.h @CMAKE_SOURCE_DIR@/src/hs_compile.h @CMAKE_SOURCE_DIR@/src/hs_runtime.h
+INPUT                  = @CMAKE_SOURCE_DIR@/src/hs.h @CMAKE_SOURCE_DIR@/src/hs_common.h @CMAKE_SOURCE_DIR@/src/hs_compile.h @CMAKE_SOURCE_DIR@/src/hs_runtime.h  @CMAKE_SOURCE_DIR@/chimera/ch.h @CMAKE_SOURCE_DIR@/chimera/ch_common.h @CMAKE_SOURCE_DIR@/chimera/ch_compile.h @CMAKE_SOURCE_DIR@/chimera/ch_runtime.h
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
--- a/doc/dev-reference/index.rst
+++ b/doc/dev-reference/index.rst
@@ -20,3 +20,4 @@ Hyperscan |version| Developer's Reference Guide
   tools
   api_constants
   api_files
   chimera