.. include::

.. _compilation:

##################
Compiling Patterns
##################

*******************
Building a Database
*******************

The Hyperscan compiler API accepts regular expressions and converts them into a
compiled pattern database that can then be used to scan data.

The API provides three functions that compile regular expressions into
databases:

#. :c:func:`hs_compile`: compiles a single expression into a pattern database.

#. :c:func:`hs_compile_multi`: compiles an array of expressions into a pattern
   database. All of the supplied patterns will be scanned for concurrently at
   scan time, with user-supplied identifiers returned when they match.

#. :c:func:`hs_compile_ext_multi`: compiles an array of expressions as above,
   but allows :ref:`extparam` to be specified for each expression.

Compilation allows the Hyperscan library to analyze the given pattern(s) and
pre-determine how to scan for these patterns in an optimized fashion that would
be far too expensive to compute at run-time.

When compiling expressions, a decision needs to be made whether the resulting
compiled patterns are to be used in a streaming, block or vectored mode:

- **Streaming mode**: the target data to be scanned is a continuous stream, not
  all of which is available at once; blocks of data are scanned in sequence and
  matches may span multiple blocks in a stream. In streaming mode, each stream
  requires a block of memory to store its state between scan calls.

- **Block mode**: the target data is a discrete, contiguous block which can be
  scanned in one call and does not require state to be retained.

- **Vectored mode**: the target data consists of a list of non-contiguous
  blocks that are available all at once. As for block mode, no retention of
  state is required.

To compile patterns to be used in streaming mode, the ``mode`` parameter of
:c:func:`hs_compile` must be set to :c:member:`HS_MODE_STREAM`; similarly,
block mode requires the use of :c:member:`HS_MODE_BLOCK` and vectored mode
requires the use of :c:member:`HS_MODE_VECTORED`. A pattern database compiled
for one mode (streaming, block or vectored) can only be used in that mode. The
version of Hyperscan used to produce a compiled pattern database must match the
version of Hyperscan used to scan with it.

Hyperscan provides support for targeting a database at a particular CPU
platform; see :ref:`instr_specialization` for details.

***************
Pattern Support
***************

Hyperscan supports the pattern syntax used by the PCRE library ("libpcre"),
described at . However, not all constructs available in libpcre are supported.
The use of unsupported constructs will result in compilation errors.

The version of PCRE used to validate Hyperscan's interpretation of this syntax
is 8.41.

====================
Supported Constructs
====================

The following regex constructs are supported by Hyperscan:

* Literal characters and strings, with all libpcre quoting and character
  escapes.

* Character classes such as :regexp:`.` (dot), :regexp:`[abc]`, and
  :regexp:`[^abc]`, as well as the predefined character classes :regexp:`\\s`,
  :regexp:`\\d`, :regexp:`\\w`, :regexp:`\\v`, and :regexp:`\\h` and their
  negated counterparts (:regexp:`\\S`, :regexp:`\\D`, :regexp:`\\W`,
  :regexp:`\\V`, and :regexp:`\\H`).

* The POSIX named character classes :regexp:`[[:xxx:]]` and negated named
  character classes :regexp:`[[:^xxx:]]`.

* Unicode character properties, such as :regexp:`\\p{L}`, :regexp:`\\P{Sc}`,
  :regexp:`\\p{Greek}`.

* Quantifiers:

  * Quantifiers such as :regexp:`?`, :regexp:`*` and :regexp:`+` are supported
    when applied to arbitrary supported sub-expressions.

  * Bounded repeat qualifiers such as :regexp:`{n}`, :regexp:`{m,n}`,
    :regexp:`{n,}` are supported with limitations.

    * For arbitrary repeated sub-patterns: *n* and *m* should be either small
      or infinite, e.g. :regexp:`(a|b){4}`, :regexp:`(ab?c?d){4,10}` or
      :regexp:`(ab(cd)*){6,}`.

    * For single-character width sub-patterns such as :regexp:`[^\\a]` or
      :regexp:`.` or :regexp:`x`, nearly all repeat counts are supported,
      except where repeats are extremely large (maximum bound greater than
      32767). Stream states may be very large for large bounded repeats, e.g.
      :regexp:`a.{2000}b`. Note: such sub-patterns may be considerably cheaper
      if at the beginning or end of patterns and especially if the
      :c:member:`HS_FLAG_SINGLEMATCH` flag is on for that pattern.

  * Lazy modifiers (:regexp:`?` appended to another quantifier, e.g.
    :regexp:`\\w+?`) are supported but ignored (as Hyperscan reports all
    matches).

* Parenthesization, including the named and unnamed capturing and
  non-capturing forms. However, capturing is ignored.

* Alternation with the :regexp:`|` symbol, as in :regexp:`foo|bar`.

* The anchors :regexp:`^`, :regexp:`$`, :regexp:`\\A`, :regexp:`\\Z` and
  :regexp:`\\z`.

* Option modifiers:

  These allow behaviour to be switched on (with :regexp:`(?