.. _tools:

#####
Tools
#####

This section describes the set of utilities included with the Hyperscan library.

********************
Benchmarker: hsbench
********************

The ``hsbench`` tool provides an easy way to measure Hyperscan's performance
for a particular set of patterns and corpus of data to be scanned.

Patterns are supplied in the format described in
:ref:`tools_pattern_format`, while the corpus must be provided as a
*corpus database*: a simple SQLite database format that makes it easy to
control how a corpus is broken into blocks and streams.

.. note:: A group of Python scripts for constructing corpus databases from
   various input types, such as PCAP network traffic captures or text files, can
   be found in the Hyperscan source tree in ``tools/hsbench/scripts``.
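
As a rough illustration of what such a database might contain, the Python
sketch below writes a tiny corpus with the standard ``sqlite3`` module. The
schema (a single ``chunk`` table with ``id``, ``stream_id`` and ``data``
columns) and the helper name ``build_corpus`` are assumptions made for this
example; the scripts in ``tools/hsbench/scripts`` remain the authoritative
reference for the layout that ``hsbench`` expects.

.. code-block:: python

    import sqlite3

    def build_corpus(path, streams):
        """Write blocks to an SQLite file; `streams` is a list of lists of bytes.

        Assumed schema (verify against tools/hsbench/scripts): one row per
        block, with blocks of the same stream sharing a stream_id.
        """
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE chunk ("
                "id INTEGER PRIMARY KEY, stream_id INTEGER NOT NULL, data BLOB)"
            )
            # Flatten the streams into (stream_id, block) rows, keeping order.
            rows = [
                (stream_id, block)
                for stream_id, blocks in enumerate(streams)
                for block in blocks
            ]
            conn.executemany(
                "INSERT INTO chunk (stream_id, data) VALUES (?, ?)", rows
            )
            conn.commit()
            return len(rows)
        finally:
            conn.close()

Calling ``build_corpus("/tmp/corpus.db", [[b"GET / ", b"HTTP/1.1"], [b"ping"]])``
would store three blocks in two streams under these assumptions.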

Running hsbench
===============

Given a file full of patterns specified with ``-e`` and a corpus database
specified with ``-c``, ``hsbench`` will perform a single-threaded benchmark and
produce output like this::

    $ hsbench -e /tmp/patterns -c /tmp/corpus.db

    Signatures:        /tmp/patterns
    Hyperscan info:    Version: 4.3.1 Features:  AVX2 Mode: STREAM
    Expression count:  200
    Bytecode size:     342,540 bytes
    Database CRC:      0x6cd6b67c
    Stream state size: 252 bytes
    Scratch size:      18,406 bytes
    Compile time:      0.153 seconds
    Peak heap usage:   78,073,856 bytes

    Time spent scanning:     0.600 seconds
    Corpus size:             72,138,183 bytes (63,946 blocks in 8,891 streams)
    Scan matches:            81 (0.001 matches/kilobyte)
    Overall block rate:      2,132,004.45 blocks/sec
    Overall throughput:      19,241.10 Mbit/sec

By default, the corpus is scanned twenty times, and the overall performance
reported is computed from the total number of bytes scanned in the time it
takes to perform all twenty scans. The number of repeats can be changed with the
``-n`` argument, and the results of each scan will be displayed if the
``--per-scan`` argument is specified.
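
To make the calculation concrete, the Python sketch below recomputes the
headline rates from the figures in the sample output above. It assumes that one
Mbit is one million bits, and the function name ``overall_rates`` is made up for
this example. Because the displayed scan time is rounded to three decimal
places, the results agree with the sample output only approximately.

.. code-block:: python

    def overall_rates(corpus_bytes, corpus_blocks, repeats, seconds):
        """Return (blocks/sec, Mbit/sec) measured over all repeated scans."""
        total_bytes = corpus_bytes * repeats
        total_blocks = corpus_blocks * repeats
        blocks_per_sec = total_blocks / seconds
        # Assumption: 1 Mbit = 10**6 bits.
        mbit_per_sec = total_bytes * 8 / seconds / 1e6
        return blocks_per_sec, mbit_per_sec

    # Figures from the sample output: twenty scans in 0.600 seconds.
    blocks_rate, throughput = overall_rates(72_138_183, 63_946, 20, 0.600)
    print(f"{blocks_rate:,.0f} blocks/sec, {throughput:,.0f} Mbit/sec")

This gives roughly 2.13 million blocks per second and about 19,200 Mbit/sec,
in line with the rates reported by ``hsbench``.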

To benchmark Hyperscan on more than one core, you can supply a list of cores
with the ``-T`` argument, which will instruct ``hsbench`` to start one
benchmark thread per core given and compute the throughput from the time taken
to complete all of them.

.. tip:: For single-threaded benchmarks on multi-processor systems, we recommend
   using a utility like ``taskset`` to lock the ``hsbench`` process to one core
   and minimize jitter due to the operating system's scheduler.
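
The invocations discussed above can also be assembled programmatically. The
Python sketch below only builds argument lists and does not run anything. The
helper ``hsbench_command`` is hypothetical, and two details are assumptions
rather than documented behaviour: that ``-T`` takes a comma-separated list of
core IDs, and that ``taskset -c`` is the pinning utility in use.

.. code-block:: python

    def hsbench_command(patterns, corpus, repeats=None, cores=None,
                        per_scan=False, pin_core=None):
        """Build an argv list for hsbench from the options described above."""
        argv = ["hsbench", "-e", patterns, "-c", corpus]
        if repeats is not None:
            argv += ["-n", str(repeats)]
        if per_scan:
            argv.append("--per-scan")
        if cores:
            # Assumption: cores are passed as a comma-separated list.
            argv += ["-T", ",".join(str(c) for c in cores)]
        if pin_core is not None:
            # Pin a single-threaded run to one core with taskset.
            argv = ["taskset", "-c", str(pin_core)] + argv
        return argv

    print(hsbench_command("/tmp/patterns", "/tmp/corpus.db",
                          repeats=50, pin_core=2))

The resulting list can be passed to ``subprocess.run`` once the flag syntax has
been checked against the usage text of the installed ``hsbench``.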

.. _tools_pattern_format:

**************
Pattern Format
**************

All of the Hyperscan tools accept patterns in the same format, read from plain
text files with one pattern per line. Each line looks like this:

* ``<integer id>:/<regex>/<flags>``

For example::

    1:/hatstand.*teakettle/s
    2:/(hatstand|teakettle)/iH
    3:/^.{10,20}hatstand/m

The integer ID is the value that Hyperscan reports when the pattern matches; it
must be unique.

The pattern itself is a regular expression in PCRE syntax; see
:ref:`compilation` for more information on supported features.

The flags are single characters that map to Hyperscan flags as follows:

=========   =================================    ===========
Character   API Flag                             Description
=========   =================================    ===========
``i``       :c:member:`HS_FLAG_CASELESS`         Case-insensitive matching
``s``       :c:member:`HS_FLAG_DOTALL`           Dot (``.``) will match newlines
``m``       :c:member:`HS_FLAG_MULTILINE`        Multi-line anchoring
``H``       :c:member:`HS_FLAG_SINGLEMATCH`      Report match ID at most once
``V``       :c:member:`HS_FLAG_ALLOWEMPTY`       Allow patterns that can match against empty buffers
``8``       :c:member:`HS_FLAG_UTF8`             UTF-8 mode
``W``       :c:member:`HS_FLAG_UCP`              Unicode property support
``P``       :c:member:`HS_FLAG_PREFILTER`        Prefiltering mode
``L``       :c:member:`HS_FLAG_SOM_LEFTMOST`     Leftmost start of match reporting
=========   =================================    ===========

In addition to the set of flags above, :ref:`extparam` can be supplied
for each pattern. These are supplied after the flags as ``key=value`` pairs
between braces, separated by commas. For example::

    1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}
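
For readers who want to process pattern files in their own scripts, the Python
sketch below parses one line of this format. It is not the parser used by the
Hyperscan tools: the names ``parse_pattern_line`` and ``FLAG_NAMES`` are
invented for this example, and the sketch assumes that the last ``/`` on the
line ends the regular expression and that extended parameter values can be kept
as plain strings.

.. code-block:: python

    import re

    # Flag characters from the table above, mapped to API flag names.
    FLAG_NAMES = {
        "i": "HS_FLAG_CASELESS",
        "s": "HS_FLAG_DOTALL",
        "m": "HS_FLAG_MULTILINE",
        "H": "HS_FLAG_SINGLEMATCH",
        "V": "HS_FLAG_ALLOWEMPTY",
        "8": "HS_FLAG_UTF8",
        "W": "HS_FLAG_UCP",
        "P": "HS_FLAG_PREFILTER",
        "L": "HS_FLAG_SOM_LEFTMOST",
    }

    # <id>:/<regex>/<flags>{key=value,...}
    # The greedy (.*) makes the last slash on the line close the regex.
    LINE_RE = re.compile(r"^(\d+):/(.*)/([A-Za-z0-9]*)(?:\{([^{}]*)\})?$")

    def parse_pattern_line(line):
        """Return (id, regex, [flag names], {extended parameters})."""
        m = LINE_RE.match(line.strip())
        if m is None:
            raise ValueError(f"not a pattern line: {line!r}")
        ident, regex, flags, ext = m.groups()
        unknown = [c for c in flags if c not in FLAG_NAMES]
        if unknown:
            raise ValueError(f"unknown flag(s) {unknown} in {line!r}")
        params = {}
        if ext:
            for pair in ext.split(","):
                key, _, value = pair.partition("=")
                params[key.strip()] = value.strip()  # values kept as strings
        return int(ident), regex, [FLAG_NAMES[c] for c in flags], params

    print(parse_pattern_line(
        "1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}"))

Braces inside the regular expression itself, as in ``3:/^.{10,20}hatstand/m``,
are left alone because only a brace group after the final ``/`` and flags is
treated as extended parameters.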

All Hyperscan tools will accept a pattern file (or a directory containing
pattern files) with the ``-e`` argument. If no further arguments constraining
the pattern set are given, all patterns in those files are used.

To select a subset of the patterns, a single ID can be supplied with the ``-z``
argument, or a file containing a set of IDs can be supplied with the ``-s``
argument.
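
Conceptually, this selection amounts to filtering pattern lines by their
integer ID. The Python sketch below illustrates the idea for scripts that
pre-process pattern files; ``select_patterns`` is a hypothetical helper and
does not reflect how the tools implement ``-z`` or ``-s`` internally.

.. code-block:: python

    def select_patterns(lines, wanted_ids=None):
        """Keep only the pattern lines whose integer ID is in `wanted_ids`.

        With no IDs given, every pattern is kept, mirroring the default
        behaviour described above.
        """
        if wanted_ids is None:
            return list(lines)
        wanted = set(wanted_ids)
        # The ID is everything before the first colon on the line.
        return [ln for ln in lines if int(ln.split(":", 1)[0]) in wanted]

    patterns = [
        "1:/hatstand.*teakettle/s",
        "2:/(hatstand|teakettle)/iH",
        "3:/^.{10,20}hatstand/m",
    ]
    print(select_patterns(patterns, wanted_ids=[2]))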