From 6dc1e202b9ea42de6f7728c45a27a5877de666d3 Mon Sep 17 00:00:00 2001
From: Justin Viiret <justin.viiret@intel.com>
Date: Mon, 31 Oct 2016 09:46:41 +1100
Subject: [PATCH] hsbench: documentation

---
 doc/dev-reference/index.rst |   1 +
 doc/dev-reference/intro.rst |   7 +++
 doc/dev-reference/tools.rst | 116 ++++++++++++++++++++++++++++++++++++
 3 files changed, 124 insertions(+)
 create mode 100644 doc/dev-reference/tools.rst

diff --git a/doc/dev-reference/index.rst b/doc/dev-reference/index.rst
index df4f8916..32f188dd 100644
--- a/doc/dev-reference/index.rst
+++ b/doc/dev-reference/index.rst
@@ -17,5 +17,6 @@ Hyperscan |version| Developer's Reference Guide
    runtime
    serialization
    performance
+   tools
    api_constants
    api_files
diff --git a/doc/dev-reference/intro.rst b/doc/dev-reference/intro.rst
index 5f0cc113..58879aef 100644
--- a/doc/dev-reference/intro.rst
+++ b/doc/dev-reference/intro.rst
@@ -70,6 +70,13 @@ For a given database, Hyperscan provides several guarantees:
 
 See :ref:`runtime` for more detail.
 
+*****
+Tools
+*****
+
+Some utilities for testing and benchmarking Hyperscan are included with the
+library. See :ref:`tools` for more information.
+
 ************
 Example Code
 ************
diff --git a/doc/dev-reference/tools.rst b/doc/dev-reference/tools.rst
new file mode 100644
index 00000000..d2e7a06e
--- /dev/null
+++ b/doc/dev-reference/tools.rst
@@ -0,0 +1,116 @@
+.. _tools:
+
+#####
+Tools
+#####
+
+This section describes the set of utilities included with the Hyperscan library.
+
+********************
+Benchmarker: hsbench
+********************
+
+The ``hsbench`` tool provides an easy way to measure Hyperscan's performance
+for a particular set of patterns and corpus of data to be scanned.
+
+Patterns are supplied in the format described below in
+:ref:`tools_pattern_format`, while the corpus must be provided in the form of a
+`corpus database`: this is a simple SQLite database format intended to allow for
+easy control of how a corpus is broken into blocks and streams.
+
+.. note:: A group of Python scripts for constructing corpus databases from
+   various input types, such as PCAP network traffic captures or text files, can
+   be found in the Hyperscan source tree in ``tools/hsbench/scripts``.
+
+Running hsbench
+===============
+
+Given a pattern file specified with ``-e`` and a corpus database specified with
+``-c``, ``hsbench`` will perform a single-threaded benchmark and produce output
+like this::
+
+    $ hsbench -e /tmp/patterns -c /tmp/corpus.db
+
+    Signatures:        /tmp/patterns
+    Hyperscan info:    Version: 4.3.1 Features:  AVX2 Mode: STREAM
+    Expression count:  200
+    Bytecode size:     342,540 bytes
+    Database CRC:      0x6cd6b67c
+    Stream state size: 252 bytes
+    Scratch size:      18,406 bytes
+    Compile time:      0.153 seconds
+    Peak heap usage:   78,073,856 bytes
+
+    Time spent scanning:     0.600 seconds
+    Corpus size:             72,138,183 bytes (63,946 blocks in 8,891 streams)
+    Scan matches:            81 (0.001 matches/kilobyte)
+    Overall block rate:      2,132,004.45 blocks/sec
+    Overall throughput:      19,241.10 Mbit/sec
+
+By default, the corpus is scanned twenty times, and the overall performance
+reported is computed based on the total number of bytes scanned in the time it
+takes to perform all twenty scans. The number of repeats can be changed with the
+``-n`` argument, and the results of each scan will be displayed if the
+``--per-scan`` argument is specified.
+
+To benchmark Hyperscan on more than one core, you can supply a list of cores
+with the ``-T`` argument, which will instruct ``hsbench`` to start one
+benchmark thread per core given and compute the throughput from the time taken
+to complete all of them.
+
+.. tip:: For single-threaded benchmarks on multi-processor systems, we recommend
+   using a utility like ``taskset`` to lock the ``hsbench`` process to one core and
+   minimize jitter due to the operating system's scheduler.
+
+.. _tools_pattern_format:
+
+**************
+Pattern Format
+**************
+
+All of the Hyperscan tools accept patterns in the same format, read from plain
+text files with one pattern per line. Each line looks like this:
+
+* ``<integer id>:/<regex>/<flags>``
+
+For example::
+
+    1:/hatstand.*teakettle/s
+    2:/(hatstand|teakettle)/iH
+    3:/^.{10,20}hatstand/m
+
+The integer ID is the value that Hyperscan reports when a match is found for
+that pattern. Each ID must be unique.
+
+The pattern itself is a regular expression in PCRE syntax; see
+:ref:`compilation` for more information on supported features.
+
+The flags are single characters that map to Hyperscan flags as follows:
+
+=========   =================================    ===========
+Character   API Flag                             Description
+=========   =================================    ===========
+``i``       :c:member:`HS_FLAG_CASELESS`         Case-insensitive matching
+``s``       :c:member:`HS_FLAG_DOTALL`           Dot (``.``) will match newlines
+``m``       :c:member:`HS_FLAG_MULTILINE`        Multi-line anchoring
+``H``       :c:member:`HS_FLAG_SINGLEMATCH`      Report match ID at most once
+``V``       :c:member:`HS_FLAG_ALLOWEMPTY`       Allow patterns that can match against empty buffers
+``8``       :c:member:`HS_FLAG_UTF8`             UTF-8 mode
+``W``       :c:member:`HS_FLAG_UCP`              Unicode property support
+``P``       :c:member:`HS_FLAG_PREFILTER`        Prefiltering mode
+``L``       :c:member:`HS_FLAG_SOM_LEFTMOST`     Leftmost start of match reporting
+=========   =================================    ===========
+
+In addition to the set of flags above, :ref:`extparam` can be supplied
+for each pattern. These are supplied after the flags as ``key=value`` pairs
+between braces, separated by commas. For example::
+
+    1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}
+
+All Hyperscan tools will accept a pattern file (or a directory containing
+pattern files) with the ``-e`` argument. If no further arguments constraining
+the pattern set are given, all patterns in those files are used.
+
+To select a subset of the patterns, a single ID can be supplied with the ``-z``
+argument, or a file containing a set of IDs can be supplied with the ``-s``
+argument.
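
The pattern line format documented in the new tools.rst above can be
sketched in Python, for example to pre-check a pattern file before passing
it to hsbench. This is an illustrative sketch only, not the parser used by
the Hyperscan tools: the names parse_pattern_line, FLAG_NAMES and LINE_RE
are hypothetical, and the sketch assumes that the last "/" on a line closes
the regex and that extended parameter values are integers.

```python
import re

# Flag characters and API flag names, copied from the table in tools.rst.
FLAG_NAMES = {
    "i": "HS_FLAG_CASELESS",
    "s": "HS_FLAG_DOTALL",
    "m": "HS_FLAG_MULTILINE",
    "H": "HS_FLAG_SINGLEMATCH",
    "V": "HS_FLAG_ALLOWEMPTY",
    "8": "HS_FLAG_UTF8",
    "W": "HS_FLAG_UCP",
    "P": "HS_FLAG_PREFILTER",
    "L": "HS_FLAG_SOM_LEFTMOST",
}

# <integer id>:/<regex>/<flags>, optionally followed by {key=value,...}.
# The greedy (.*) makes the last '/' on the line the closing delimiter,
# so a '/' inside the regex is kept (an assumption of this sketch).
LINE_RE = re.compile(r"^(\d+):/(.*)/([A-Za-z0-9]*)(?:\{([^{}]*)\})?$")


def parse_pattern_line(line):
    """Split one pattern line into (id, regex, flag names, ext params)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a valid pattern line: %r" % line)
    ident, regex, flags, ext = match.groups()

    unknown = [c for c in flags if c not in FLAG_NAMES]
    if unknown:
        raise ValueError("unknown flag(s) %r in: %r" % (unknown, line))

    # Extended parameters are comma-separated key=value pairs in braces.
    # Values are assumed to be integers here.
    params = {}
    if ext:
        for pair in ext.split(","):
            key, _, value = pair.partition("=")
            params[key.strip()] = int(value)

    return int(ident), regex, [FLAG_NAMES[c] for c in flags], params


if __name__ == "__main__":
    for text in (
        "1:/hatstand.*teakettle/s",
        "2:/(hatstand|teakettle)/iH",
        "3:/^.{10,20}hatstand/m",
        "1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}",
    ):
        print(parse_pattern_line(text))
```

Matching the closing delimiter greedily keeps the sketch short, at the cost
of not validating the regex itself; a malformed expression would only be
rejected later, when Hyperscan compiles it.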