From 6dc1e202b9ea42de6f7728c45a27a5877de666d3 Mon Sep 17 00:00:00 2001
From: Justin Viiret
Date: Mon, 31 Oct 2016 09:46:41 +1100
Subject: [PATCH] hsbench: documentation

---
 doc/dev-reference/index.rst |   1 +
 doc/dev-reference/intro.rst |   7 +++
 doc/dev-reference/tools.rst | 116 ++++++++++++++++++++++++++++++++++++
 3 files changed, 124 insertions(+)
 create mode 100644 doc/dev-reference/tools.rst

diff --git a/doc/dev-reference/index.rst b/doc/dev-reference/index.rst
index df4f8916..32f188dd 100644
--- a/doc/dev-reference/index.rst
+++ b/doc/dev-reference/index.rst
@@ -17,5 +17,6 @@ Hyperscan |version| Developer's Reference Guide
    runtime
    serialization
    performance
+   tools
    api_constants
    api_files
diff --git a/doc/dev-reference/intro.rst b/doc/dev-reference/intro.rst
index 5f0cc113..58879aef 100644
--- a/doc/dev-reference/intro.rst
+++ b/doc/dev-reference/intro.rst
@@ -70,6 +70,13 @@ For a given database, Hyperscan provides several guarantees:
 
 See :ref:`runtime` for more detail.
 
+*****
+Tools
+*****
+
+Some utilities for testing and benchmarking Hyperscan are included with the
+library. See :ref:`tools` for more information.
+
 ************
 Example Code
 ************
diff --git a/doc/dev-reference/tools.rst b/doc/dev-reference/tools.rst
new file mode 100644
index 00000000..d2e7a06e
--- /dev/null
+++ b/doc/dev-reference/tools.rst
@@ -0,0 +1,116 @@
+.. _tools:
+
+#####
+Tools
+#####
+
+This section describes the set of utilities included with the Hyperscan library.
+
+********************
+Benchmarker: hsbench
+********************
+
+The ``hsbench`` tool provides an easy way to measure Hyperscan's performance
+for a particular set of patterns and corpus of data to be scanned.
+
+Patterns are supplied in the format described below in
+:ref:`tools_pattern_format`, while the corpus must be provided in the form of a
+`corpus database`: this is a simple SQLite database format intended to allow for
+easy control of how a corpus is broken into blocks and streams.
+
+.. note:: A group of Python scripts for constructing corpora databases from
+   various input types, such as PCAP network traffic captures or text files, can
+   be found in the Hyperscan source tree in ``tools/hsbench/scripts``.
+
+Running hsbench
+===============
+
+Given a file full of patterns specified with ``-e`` and a corpus database
+specified with ``-c``, ``hsbench`` will perform a single-threaded benchmark and
+produce output like this::
+
+    $ hsbench -e /tmp/patterns -c /tmp/corpus.db
+
+    Signatures:        /tmp/patterns
+    Hyperscan info:    Version: 4.3.1 Features: AVX2 Mode: STREAM
+    Expression count:  200
+    Bytecode size:     342,540 bytes
+    Database CRC:      0x6cd6b67c
+    Stream state size: 252 bytes
+    Scratch size:      18,406 bytes
+    Compile time:      0.153 seconds
+    Peak heap usage:   78,073,856 bytes
+
+    Time spent scanning:     0.600 seconds
+    Corpus size:             72,138,183 bytes (63,946 blocks in 8,891 streams)
+    Scan matches:            81 (0.001 matches/kilobyte)
+    Overall block rate:      2,132,004.45 blocks/sec
+    Overall throughput:      19,241.10 Mbit/sec
+
+By default, the corpus is scanned twenty times, and the overall performance
+reported is computed based on the total number of bytes scanned in the time it
+takes to perform all twenty scans. The number of repeats can be changed with the
+``-n`` argument, and the results of each scan will be displayed if the
+``--per-scan`` argument is specified.
+
+To benchmark Hyperscan on more than one core, you can supply a list of cores
+with the ``-T`` argument, which will instruct ``hsbench`` to start one
+benchmark thread per core given and compute the throughput from the time taken
+to complete all of them.
+
+.. tip:: For single-threaded benchmarks on multi-processor systems, we recommend
+   using a utility like ``taskset`` to lock the hsbench process to one core and
+   minimize jitter due to the operating system's scheduler.
+
+.. _tools_pattern_format:
+
+**************
+Pattern Format
+**************
+
+All of the Hyperscan tools accept patterns in the same format, read from plain
+text files with one pattern per line. Each line looks like this:
+
+* ``<integer id>:/<regex>/<flags>``
+
+For example::
+
+    1:/hatstand.*teakettle/s
+    2:/(hatstand|teakettle)/iH
+    3:/^.{10,20}hatstand/m
+
+The integer ID is the value that will be reported when a match is found by
+Hyperscan and must be unique.
+
+The pattern itself is a regular expression in PCRE syntax; see
+:ref:`compilation` for more information on supported features.
+
+The flags are single characters that map to Hyperscan flags as follows:
+
+========= ================================= ===========
+Character API Flag                          Description
+========= ================================= ===========
+``i``     :c:member:`HS_FLAG_CASELESS`      Case-insensitive matching
+``s``     :c:member:`HS_FLAG_DOTALL`        Dot (``.``) will match newlines
+``m``     :c:member:`HS_FLAG_MULTILINE`     Multi-line anchoring
+``H``     :c:member:`HS_FLAG_SINGLEMATCH`   Report match ID at most once
+``V``     :c:member:`HS_FLAG_ALLOWEMPTY`    Allow patterns that can match against empty buffers
+``8``     :c:member:`HS_FLAG_UTF8`          UTF-8 mode
+``W``     :c:member:`HS_FLAG_UCP`           Unicode property support
+``P``     :c:member:`HS_FLAG_PREFILTER`     Prefiltering mode
+``L``     :c:member:`HS_FLAG_SOM_LEFTMOST`  Leftmost start of match reporting
+========= ================================= ===========
+
+In addition to the set of flags above, :ref:`extparam` can be supplied
+for each pattern. These are supplied after the flags as ``key=value`` pairs
+between braces, separated by commas. For example::
+
+    1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}
+
+All Hyperscan tools will accept a pattern file (or a directory containing
+pattern files) with the ``-e`` argument. If no further arguments constraining
+the pattern set are given, all patterns in those files are used.
+
+To select a subset of the patterns, a single ID can be supplied with the ``-z``
+argument, or a file containing a set of IDs can be supplied with the ``-s``
+argument.
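
The pattern file format documented in this patch can be illustrated with a small parser. This is a minimal sketch and not the code the Hyperscan tools use: the names ``parse_pattern_line``, ``PatternLine`` and ``FLAG_NAMES`` are hypothetical, the flag mapping is copied from the table in ``tools.rst``, and extended parameter values are assumed to be integers, as they are in the ``min_offset``/``max_offset`` example.

```python
import re
from typing import Dict, List, NamedTuple

# Flag characters and their API flag names, copied from the table in tools.rst.
FLAG_NAMES = {
    "i": "HS_FLAG_CASELESS",
    "s": "HS_FLAG_DOTALL",
    "m": "HS_FLAG_MULTILINE",
    "H": "HS_FLAG_SINGLEMATCH",
    "V": "HS_FLAG_ALLOWEMPTY",
    "8": "HS_FLAG_UTF8",
    "W": "HS_FLAG_UCP",
    "P": "HS_FLAG_PREFILTER",
    "L": "HS_FLAG_SOM_LEFTMOST",
}


class PatternLine(NamedTuple):
    """Hypothetical container for one parsed pattern line."""
    pattern_id: int
    regex: str
    flags: List[str]
    ext_params: Dict[str, int]


# <integer id>:/<regex>/<flags>, optionally followed by {key=value,...}.
# The greedy regex group backtracks to the last '/', so the regex itself may
# contain '/' or braces such as {10,20}.
_LINE_RE = re.compile(r"^(\d+):/(.*)/([^/{}]*)(?:\{([^{}]*)\})?$")


def parse_pattern_line(line: str) -> PatternLine:
    """Split one pattern line into its ID, regex, flag names and parameters."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid pattern line: {line!r}")
    id_text, regex, flag_chars, ext_text = match.groups()

    unknown = [c for c in flag_chars if c not in FLAG_NAMES]
    if unknown:
        raise ValueError(f"unknown flag characters: {unknown}")

    ext_params: Dict[str, int] = {}
    if ext_text:
        for pair in ext_text.split(","):
            key, _, value = pair.partition("=")
            # Assumption: every extended parameter value is an integer.
            ext_params[key.strip()] = int(value)

    flags = [FLAG_NAMES[c] for c in flag_chars]
    return PatternLine(int(id_text), regex, flags, ext_params)


if __name__ == "__main__":
    for text in (
        "1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}",
        "2:/(hatstand|teakettle)/iH",
        "3:/^.{10,20}hatstand/m",
    ):
        print(parse_pattern_line(text))
```

Splitting on the last ``/`` rather than the first is a design choice made here so that a regex containing slashes still parses; whether the real tools handle such lines the same way is not stated in the patch.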