Mirror of https://github.com/VectorCamp/vectorscan.git (synced 2025-06-28 16:41:01 +03:00)

hsbench: documentation

parent b1c57f9f54
commit 6dc1e202b9
@ -17,5 +17,6 @@ Hyperscan |version| Developer's Reference Guide
runtime
serialization
performance
tools
api_constants
api_files
@ -70,6 +70,13 @@ For a given database, Hyperscan provides several guarantees:
See :ref:`runtime` for more detail.

*****
Tools
*****

Some utilities for testing and benchmarking Hyperscan are included with the
library. See :ref:`tools` for more information.

************
Example Code
************
116
doc/dev-reference/tools.rst
Normal file
@ -0,0 +1,116 @@
.. _tools:

#####
Tools
#####

This section describes the set of utilities included with the Hyperscan library.

********************
Benchmarker: hsbench
********************

The ``hsbench`` tool provides an easy way to measure Hyperscan's performance
for a particular set of patterns and corpus of data to be scanned.

Patterns are supplied in the format described below in
:ref:`tools_pattern_format`, while the corpus must be provided in the form of a
`corpus database`: this is a simple SQLite database format intended to allow for
easy control of how a corpus is broken into blocks and streams.

.. note:: A group of Python scripts for constructing corpus databases from
   various input types, such as PCAP network traffic captures or text files, can
   be found in the Hyperscan source tree in ``tools/hsbench/scripts``.

Running hsbench
===============

Given a file full of patterns specified with ``-e`` and a corpus database
specified with ``-c``, ``hsbench`` will perform a single-threaded benchmark and
produce output like this::

    $ hsbench -e /tmp/patterns -c /tmp/corpus.db

    Signatures:           /tmp/patterns
    Hyperscan info:       Version: 4.3.1 Features: AVX2 Mode: STREAM
    Expression count:     200
    Bytecode size:        342,540 bytes
    Database CRC:         0x6cd6b67c
    Stream state size:    252 bytes
    Scratch size:         18,406 bytes
    Compile time:         0.153 seconds
    Peak heap usage:      78,073,856 bytes

    Time spent scanning:  0.600 seconds
    Corpus size:          72,138,183 bytes (63,946 blocks in 8,891 streams)
    Scan matches:         81 (0.001 matches/kilobyte)
    Overall block rate:   2,132,004.45 blocks/sec
    Overall throughput:   19,241.10 Mbit/sec

By default, the corpus is scanned twenty times, and the overall performance
reported is computed from the total number of bytes scanned and the time it
takes to perform all twenty scans. The number of repeats can be changed with the
``-n`` argument, and the results of each scan will be displayed if the
``--per-scan`` argument is specified.

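As a rough cross-check of how the overall figures relate to the sample output
above, the following Python sketch recomputes the block rate and throughput from
the corpus size, the default of twenty repeats and the reported scan time. The
function name ``overall_rates`` is hypothetical and not part of ``hsbench``, and
the sketch assumes that one Mbit means 1,000,000 bits. Because the scan time in
the sample is rounded to three decimal places, the results agree with the
reported rates only approximately (to within about 0.02%).

```python
def overall_rates(corpus_bytes, corpus_blocks, repeats, scan_seconds):
    """Return (blocks_per_sec, mbit_per_sec) over all repeats of the corpus.

    Illustrative helper only; it is not taken from the hsbench sources.
    """
    total_bytes = corpus_bytes * repeats
    total_blocks = corpus_blocks * repeats
    blocks_per_sec = total_blocks / scan_seconds
    # Assumption: 1 Mbit = 1,000,000 bits.
    mbit_per_sec = total_bytes * 8 / scan_seconds / 1_000_000
    return blocks_per_sec, mbit_per_sec


# Figures copied from the sample output; 20 is the default repeat count.
blocks_rate, mbit_rate = overall_rates(72_138_183, 63_946, 20, 0.600)
print(f"{blocks_rate:,.2f} blocks/sec, {mbit_rate:,.2f} Mbit/sec")
```

The point of the sketch is that the rates are totals across every repeat of the
corpus divided by the total scanning time, not an average of per-scan rates.
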
To benchmark Hyperscan on more than one core, you can supply a list of cores
with the ``-T`` argument, which will instruct ``hsbench`` to start one
benchmark thread per core given and compute the throughput from the time taken
to complete all of them.

.. tip:: For single-threaded benchmarks on multi-processor systems, we recommend
   using a utility like ``taskset`` to lock the hsbench process to one core and
   minimize jitter due to the operating system's scheduler.

.. _tools_pattern_format:

**************
Pattern Format
**************

All of the Hyperscan tools accept patterns in the same format, read from plain
text files with one pattern per line. Each line looks like this:

* ``<integer id>:/<regex>/<flags>``

For example::

    1:/hatstand.*teakettle/s
    2:/(hatstand|teakettle)/iH
    3:/^.{10,20}hatstand/m

The integer ID is the value that will be reported when a match is found by
Hyperscan and must be unique.

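Since IDs must be unique, it can be useful to check a pattern file before
handing it to the tools. The following is a minimal sketch of such a check; the
helper name ``duplicate_ids`` is hypothetical, and skipping blank lines is an
assumption of this sketch rather than documented tool behaviour.

```python
from collections import Counter


def duplicate_ids(lines):
    """Return the sorted list of IDs used by more than one pattern line."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Assumption: blank lines carry no pattern.
        # The ID is everything before the first colon.
        ident, _, _ = line.partition(":")
        ids.append(int(ident))
    return sorted(i for i, count in Counter(ids).items() if count > 1)


patterns = [
    "1:/hatstand.*teakettle/s",
    "2:/(hatstand|teakettle)/iH",
    "2:/^.{10,20}hatstand/m",  # deliberately reuses ID 2
]
print(duplicate_ids(patterns))
```
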
The pattern itself is a regular expression in PCRE syntax; see
:ref:`compilation` for more information on supported features.

The flags are single characters that map to Hyperscan flags as follows:

=========  =================================  ===========
Character  API Flag                           Description
=========  =================================  ===========
``i``      :c:member:`HS_FLAG_CASELESS`       Case-insensitive matching
``s``      :c:member:`HS_FLAG_DOTALL`         Dot (``.``) will match newlines
``m``      :c:member:`HS_FLAG_MULTILINE`      Multi-line anchoring
``H``      :c:member:`HS_FLAG_SINGLEMATCH`    Report match ID at most once
``V``      :c:member:`HS_FLAG_ALLOWEMPTY`     Allow patterns that can match against empty buffers
``8``      :c:member:`HS_FLAG_UTF8`           UTF-8 mode
``W``      :c:member:`HS_FLAG_UCP`            Unicode property support
``P``      :c:member:`HS_FLAG_PREFILTER`      Prefiltering mode
``L``      :c:member:`HS_FLAG_SOM_LEFTMOST`   Leftmost start of match reporting
=========  =================================  ===========

In addition to the set of flags above, :ref:`extparam` can be supplied
for each pattern. These are supplied after the flags as ``key=value`` pairs
between braces, separated by commas. For example::

    1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}

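To make the line format concrete, here is a minimal Python sketch that splits a
pattern line into its ID, regular expression, flags and extended parameters. It
is an illustration of the format described above, not the parser used by the
Hyperscan tools: the names ``parse_pattern_line``, ``FLAG_NAMES`` and
``LINE_RE`` are hypothetical, and the sketch assumes that extended parameter
values are integers and that the last ``/`` on the line ends the expression.

```python
import re

# Flag characters from the table above, mapped to Hyperscan API flag names.
FLAG_NAMES = {
    "i": "HS_FLAG_CASELESS",
    "s": "HS_FLAG_DOTALL",
    "m": "HS_FLAG_MULTILINE",
    "H": "HS_FLAG_SINGLEMATCH",
    "V": "HS_FLAG_ALLOWEMPTY",
    "8": "HS_FLAG_UTF8",
    "W": "HS_FLAG_UCP",
    "P": "HS_FLAG_PREFILTER",
    "L": "HS_FLAG_SOM_LEFTMOST",
}

# <integer id>:/<regex>/<flags>, optionally followed by {key=value,...}.
# The greedy (.*) makes the last slash on the line the closing delimiter.
LINE_RE = re.compile(r"^(\d+):/(.*)/([A-Za-z0-9]*)(?:\{([^}]*)\})?$")


def parse_pattern_line(line):
    """Return (id, regex, [flag names], {extended parameters})."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid pattern line: {line!r}")
    ident, regex, flags, ext = match.groups()
    unknown = [c for c in flags if c not in FLAG_NAMES]
    if unknown:
        raise ValueError(f"unknown flag characters: {unknown}")
    params = {}
    if ext:
        for pair in ext.split(","):
            key, _, value = pair.partition("=")
            # Assumption: extended parameter values are integers.
            params[key.strip()] = int(value)
    return int(ident), regex, [FLAG_NAMES[c] for c in flags], params


print(parse_pattern_line("1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}"))
```

Braces inside the expression itself, as in ``3:/^.{10,20}hatstand/m``, are left
untouched because the extended parameter group is only recognised after the
closing slash and flags.
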
All Hyperscan tools will accept a pattern file (or a directory containing
pattern files) with the ``-e`` argument. If no further arguments constraining
the pattern set are given, all patterns in those files are used.

To select a subset of the patterns, a single ID can be supplied with the ``-z``
argument, or a file containing a set of IDs can be supplied with the ``-s``
argument.