mirror of
https://github.com/VectorCamp/vectorscan.git
synced 2025-06-28 16:41:01 +03:00
117 lines
4.7 KiB
ReStructuredText
.. _tools:

#####
Tools
#####

This section describes the set of utilities included with the Hyperscan library.

********************
Benchmarker: hsbench
********************

The ``hsbench`` tool provides an easy way to measure Hyperscan's performance
for a particular set of patterns and corpus of data to be scanned.

Patterns are supplied in the format described below in
:ref:`tools_pattern_format`, while the corpus must be provided in the form of a
`corpus database`: this is a simple SQLite database format intended to allow for
easy control of how a corpus is broken into blocks and streams.

.. note:: A group of Python scripts for constructing corpus databases from
   various input types, such as PCAP network traffic captures or text files, can
   be found in the Hyperscan source tree in ``tools/hsbench/scripts``.

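To make the corpus database idea concrete, the following Python sketch writes a few blocks, grouped into streams, into an SQLite database. It assumes a single table named ``chunk`` with ``id``, ``stream_id`` and ``data`` columns; that layout is our assumption about what ``hsbench`` reads, so check it against the scripts in ``tools/hsbench/scripts`` before relying on it. ``build_corpus`` is a hypothetical helper, not part of Hyperscan.

.. code-block:: python

    import sqlite3

    # Assumed layout (verify against tools/hsbench/scripts): one row per
    # block, tagged with the ID of the stream it belongs to.
    SCHEMA = """
    CREATE TABLE IF NOT EXISTS chunk (
        id        INTEGER NOT NULL PRIMARY KEY,
        stream_id INTEGER NOT NULL,
        data      BLOB
    )
    """


    def build_corpus(conn, streams):
        """Store `streams`, a list of streams, each a list of bytes blocks.

        Block IDs increase monotonically so the original block order is
        preserved. Returns the number of blocks written.
        """
        conn.execute(SCHEMA)
        block_id = 0
        for stream_id, blocks in enumerate(streams):
            for block in blocks:
                conn.execute(
                    "INSERT INTO chunk (id, stream_id, data) VALUES (?, ?, ?)",
                    (block_id, stream_id, bytes(block)),
                )
                block_id += 1
        conn.commit()
        return block_id

For example, ``build_corpus(sqlite3.connect("/tmp/corpus.db"), [[b"hat", b"stand"], [b"teakettle"]])`` would describe two streams, the first split across two blocks.
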
Running hsbench
===============

Given a file full of patterns specified with ``-e`` and a corpus database
specified with ``-c``, ``hsbench`` will perform a single-threaded benchmark and
produce output like this::

    $ hsbench -e /tmp/patterns -c /tmp/corpus.db

    Signatures:        /tmp/patterns
    Hyperscan info:    Version: 4.3.1 Features: AVX2 Mode: STREAM
    Expression count:  200
    Bytecode size:     342,540 bytes
    Database CRC:      0x6cd6b67c
    Stream state size: 252 bytes
    Scratch size:      18,406 bytes
    Compile time:      0.153 seconds
    Peak heap usage:   78,073,856 bytes

    Time spent scanning:     0.600 seconds
    Corpus size:             72,138,183 bytes (63,946 blocks in 8,891 streams)
    Scan matches:            81 (0.001 matches/kilobyte)
    Overall block rate:      2,132,004.45 blocks/sec
    Overall throughput:      19,241.10 Mbit/sec

By default, the corpus is scanned twenty times, and the overall performance
reported is computed based on the total number of bytes scanned in the time it
takes to perform all twenty scans. The number of repeats can be changed with the
``-n`` argument, and the results of each scan will be displayed if the
``--per-scan`` argument is specified.

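The figures in the sample output can be related with a little arithmetic. The sketch below assumes that each of the twenty repeats scans the whole corpus and that one Mbit is 10^6 bits. Under those assumptions, 72,138,183 bytes × 20 scans × 8 bits ÷ 0.600 s comes to roughly 19,237 Mbit/sec, which is within a fraction of a percent of the reported throughput; the remaining gap is consistent with the scan time being rounded to 0.600 seconds in the output. ``benchmark_rates`` is a hypothetical helper.

.. code-block:: python

    def benchmark_rates(corpus_bytes, blocks, repeats, seconds):
        """Return (blocks/sec, Mbit/sec) for `repeats` full passes over a corpus.

        Assumes every repeat scans the whole corpus and 1 Mbit = 10**6 bits.
        """
        total_blocks = blocks * repeats
        total_bits = corpus_bytes * repeats * 8
        return total_blocks / seconds, total_bits / seconds / 1e6

For instance, ``benchmark_rates(72_138_183, 63_946, 20, 0.600)`` returns roughly ``(2131533.3, 19236.8)``.
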
To benchmark Hyperscan on more than one core, you can supply a list of cores
with the ``-T`` argument, which will instruct ``hsbench`` to start one
benchmark thread per core given and compute the throughput from the time taken
to complete all of them.

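One plausible reading of how the multi-threaded figure is derived is this: every thread scans the full corpus the same number of times, and the aggregate throughput is the total data scanned by all threads divided by the wall-clock time from the first thread starting to the last one finishing. The sketch below encodes that reading. It is our assumption, not a description of ``hsbench`` internals, and ``aggregate_throughput`` is hypothetical.

.. code-block:: python

    def aggregate_throughput(corpus_bytes, repeats, thread_times):
        """Aggregate Mbit/sec across benchmark threads (hypothetical model).

        thread_times holds one (start, end) wall-clock pair, in seconds, per
        thread. Assumes each thread scans the full corpus `repeats` times and
        that elapsed time runs from the earliest start to the latest finish.
        """
        if not thread_times:
            raise ValueError("need at least one thread")
        start = min(s for s, _ in thread_times)
        end = max(e for _, e in thread_times)
        total_bits = corpus_bytes * repeats * len(thread_times) * 8
        return total_bits / (end - start) / 1e6

With two threads each scanning a 1,000,000-byte corpus 20 times, starting together and finishing after 1.0 and 2.0 seconds, this model gives 160 Mbit/sec: the slowest thread determines the elapsed time.
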
.. tip:: For single-threaded benchmarks on multi-processor systems, we recommend
   using a utility like ``taskset`` to lock the ``hsbench`` process to one core
   and minimize jitter due to the operating system's scheduler.

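If you drive benchmarks from a script, the tip above amounts to prefixing the ``hsbench`` command line with ``taskset -c <core>``. The Python sketch below only assembles that argument list, using the ``-e``, ``-c``, ``-n`` and ``--per-scan`` options described in this section. ``pinned_hsbench_cmd`` is a hypothetical helper, and the sketch assumes both ``taskset`` (from util-linux) and ``hsbench`` are on your ``PATH``.

.. code-block:: python

    import shlex


    def pinned_hsbench_cmd(core, patterns, corpus, repeats=None, per_scan=False):
        """Build an argv list that runs hsbench pinned to a single CPU core.

        Uses `taskset -c <core>`; only hsbench options described in this
        section (-e, -c, -n, --per-scan) are emitted.
        """
        cmd = ["taskset", "-c", str(core), "hsbench", "-e", patterns, "-c", corpus]
        if repeats is not None:
            cmd += ["-n", str(repeats)]
        if per_scan:
            cmd.append("--per-scan")
        return cmd


    if __name__ == "__main__":
        # Print the command; pass the list to subprocess.run() to launch it.
        print(shlex.join(pinned_hsbench_cmd(2, "/tmp/patterns", "/tmp/corpus.db")))

Passing the returned list to ``subprocess.run(cmd, check=True)`` would launch the pinned benchmark.
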
.. _tools_pattern_format:

**************
Pattern Format
**************

All of the Hyperscan tools accept patterns in the same format, read from plain
text files with one pattern per line. Each line looks like this:

* ``<integer id>:/<regex>/<flags>``

For example::

    1:/hatstand.*teakettle/s
    2:/(hatstand|teakettle)/iH
    3:/^.{10,20}hatstand/m

The integer ID is the value that will be reported when a match is found by
Hyperscan and must be unique.

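To make the line format concrete, here is a minimal Python sketch of a parser for it. It treats everything after the last ``/`` as the flags, since the regex itself may contain ``/``, and it rejects duplicate IDs. It does not handle the extended parameters described below, and skipping blank lines is our assumption. ``parse_pattern_file`` is a hypothetical helper, not the loader the Hyperscan tools actually use.

.. code-block:: python

    import re

    # <integer id>:/<regex>/<flags>. The greedy (.*) backtracks to the LAST
    # '/', so a regex that itself contains '/' still parses correctly.
    # Extended parameters ({...}) are not handled by this sketch.
    LINE_RE = re.compile(r"^(\d+):/(.*)/([^/]*)$")


    def parse_pattern_file(lines):
        """Return {id: (regex, flags)} for an iterable of pattern-file lines."""
        patterns = {}
        for lineno, raw in enumerate(lines, 1):
            line = raw.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            m = LINE_RE.match(line)
            if m is None:
                raise ValueError(f"line {lineno}: expected <id>:/<regex>/<flags>")
            pid, regex, flags = int(m.group(1)), m.group(2), m.group(3)
            if pid in patterns:
                raise ValueError(f"line {lineno}: duplicate pattern ID {pid}")
            patterns[pid] = (regex, flags)
        return patterns

Applied to the three example lines above, it yields ``{1: ("hatstand.*teakettle", "s"), 2: ("(hatstand|teakettle)", "iH"), 3: ("^.{10,20}hatstand", "m")}``.
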
The pattern itself is a regular expression in PCRE syntax; see
:ref:`compilation` for more information on supported features.

The flags are single characters that map to Hyperscan flags as follows:

=========   =================================   ===========
Character   API Flag                            Description
=========   =================================   ===========
``i``       :c:member:`HS_FLAG_CASELESS`        Case-insensitive matching
``s``       :c:member:`HS_FLAG_DOTALL`          Dot (``.``) will match newlines
``m``       :c:member:`HS_FLAG_MULTILINE`       Multi-line anchoring
``H``       :c:member:`HS_FLAG_SINGLEMATCH`     Report match ID at most once
``V``       :c:member:`HS_FLAG_ALLOWEMPTY`      Allow patterns that can match against empty buffers
``8``       :c:member:`HS_FLAG_UTF8`            UTF-8 mode
``W``       :c:member:`HS_FLAG_UCP`             Unicode property support
``P``       :c:member:`HS_FLAG_PREFILTER`       Prefiltering mode
``L``       :c:member:`HS_FLAG_SOM_LEFTMOST`    Leftmost start of match reporting
=========   =================================   ===========

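The table can also be expressed as a lookup that ORs the corresponding bits together, as you would when building the flags argument for the compile API. The numeric values below are those we understand ``hs_compile.h`` to define for each ``HS_FLAG_*`` constant; verify them against the header for your version. ``flags_to_mask`` is a hypothetical helper.

.. code-block:: python

    # Flag characters from the table above, mapped to the HS_FLAG_* constant
    # names and (to our understanding) their bit values in hs_compile.h.
    FLAG_CHARS = {
        "i": ("HS_FLAG_CASELESS", 1),
        "s": ("HS_FLAG_DOTALL", 2),
        "m": ("HS_FLAG_MULTILINE", 4),
        "H": ("HS_FLAG_SINGLEMATCH", 8),
        "V": ("HS_FLAG_ALLOWEMPTY", 16),
        "8": ("HS_FLAG_UTF8", 32),
        "W": ("HS_FLAG_UCP", 64),
        "P": ("HS_FLAG_PREFILTER", 128),
        "L": ("HS_FLAG_SOM_LEFTMOST", 256),
    }


    def flags_to_mask(flag_chars):
        """OR together the Hyperscan flag bits for a string such as "iH"."""
        mask = 0
        for ch in flag_chars:
            if ch not in FLAG_CHARS:
                raise ValueError(f"unknown flag character {ch!r}")
            mask |= FLAG_CHARS[ch][1]
        return mask

For example, the flags ``iH`` from the second sample pattern combine to ``HS_FLAG_CASELESS | HS_FLAG_SINGLEMATCH``, which is 9 under these values.
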
In addition to the set of flags above, :ref:`extparam` can be supplied
for each pattern. These are supplied after the flags as ``key=value`` pairs
between braces, separated by commas. For example::

    1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}

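A minimal sketch of splitting the text after the final ``/`` into flag characters and extended parameters follows. It assumes every value is an integer, which holds for the ``min_offset`` and ``max_offset`` example above, and that keys and values contain no braces or commas. ``split_flags_and_ext`` is hypothetical.

.. code-block:: python

    import re

    # "<flags>" optionally followed by "{key=value,...}".
    EXT_RE = re.compile(r"^([^{]*)(?:\{([^}]*)\})?$")


    def split_flags_and_ext(suffix):
        """Split e.g. "s{min_offset=50}" into ("s", {"min_offset": 50})."""
        m = EXT_RE.match(suffix)
        if m is None:
            raise ValueError(f"malformed flags/extended parameters: {suffix!r}")
        flags, body = m.group(1), m.group(2)
        params = {}
        if body:
            for item in body.split(","):
                key, sep, value = item.partition("=")
                if not sep:
                    raise ValueError(f"expected key=value, got {item!r}")
                params[key.strip()] = int(value)  # assumption: integer values
        return flags, params

For the example above, the suffix ``s{min_offset=50,max_offset=100}`` splits into the flag ``s`` and ``{"min_offset": 50, "max_offset": 100}``.
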
All Hyperscan tools will accept a pattern file (or a directory containing
pattern files) with the ``-e`` argument. If no further arguments constraining
the pattern set are given, all patterns in those files are used.

To select a subset of the patterns, a single ID can be supplied with the ``-z``
argument, or a file containing a set of IDs can be supplied with the ``-s``
argument.

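The selection rules can be sketched as a filter over a dictionary of parsed patterns keyed by ID. The sketch assumes the ``-s`` file lists one integer ID per line and gives ``-z`` priority when both are supplied; both are our assumptions rather than documented tool behavior. ``select_patterns`` is hypothetical.

.. code-block:: python

    def select_patterns(patterns, single_id=None, id_file_lines=None):
        """Return the subset of `patterns` (a dict keyed by ID) to use.

        single_id mirrors -z; id_file_lines mirrors the contents of the -s
        file, assumed to hold one integer ID per line. If both are given,
        this sketch prefers -z (an arbitrary choice).
        """
        if single_id is not None:
            wanted = {single_id}
        elif id_file_lines is not None:
            wanted = {int(line) for line in id_file_lines if line.strip()}
        else:
            return dict(patterns)  # no constraint: every pattern is used
        return {pid: pat for pid, pat in patterns.items() if pid in wanted}

With no constraint the whole set is returned, matching the default behavior described above.
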