doc: describe tools hscheck, hscollider, hsdump

Justin Viiret 2018-01-23 13:56:22 +11:00 committed by Xiang Wang
parent efd76cb5c5
commit b77c1ef411


@@ -6,6 +6,30 @@ Tools
This section describes the set of utilities included with the Hyperscan library.
********************
Quick Check: hscheck
********************
The ``hscheck`` tool allows the user to quickly check whether Hyperscan supports
a group of patterns. If a pattern is rejected by Hyperscan's compiler, the
compile error is provided on standard output.
For example, given the following three patterns (the last of which contains a
syntax error) in a file called ``/tmp/test``::
    1:/foo.*bar/
    2:/abc|def|ghi/
    3:/((foo|bar)/
... the ``hscheck`` tool will produce the following output::
    $ bin/hscheck -e /tmp/test
    OK: 1:/foo.*bar/
    OK: 2:/abc|def|ghi/
    FAIL (compile): 3:/((foo|bar)/: Missing close parenthesis for group started at index 0.
    SUMMARY: 1 of 3 failed.
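
When ``hscheck`` is run over a large pattern file, it can be convenient to collect the failures programmatically. The following is a minimal Python sketch that parses output shaped like the example above. The line formats are inferred from that example only, and the helper name ``parse_hscheck_output`` is hypothetical; it is not part of Hyperscan.

```python
import re

# Line shapes inferred from the example output above (an assumption);
# other hscheck versions or options may print something different.
OK_RE = re.compile(r"^OK: (\d+):(.*)$")
FAIL_RE = re.compile(r"^FAIL \((\w+)\): (\d+):(.*?/\w*): (.*)$")


def parse_hscheck_output(text):
    """Return (passed_ids, failures) where failures maps id -> (stage, message)."""
    passed = []
    failures = {}
    for raw in text.splitlines():
        line = raw.strip()
        ok = OK_RE.match(line)
        if ok:
            passed.append(int(ok.group(1)))
            continue
        fail = FAIL_RE.match(line)
        if fail:
            stage, ident, _pattern, message = fail.groups()
            failures[int(ident)] = (stage, message)
    return passed, failures


SAMPLE = """\
OK: 1:/foo.*bar/
OK: 2:/abc|def|ghi/
FAIL (compile): 3:/((foo|bar)/: Missing close parenthesis for group started at index 0.
SUMMARY: 1 of 3 failed.
"""

passed, failures = parse_hscheck_output(SAMPLE)
for ident, (stage, message) in sorted(failures.items()):
    print(f"pattern {ident} failed at {stage}: {message}")
```

Lines that match neither shape, such as the ``SUMMARY`` line, are ignored by this sketch.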
********************
Benchmarker: hsbench
********************
@@ -62,6 +86,129 @@ to complete all of them.
using a utility like ``taskset`` to lock the hsbench process to one core and
minimize jitter due to the operating system's scheduler.
*******************************
Correctness Testing: hscollider
*******************************
The ``hscollider`` tool, or Pattern Collider, provides a way to verify
Hyperscan's matching behaviour. It does this by compiling and scanning patterns
(either singly or in groups) against known corpora and comparing the results
against another engine (the "ground truth"). Two sources of ground truth for
comparison are available:
* The PCRE library (http://pcre.org/).
* An NFA simulation run on Hyperscan's compile-time graph representation. This
  is used if PCRE cannot support the pattern or if PCRE execution fails due to
  a resource limit.
Much of Hyperscan's testing infrastructure is built on ``hscollider``, and the
tool is designed to take advantage of multiple cores and provide considerable
flexibility in controlling the test. These options are described in the help
(``hscollider -h``) and include:
* Testing in streaming, block or vectored mode.
* Testing corpora at different alignments in memory.
* Testing patterns in groups of varying size.
* Manipulating stream state or scratch space between tests.
* Cross-compilation and serialization/deserialization of databases.
* Synthetic generation of corpora given a pattern set.
Using hscollider to debug a pattern
===================================
One common use case for ``hscollider`` is to determine whether Hyperscan will
match a pattern in the expected location, and whether this agrees with PCRE's
behaviour for the same case.
Here is an example. We put our pattern in a file in Hyperscan's pattern
format::
    $ cat /tmp/pat
    1:/hatstand.*badgerbrush/
We put the corpus to be scanned in another file, with the same numeric
identifier at the start to indicate that it should match pattern 1::
    $ cat /tmp/corpus
    1:__hatstand__hatstand__badgerbrush_badgerbrush
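
To make the two file layouts concrete, here is a small Python sketch that splits one pattern line and one corpus line into their parts and checks that the identifiers agree. It is illustrative only: the function names are hypothetical, and the handling of optional flags after the closing ``/`` is an assumption, since the examples here carry no flags.

```python
def parse_pattern_line(line):
    """Split 'ID:/regex/flags' into (id, regex, flags).

    Assumption: any text after the last '/' is a flag string; the
    examples in this section have none, so it comes back empty.
    """
    ident, rest = line.strip().split(":", 1)
    if not rest.startswith("/"):
        raise ValueError("pattern must start with '/'")
    regex, slash, flags = rest[1:].rpartition("/")
    if not slash:
        raise ValueError("pattern is missing its closing '/'")
    return int(ident), regex, flags


def parse_corpus_line(line):
    """Split 'ID:data' into (id, data); data may itself contain ':'."""
    ident, data = line.rstrip("\n").split(":", 1)
    return int(ident), data


pat_id, regex, flags = parse_pattern_line("1:/hatstand.*badgerbrush/")
corpus_id, data = parse_corpus_line("1:__hatstand__hatstand__badgerbrush_badgerbrush")

# The shared numeric identifier ties the corpus to the pattern it should match.
print(pat_id == corpus_id, regex, len(data))
```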
Then we can run ``hscollider`` with its verbosity turned up (``-vv``) so that
individual matches are displayed in the output::
    $ bin/ue2collider -e /tmp/pat -c /tmp/corpus -Z 0 -T 1 -vv
    ue2collider: The Pattern Collider Mark II
    Number of threads: 1 (1 scanner, 1 generator)
    Expression path: /tmp/pat
    Signature files: none
    Mode of operation: block mode
    UE2 scan alignment: 0
    Corpora read from file: /tmp/corpus
    Running single-pattern/single-compile test for 1 expressions.
    PCRE Match @ (2,45)
    PCRE Match @ (2,33)
    PCRE Match @ (12,45)
    PCRE Match @ (12,33)
    UE2 Match @ (0,33) for 1
    UE2 Match @ (0,45) for 1
    Scan call returned 0
    PASSED: id 1, alignment 0, corpus 0 (matched pcre:2, ue2:2)
    Thread 0 processed 1 units.
    Summary:
    Mode: Single/Block
    =========
    Expressions processed: 1
    Corpora processed: 1
    Expressions with failures: 0
    Corpora generation failures: 0
    Compilation failures: pcre:0, ng:0, ue2:0
    Matching failures: pcre:0, ng:0, ue2:0
    Match differences: 0
    No ground truth: 0
    Total match differences: 0
    Total elapsed time: 0.00522815 secs.
We can see from this output that both PCRE and Hyperscan find matches ending at
offsets 33 and 45, and so ``hscollider`` considers this test case to have
passed.
(In the example command line above, ``-Z 0`` instructs ``hscollider`` to test
only at corpus alignment 0, and ``-T 1`` instructs it to use only one thread.)
.. note:: In default operation, PCRE produces only one match for a scan, unlike
   Hyperscan's automata semantics. The ``hscollider`` tool uses libpcre's
   "callout" functionality to match Hyperscan's semantics.
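
The difference between "one match per scan" and "every match" can be illustrated with a short Python sketch. It uses Python's ``re`` module and a brute-force search over every (start, end) pair; it does not use PCRE or its callout mechanism, and it is not how ``hscollider`` is implemented. The function name ``all_matches`` is hypothetical, and the approach is only reasonable for tiny inputs and simple patterns like this one.

```python
import re


def all_matches(pattern, data):
    """Return every (start, end) span of data that the pattern matches exactly.

    Brute force: try each candidate span with fullmatch. Anchors and
    lookarounds may behave differently from a real scan, so treat this
    as an illustration rather than a reference matcher.
    """
    rx = re.compile(pattern)
    spans = []
    for start in range(len(data) + 1):
        for end in range(start, len(data) + 1):
            if rx.fullmatch(data, start, end):
                spans.append((start, end))
    return spans


CORPUS = "__hatstand__hatstand__badgerbrush_badgerbrush"
PATTERN = r"hatstand.*badgerbrush"

# A conventional search reports a single match for the whole scan.
single = re.search(PATTERN, CORPUS).span()

# Enumerating all spans gives the four (start, end) pairs in the PCRE
# lines of the output above; their distinct end offsets are the two
# offsets at which Hyperscan reports a match.
spans = all_matches(PATTERN, CORPUS)
end_offsets = sorted({end for _, end in spans})

print(single)
print(spans)
print(end_offsets)
```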
Running a larger scan test
==========================
A set of patterns for testing purposes is distributed with Hyperscan, and these
can be tested via ``hscollider`` on an in-tree build. Two CMake targets are
provided to do this easily:
================================= =====================================
Make Target                       Description
================================= =====================================
``make collide_quick_test``       Tests all patterns in streaming mode.
``make collide_quick_test_block`` Tests all patterns in block mode.
================================= =====================================
*****************
Debugging: hsdump
*****************
When built in debug mode (with the CMake variable ``CMAKE_BUILD_TYPE`` set to
``Debug``), Hyperscan includes support for dumping information about its
internals during pattern compilation with the ``hsdump`` tool.
This information is mostly of use to Hyperscan developers familiar with the
library's internal structure, but can be used to diagnose issues with patterns
and provide more information in bug reports.
.. _tools_pattern_format:
**************