mirror of
https://github.com/VectorCamp/vectorscan.git
synced 2025-06-28 16:41:01 +03:00
doc: describe tools hscheck, hscollider, hsdump
This commit is contained in:
parent
efd76cb5c5
commit
b77c1ef411
@ -6,6 +6,30 @@ Tools
|
|||||||
|
|
||||||
This section describes the set of utilities included with the Hyperscan library.
|
This section describes the set of utilities included with the Hyperscan library.
|
||||||
|
|
||||||
|
********************
|
||||||
|
Quick Check: hscheck
|
||||||
|
********************
|
||||||
|
|
||||||
|
The ``hscheck`` tool allows the user to quickly check whether Hyperscan supports
|
||||||
|
a group of patterns. If a pattern is rejected by Hyperscan's compiler, the
|
||||||
|
compile error is provided on standard output.
|
||||||
|
|
||||||
|
For example, given the following three patterns (the last of which contains a
|
||||||
|
syntax error) in a file called ``/tmp/test``::
|
||||||
|
|
||||||
|
1:/foo.*bar/
|
||||||
|
2:/abc|def|ghi/
|
||||||
|
3:/((foo|bar)/
|
||||||
|
|
||||||
|
... the ``hscheck`` tool will produce the following output::
|
||||||
|
|
||||||
|
$ bin/hscheck -e /tmp/test
|
||||||
|
|
||||||
|
OK: 1:/foo.*bar/
|
||||||
|
OK: 2:/abc|def|ghi/
|
||||||
|
FAIL (compile): 3:/((foo|bar)/: Missing close parenthesis for group started at index 0.
|
||||||
|
SUMMARY: 1 of 3 failed.
|
||||||
|
|
||||||
********************
|
********************
|
||||||
Benchmarker: hsbench
|
Benchmarker: hsbench
|
||||||
********************
|
********************
|
||||||
@ -62,6 +86,129 @@ to complete all of them.
|
|||||||
using a utility like ``taskset`` to lock the hsbench process to one core and
|
using a utility like ``taskset`` to lock the hsbench process to one core and
|
||||||
minimize jitter due to the operating system's scheduler.
|
minimize jitter due to the operating system's scheduler.
|
||||||
|
|
||||||
|
*******************************
|
||||||
|
Correctness Testing: hscollider
|
||||||
|
*******************************
|
||||||
|
|
||||||
|
The ``hscollider`` tool, or Pattern Collider, provides a way to verify
|
||||||
|
Hyperscan's matching behaviour. It does this by compiling and scanning patterns
|
||||||
|
(either singly or in groups) against known corpora and comparing the results
|
||||||
|
against another engine (the "ground truth"). Two sources of ground truth for
|
||||||
|
comparison are available:
|
||||||
|
|
||||||
|
* The PCRE library (http://pcre.org/).
|
||||||
|
* An NFA simulation run on Hyperscan's compile-time graph representation. This
|
||||||
|
is used if PCRE cannot support the pattern or if PCRE execution fails due to
|
||||||
|
a resource limit.
|
||||||
|
|
||||||
|
Much of Hyperscan's testing infrastructure is built on ``hscollider``, and the
|
||||||
|
tool is designed to take advantage of multiple cores and provide considerable
|
||||||
|
flexibility in controlling the test. These options are described in the help
|
||||||
|
(``hscollider -h``) and include:
|
||||||
|
|
||||||
|
* Testing in streaming, block or vectored mode.
|
||||||
|
* Testing corpora at different alignments in memory.
|
||||||
|
* Testing patterns in groups of varying size.
|
||||||
|
* Manipulating stream state or scratch space between tests.
|
||||||
|
* Cross-compilation and serialization/deserialization of databases.
|
||||||
|
* Synthetic generation of corpora given a pattern set.
|
||||||
|
|
||||||
|
Using hscollider to debug a pattern
|
||||||
|
===================================
|
||||||
|
|
||||||
|
One common use-case for ``hscollider`` is to determine whether Hyperscan will
|
||||||
|
match a pattern in the expected location, and whether this accords with PCRE's
|
||||||
|
behaviour for the same case.
|
||||||
|
|
||||||
|
Here is an example. We put our pattern in a file in Hyperscan's pattern
|
||||||
|
format::
|
||||||
|
|
||||||
|
$ cat /tmp/pat
|
||||||
|
1:/hatstand.*badgerbrush/
|
||||||
|
|
||||||
|
We put the corpus to be scanned in another file, with the same numeric
|
||||||
|
identifier at the start to indicate that it should match pattern 1::
|
||||||
|
|
||||||
|
$ cat /tmp/corpus
|
||||||
|
1:__hatstand__hatstand__badgerbrush_badgerbrush
|
||||||
|
|
||||||
|
Then we can run ``hscollider`` with its verbosity turned up (``-vv``) so that
|
||||||
|
individual matches are displayed in the output::
|
||||||
|
|
||||||
|
$ bin/ue2collider -e /tmp/pat -c /tmp/corpus -Z 0 -T 1 -vv
|
||||||
|
ue2collider: The Pattern Collider Mark II
|
||||||
|
|
||||||
|
Number of threads: 1 (1 scanner, 1 generator)
|
||||||
|
Expression path: /tmp/pat
|
||||||
|
Signature files: none
|
||||||
|
Mode of operation: block mode
|
||||||
|
UE2 scan alignment: 0
|
||||||
|
Corpora read from file: /tmp/corpus
|
||||||
|
|
||||||
|
Running single-pattern/single-compile test for 1 expressions.
|
||||||
|
|
||||||
|
PCRE Match @ (2,45)
|
||||||
|
PCRE Match @ (2,33)
|
||||||
|
PCRE Match @ (12,45)
|
||||||
|
PCRE Match @ (12,33)
|
||||||
|
UE2 Match @ (0,33) for 1
|
||||||
|
UE2 Match @ (0,45) for 1
|
||||||
|
Scan call returned 0
|
||||||
|
PASSED: id 1, alignment 0, corpus 0 (matched pcre:2, ue2:2)
|
||||||
|
Thread 0 processed 1 units.
|
||||||
|
|
||||||
|
Summary:
|
||||||
|
Mode: Single/Block
|
||||||
|
=========
|
||||||
|
Expressions processed: 1
|
||||||
|
Corpora processed: 1
|
||||||
|
Expressions with failures: 0
|
||||||
|
Corpora generation failures: 0
|
||||||
|
Compilation failures: pcre:0, ng:0, ue2:0
|
||||||
|
Matching failures: pcre:0, ng:0, ue2:0
|
||||||
|
Match differences: 0
|
||||||
|
No ground truth: 0
|
||||||
|
Total match differences: 0
|
||||||
|
|
||||||
|
Total elapsed time: 0.00522815 secs.
|
||||||
|
|
||||||
|
We can see from this output that both PCRE and Hyperscan find matches ending at
|
||||||
|
offset 33 and 45, and so ``hscollider`` considers this test case to have
|
||||||
|
passed.
|
||||||
|
|
||||||
|
(In the example command line above, ``-Z 0`` instructs us to only test at
|
||||||
|
corpus alignment 0, and ``-T 1`` instructs us to only use one thread.)
|
||||||
|
|
||||||
|
.. note:: In default operation, PCRE produces only one match for a scan, unlike
|
||||||
|
Hyperscan's automata semantics. The ``hscollider`` tool uses libpcre's
|
||||||
|
"callout" functionality to match Hyperscan's semantics.
|
||||||
|
|
||||||
|
Running a larger scan test
|
||||||
|
==========================
|
||||||
|
|
||||||
|
A set of patterns for testing purposes are distributed with Hyperscan, and these
|
||||||
|
can be tested via ``hscollider`` on an in-tree build. Two CMake targets are
|
||||||
|
provided to do this easily:
|
||||||
|
|
||||||
|
================================= =====================================
|
||||||
|
Make Target Description
|
||||||
|
================================= =====================================
|
||||||
|
``make collide_quick_test`` Tests all patterns in streaming mode.
|
||||||
|
``make collide_quick_test_block`` Tests all patterns in block mode.
|
||||||
|
================================= =====================================
|
||||||
|
|
||||||
|
*****************
|
||||||
|
Debugging: hsdump
|
||||||
|
*****************
|
||||||
|
|
||||||
|
When built in debug mode (using the CMake directive ``CMAKE_BUILD_TYPE`` set to
|
||||||
|
``Debug``), Hyperscan includes support for dumping information about its
|
||||||
|
internals during pattern compilation with the ``hsdump`` tool.
|
||||||
|
|
||||||
|
This information is mostly of use to Hyperscan developers familiar with the
|
||||||
|
library's internal structure, but can be used to diagnose issues with patterns
|
||||||
|
and provide more information in bug reports.
|
||||||
|
|
||||||
.. _tools_pattern_format:
|
.. _tools_pattern_format:
|
||||||
|
|
||||||
**************
|
**************
|
||||||
|
Loading…
x
Reference in New Issue
Block a user