From b77c1ef4110f074e3b282dcfb1868062cd32f8a3 Mon Sep 17 00:00:00 2001
From: Justin Viiret
Date: Tue, 23 Jan 2018 13:56:22 +1100
Subject: [PATCH] doc: describe tools hscheck, hscollider, hsdump

---
 doc/dev-reference/tools.rst | 147 ++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/doc/dev-reference/tools.rst b/doc/dev-reference/tools.rst
index d2e7a06e..9c2ce6eb 100644
--- a/doc/dev-reference/tools.rst
+++ b/doc/dev-reference/tools.rst
@@ -6,6 +6,30 @@ Tools
 
 This section describes the set of utilities included with the Hyperscan library.
 
+********************
+Quick Check: hscheck
+********************
+
+The ``hscheck`` tool allows the user to quickly check whether Hyperscan supports
+a group of patterns. If a pattern is rejected by Hyperscan's compiler, the
+compile error is provided on standard output.
+
+For example, given the following three patterns (the last of which contains a
+syntax error) in a file called ``/tmp/test``::
+
+    1:/foo.*bar/
+    2:/abc|def|ghi/
+    3:/((foo|bar)/
+
+... the ``hscheck`` tool will produce the following output::
+
+    $ bin/hscheck -e /tmp/test
+
+    OK: 1:/foo.*bar/
+    OK: 2:/abc|def|ghi/
+    FAIL (compile): 3:/((foo|bar)/: Missing close parenthesis for group started at index 0.
+    SUMMARY: 1 of 3 failed.
+
 ********************
 Benchmarker: hsbench
 ********************
@@ -62,6 +86,129 @@ to complete all of them.
    using a utility like ``taskset`` to lock the hsbench process to one core and
    minimize jitter due to the operating system's scheduler.
 
+*******************************
+Correctness Testing: hscollider
+*******************************
+
+The ``hscollider`` tool, or Pattern Collider, provides a way to verify
+Hyperscan's matching behaviour. It does this by compiling and scanning patterns
+(either singly or in groups) against known corpora and comparing the results
+against another engine (the "ground truth").
Two sources of ground truth for
+comparison are available:
+
+ * The PCRE library (http://pcre.org/).
+ * An NFA simulation run on Hyperscan's compile-time graph representation. This
+   is used if PCRE cannot support the pattern or if PCRE execution fails due to
+   a resource limit.
+
+Much of Hyperscan's testing infrastructure is built on ``hscollider``, and the
+tool is designed to take advantage of multiple cores and provide considerable
+flexibility in controlling the test. These options are described in the help
+(``hscollider -h``) and include:
+
+ * Testing in streaming, block or vectored mode.
+ * Testing corpora at different alignments in memory.
+ * Testing patterns in groups of varying size.
+ * Manipulating stream state or scratch space between tests.
+ * Cross-compilation and serialization/deserialization of databases.
+ * Synthetic generation of corpora given a pattern set.
+
+Using hscollider to debug a pattern
+===================================
+
+One common use-case for ``hscollider`` is to determine whether Hyperscan will
+match a pattern in the expected location, and whether this accords with PCRE's
+behaviour for the same case.
+
+Here is an example.
We put our pattern in a file in Hyperscan's pattern
+format::
+
+    $ cat /tmp/pat
+    1:/hatstand.*badgerbrush/
+
+We put the corpus to be scanned in another file, with the same numeric
+identifier at the start to indicate that it should match pattern 1::
+
+    $ cat /tmp/corpus
+    1:__hatstand__hatstand__badgerbrush_badgerbrush
+
+Then we can run ``hscollider`` with its verbosity turned up (``-vv``) so that
+individual matches are displayed in the output::
+
+    $ bin/ue2collider -e /tmp/pat -c /tmp/corpus -Z 0 -T 1 -vv
+    ue2collider: The Pattern Collider Mark II
+
+    Number of threads: 1 (1 scanner, 1 generator)
+    Expression path: /tmp/pat
+    Signature files: none
+    Mode of operation: block mode
+    UE2 scan alignment: 0
+    Corpora read from file: /tmp/corpus
+
+    Running single-pattern/single-compile test for 1 expressions.
+
+    PCRE Match @ (2,45)
+    PCRE Match @ (2,33)
+    PCRE Match @ (12,45)
+    PCRE Match @ (12,33)
+    UE2 Match @ (0,33) for 1
+    UE2 Match @ (0,45) for 1
+    Scan call returned 0
+    PASSED: id 1, alignment 0, corpus 0 (matched pcre:2, ue2:2)
+    Thread 0 processed 1 units.
+
+    Summary:
+    Mode: Single/Block
+    =========
+    Expressions processed: 1
+    Corpora processed: 1
+    Expressions with failures: 0
+    Corpora generation failures: 0
+    Compilation failures: pcre:0, ng:0, ue2:0
+    Matching failures: pcre:0, ng:0, ue2:0
+    Match differences: 0
+    No ground truth: 0
+    Total match differences: 0
+
+    Total elapsed time: 0.00522815 secs.
+
+We can see from this output that both PCRE and Hyperscan find matches ending at
+offsets 33 and 45, and so ``hscollider`` considers this test case to have
+passed.
+
+(In the example command line above, ``-Z 0`` instructs ``hscollider`` to test
+only at corpus alignment 0, and ``-T 1`` instructs it to use only one thread.)
+
+.. note:: In default operation, PCRE produces only one match for a scan, unlike
+   Hyperscan's automata semantics. The ``hscollider`` tool uses libpcre's
+   "callout" functionality to match Hyperscan's semantics.
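The difference between the two semantics can be sketched in Python, using the
standard ``re`` module as a stand-in for PCRE. This is an assumption made purely
for illustration: ``hscollider`` itself relies on libpcre callouts, not on the
brute-force enumeration shown here. The sketch reuses the pattern and corpus
from the example and lists every (start, end) pair at which the pattern matches.

```python
import re

# Pattern and corpus taken from the example above.
pattern = re.compile(r"hatstand.*badgerbrush")
corpus = "__hatstand__hatstand__badgerbrush_badgerbrush"

# Default regex behaviour: one leftmost, greedy match per scan.
first = pattern.search(corpus)
single = (first.start(), first.end())

# All-match view: try every (start, end) window and keep those that the
# pattern matches exactly. Brute force, for illustration only.
all_pairs = sorted(
    (start, end)
    for start in range(len(corpus))
    for end in range(start + 1, len(corpus) + 1)
    if pattern.fullmatch(corpus, start, end)
)

# Distinct end offsets across all of those matches.
end_offsets = sorted({end for _, end in all_pairs})

print("single match:", single)
print("all (start, end) pairs:", all_pairs)
print("distinct end offsets:", end_offsets)
```

The single match is ``(2, 45)``. The enumeration finds the same four pairs that
the PCRE lines in the output list, and the distinct end offsets are 33 and 45,
which are the two offsets reported on the ``UE2 Match`` lines.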
+
+Running a larger scan test
+==========================
+
+A set of patterns for testing purposes is distributed with Hyperscan, and these
+can be tested via ``hscollider`` on an in-tree build. Two CMake targets are
+provided to do this easily:
+
+================================= =====================================
+Make Target                       Description
+================================= =====================================
+``make collide_quick_test``       Tests all patterns in streaming mode.
+``make collide_quick_test_block`` Tests all patterns in block mode.
+================================= =====================================
+
+*****************
+Debugging: hsdump
+*****************
+
+When built in debug mode (with the CMake variable ``CMAKE_BUILD_TYPE`` set to
+``Debug``), Hyperscan includes support for dumping information about its
+internals during pattern compilation with the ``hsdump`` tool.
+
+This information is mostly of use to Hyperscan developers familiar with the
+library's internal structure, but can be used to diagnose issues with patterns
+and provide more information in bug reports.
+
 .. _tools_pattern_format:
 
 **************