2150 Commits

Author SHA1 Message Date
Anatoly Burakov
47b17ade27 Multibyte shufti runtime 2016-03-01 11:21:39 +11:00
Anatoly Burakov
dd2ec6bdac Multibyte vermicelli runtime 2016-03-01 11:21:39 +11:00
Anatoly Burakov
77ff826bbf Adding bitmatchers 2016-03-01 11:21:39 +11:00
Anatoly Burakov
68f6849687 Adding AVX2 version of truffle 2016-03-01 11:21:39 +11:00
Justin Viiret
abb5a82057 scratch: remove sparse iter state (now unused) 2016-03-01 11:21:23 +11:00
Justin Viiret
5fc4289dbe roseRunProgram: iter state on stack 2016-03-01 11:20:36 +11:00
Justin Viiret
2abc038f1c roseCatchUpLeftfixes: iter state on stack 2016-03-01 11:20:36 +11:00
Justin Viiret
dd692c5d2b roseBlockHasEodWork: iter state on stack 2016-03-01 11:20:36 +11:00
Justin Viiret
09319940bf roseFlushLastByteHistory: iter state on stack 2016-03-01 11:20:36 +11:00
Justin Viiret
b2a76e6e2b roseCheckNfaEod: use sparse iterator for EOD
Rather than checking all active outfix/suffix engines, use a sparse
iterator to check only those engines that accept at EOD.
2016-03-01 11:20:26 +11:00
Justin Viiret
04dfed2602 runtime: hoist broken check in streaming mode 2016-03-01 11:20:22 +11:00
Matthew Barr
b460f47476 Build the tools dir only if the cmake file exists 2016-03-01 11:19:32 +11:00
Justin Viiret
b6508811c0 writeEodProgram: avoid make_move_iterator warning
Avoid an ambiguity between std:: and boost::make_move_iterator on builds
against libc++.
2016-03-01 11:18:17 +11:00
Justin Viiret
b2ebdac642 rose: Extend program to handle literals, iterators
- cleanups
- add sparse iter instructions
- merge "root" and "sparse iter" programs together
- move program execution to new file program_runtime.h
- simplify EOD execution
2016-03-01 11:17:31 +11:00
Justin Viiret
8069e99bee make_disjoint: Remove dead code 2016-03-01 11:17:28 +11:00
Justin Viiret
db4176c13e convertAnchPrefixToBounds: check size of delay_adj
Avoid subtracting delay_adj from a smaller max bound.
2016-03-01 11:16:29 +11:00
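The guard described in this commit can be sketched as below. This is a minimal illustration, not the Hyperscan code: the function name `sub_delay_adj_checked` and its signature are hypothetical, and it only assumes that both values are unsigned 32-bit integers, so that an unchecked subtraction would wrap around.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the Hyperscan source): subtracts delay_adj
// from max_bound only when the result cannot wrap around.
// Returns false and leaves *out untouched if delay_adj exceeds max_bound.
inline bool sub_delay_adj_checked(std::uint32_t max_bound,
                                  std::uint32_t delay_adj,
                                  std::uint32_t *out) {
    assert(out != nullptr);
    if (delay_adj > max_bound) {
        return false; // unsigned subtraction would underflow; caller bails out
    }
    *out = max_bound - delay_adj;
    return true;
}
```

A caller would skip the anchored-prefix conversion when the helper returns false, rather than continue with a wrapped bound.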
Justin Viiret
326abeb3ee Perform an early removeRedundancy call on graph
This allows sibling character classes to be merged together before graph
component splitting is done by calcComponents().

In particular, this transforms (A|a)(B|b)(C|c) into [Aa][Bb][Cc]
earlier.
2016-03-01 11:16:17 +11:00
Justin Viiret
86a52971ca Remove dead code: EdgeSourceStateCompare 2016-03-01 11:16:13 +11:00
Justin Viiret
d67c7583ea rose: Extend the interpreter to handle more work
- Use program for EOD sparse iterator
- Use program for literal sparse iterator
- Eliminate RoseRole, RosePred, RoseVertexProps::role
- Small performance optimizations
2016-03-01 11:16:02 +11:00
Justin Viiret
9cb2233589 rose: Use an interpreter for role runtime
Replace much of the RoseRole structure with an interpreted program,
simplifying the Rose runtime and making it much more flexible.
2016-03-01 11:16:02 +11:00
Alex Coyte
a7d8dafb71 detach the sidecar 2016-03-01 11:13:23 +11:00
Alex Coyte
e065c4d60b make nfaExecCastle0_QR() more efficient
1. Reverse scan for the last escape and only process later events.
2. Only check subcastles which may expire for staleness
2016-03-01 11:13:22 +11:00
Alex Coyte
b9c5d65f0e Rework literal overlap checks for merging engines
Also increase the size of chunks we consider merging for castles.
2016-03-01 11:10:24 +11:00
Alex Coyte
05beadf52f Introduce REPEAT_ALWAYS model for {0,} castle repeats
As Castle guards the repeats, no more state is needed for these repeats
2016-03-01 11:10:20 +11:00
Alex Coyte
5e0d10d805 Allow lag on castle infixes to be reduced
Reducing lag allows for castles to be merged more effectively
2016-03-01 11:10:13 +11:00
Alex Coyte
e58786e192 Use add_edge_if_not_present in somMayGoBackwards()
As somMayGoBackwards() operates on a copy of the graph where virtual
starts have been collapsed on to startDs, we need to be careful not to
create parallel edges.
2016-03-01 11:09:49 +11:00
Matthew Barr
0e5c4cbd1d Merge branch develop into master v4.1.0 2015-12-18 14:41:50 +11:00
Matthew Barr
a5944067d4 Bump version number 2015-12-18 14:37:29 +11:00
Justin Viiret
0f2cbb9ffd Small updates to documentation for 4.1 2015-12-18 14:36:53 +11:00
Justin Viiret
2aa6830c88 Add ChangeLog 2015-12-18 14:36:53 +11:00
Xiang Wang
7bcd2b07c9 simplify max clique analysis 2015-12-07 09:38:33 +11:00
Justin Viiret
8c09d054c9 Add per-top findMinWidth etc for NFA graphs 2015-12-07 09:38:32 +11:00
Justin Viiret
748d46c124 CastleProto: track next top explicitly
Repeats may be removed (e.g. by pruning in role aliasing passes)
leaving "holes" in the top map. Track the next top to use explicitly,
rather than using repeats.size().
2015-12-07 09:38:32 +11:00
Justin Viiret
8427d83780 CastleProto: track mapping of reports to tops
This allows us to speed up report-based queries, like dedupe checking.
2015-12-07 09:38:32 +11:00
Justin Viiret
da23e8306a assignDkeys: use flat_set<ReportID>, not set 2015-12-07 09:38:32 +11:00
Justin Viiret
8dac64d1dc findMinWidth, findMaxWidth: width for a given top
Currently only implemented for Castle suffixes.
2015-12-07 09:38:32 +11:00
Justin Viiret
03953f34b1 RoseDedupeAuxImpl: collect unique suffixes first 2015-12-07 09:38:32 +11:00
Justin Viiret
1267922ca7 role aliasing: simplify hashRightRoleProperties
Using the full report set for a suffix as an input to this hash was very
slow at scale.
2015-12-07 09:38:32 +11:00
Justin Viiret
b87590ce9d castle: simplify find_next_top
Tops are no longer sparse in CastleProto, so the linear scan for holes
isn't necessary.
2015-12-07 09:38:32 +11:00
Justin Viiret
15c2980948 Make key 64 bits where large shifts may be used.
This fixes a long-standing issue with large multibit structures.
2015-12-07 09:38:32 +11:00
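The commit message does not say which shifts were affected. As a general sketch of why the key width matters: in C++, shifting a 32-bit value by 32 or more bits is undefined behaviour, while the same shift on a 64-bit value is well defined. The function name `span_at_shift` is hypothetical and is not taken from the multibit code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration (not from the Hyperscan source): the number of
// keys covered by a node whose span is 2^shift. Using a 64-bit operand keeps
// shifts of 32..63 well defined; with a 32-bit operand they would be
// undefined behaviour.
inline std::uint64_t span_at_shift(unsigned shift) {
    assert(shift < 64); // shifts of 64 or more are undefined even for 64-bit
    return std::uint64_t{1} << shift;
}
```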
Justin Viiret
205bc1af7f PCRE includes U+180E in /[:print:]/8W 2015-12-07 09:10:12 +11:00
Justin Viiret
f9b7e806b1 Update defn of class [:punct:] for PCRE 8.38 2015-12-07 09:08:46 +11:00
Justin Viiret
25a01e1c3c Unify handling of caseless flag in class parser
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separate ucp and non-ucp working data).

Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
2015-12-07 09:07:37 +11:00
Justin Viiret
bdb7a10034 Fix defn of POSIX graph, print, punct classes
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
2015-12-07 09:06:23 +11:00
Mohammad Abdul Awal
313822c157 FDR runtime simplification
Removed static specialisation of domains.
2015-11-20 14:44:43 +11:00
Justin Viiret
abbd548899 ng_execute: update interface to use flat_set
This changes all the execute_graph() interfaces so that instead of
mutating a std::set of vertices, they accept an initial flat_set of
states and return a resultant flat_set of states after execution.

(Note that internally execute_graph() still uses bitsets)

This is both faster and more flexible.
2015-11-18 15:27:17 +11:00
Justin Viiret
fd19168025 Restore \Q..\E support in character classes 2015-11-18 15:27:05 +11:00
Justin Viiret
2a2576e907 Introduce copy_bytes for writing into bytecode
Protects memcpy from nullptr sources, which triggers failures in GCC's
UB sanitizer.
2015-11-18 15:26:16 +11:00
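A minimal sketch of the idea behind this commit follows. Passing a null source to `memcpy` is undefined behaviour even when the length is zero, which is what the UB sanitizer reports. The name `copy_bytes_sketch` and its signature are assumptions for illustration; the actual `copy_bytes` in Hyperscan may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for copy_bytes (not the Hyperscan implementation):
// copies n bytes from src to dest and returns a pointer just past the last
// byte written. A zero-length copy returns early, so a null src (for example
// from an empty container) never reaches memcpy.
inline void *copy_bytes_sketch(void *dest, const void *src, std::size_t n) {
    if (n == 0) {
        return dest; // nothing to copy; src may legitimately be null here
    }
    assert(dest != nullptr && src != nullptr);
    std::memcpy(dest, src, n);
    return static_cast<char *>(dest) + n;
}
```

Returning the advanced pointer lets consecutive writes into a bytecode buffer be chained without tracking the offset separately.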
Justin Viiret
cf3ddd9e88 repeatStoreSparseOptimalP: make diff a u32
As delta is a u32, we know diff will always fit within a u32 as well.
Silences a warning from Coverity.
2015-11-18 15:26:11 +11:00
Matthew Barr
f65170da5b cmake: improve build paths for nested builds
If Hyperscan is built as a subproject of another cmake project, it helps to
refer to PROJECT_xx_DIR instead of CMAKE_xx_DIR, etc.
2015-11-10 14:36:39 +11:00