Justin Viiret
2abc038f1c
roseCatchUpLeftfixes: iter state on stack
2016-03-01 11:20:36 +11:00
Justin Viiret
dd692c5d2b
roseBlockHasEodWork: iter state on stack
2016-03-01 11:20:36 +11:00
Justin Viiret
09319940bf
roseFlushLastByteHistory: iter state on stack
2016-03-01 11:20:36 +11:00
Justin Viiret
b2a76e6e2b
roseCheckNfaEod: use sparse iterator for EOD
...
Rather than checking all active outfix/suffix engines, use a sparse
iterator to check only those engines that accept at EOD.
2016-03-01 11:20:26 +11:00
Justin Viiret
04dfed2602
runtime: hoist broken check in streaming mode
2016-03-01 11:20:22 +11:00
Justin Viiret
b6508811c0
writeEodProgram: avoid make_move_iterator warning
...
Avoid an ambiguity between std:: and boost::make_move_iterator on builds
against libc++.
2016-03-01 11:18:17 +11:00
Justin Viiret
b2ebdac642
rose: Extend program to handle literals, iterators
...
- cleanups
- add sparse iter instructions
- merge "root" and "sparse iter" programs together
- move program execution to new file program_runtime.h
- simplify EOD execution
2016-03-01 11:17:31 +11:00
Justin Viiret
8069e99bee
make_disjoint: Remove dead code
2016-03-01 11:17:28 +11:00
Justin Viiret
db4176c13e
convertAnchPrefixToBounds: check size of delay_adj
...
Avoid subtracting delay_adj from a smaller max bound.
2016-03-01 11:16:29 +11:00
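The guard described in the convertAnchPrefixToBounds commit above can be sketched as follows. This is a minimal illustration, assuming unsigned 32-bit bounds; the function name `adjust_max_bound_sketch` and the choice to report failure through a `bool` are hypothetical, and the commit does not say how the real code handles the case where `delay_adj` is larger.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the Hyperscan sources.
// Subtracting a larger unsigned value from a smaller one wraps around to a
// huge number, so the size of delay_adj is checked before subtracting.
static inline bool adjust_max_bound_sketch(std::uint32_t max_bound,
                                           std::uint32_t delay_adj,
                                           std::uint32_t *out) {
    if (delay_adj > max_bound) {
        return false; // adjustment not possible; *out is left untouched
    }
    *out = max_bound - delay_adj;
    return true;
}
```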
Justin Viiret
326abeb3ee
Perform an early removeRedundancy call on graph
...
This allows sibling character classes to be merged together before graph
component splitting is done by calcComponents().
In particular, this transforms (A|a)(B|b)(C|c) into [Aa][Bb][Cc]
earlier.
2016-03-01 11:16:17 +11:00
Justin Viiret
86a52971ca
Remove dead code: EdgeSourceStateCompare
2016-03-01 11:16:13 +11:00
Justin Viiret
d67c7583ea
rose: Extend the interpreter to handle more work
...
- Use program for EOD sparse iterator
- Use program for literal sparse iterator
- Eliminate RoseRole, RosePred, RoseVertexProps::role
- Small performance optimizations
2016-03-01 11:16:02 +11:00
Justin Viiret
9cb2233589
rose: Use an interpreter for role runtime
...
Replace much of the RoseRole structure with an interpreted program,
simplifying the Rose runtime and making it much more flexible.
2016-03-01 11:16:02 +11:00
Alex Coyte
a7d8dafb71
detach the sidecar
2016-03-01 11:13:23 +11:00
Alex Coyte
e065c4d60b
make nfaExecCastle0_QR() more efficient
...
1. Reverse scan for the last escape and only process later events.
2. Only check subcastles which may expire for staleness
2016-03-01 11:13:22 +11:00
Alex Coyte
b9c5d65f0e
Rework literal overlap checks for merging engines
...
Also increase the size of chunks we consider merging for castles.
2016-03-01 11:10:24 +11:00
Alex Coyte
05beadf52f
Introduce REPEAT_ALWAYS model for {0,} castle repeats
...
As Castle guards the repeats, no more state is needed for these repeats
2016-03-01 11:10:20 +11:00
Alex Coyte
5e0d10d805
Allow lag on castle infixes to be reduced
...
Reducing lag allows for castles to be merged more effectively
2016-03-01 11:10:13 +11:00
Alex Coyte
e58786e192
Use add_edge_if_not_present in somMayGoBackwards()
...
As somMayGoBackwards() operates on a copy of the graph where virtual
starts have been collapsed on to startDs, we need to be careful not to
create parallel edges.
2016-03-01 11:09:49 +11:00
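The idea behind add_edge_if_not_present in the commit above can be illustrated on a plain edge list. This is only a sketch: the `EdgeList` type and the `add_edge_if_not_present_sketch` function are hypothetical stand-ins, not the graph types or the helper used in the actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical edge list: each entry is a (source, target) pair.
using EdgeList = std::vector<std::pair<std::size_t, std::size_t>>;

// Adds the edge (u, v) only if it is not already present, so collapsing
// several vertices onto one cannot create parallel edges.
// Returns true if a new edge was added.
static bool add_edge_if_not_present_sketch(EdgeList &edges, std::size_t u,
                                           std::size_t v) {
    const auto e = std::make_pair(u, v);
    if (std::find(edges.begin(), edges.end(), e) != edges.end()) {
        return false; // edge already exists
    }
    edges.push_back(e);
    return true;
}
```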
Xiang Wang
7bcd2b07c9
simplify max clique analysis
2015-12-07 09:38:33 +11:00
Justin Viiret
8c09d054c9
Add per-top findMinWidth etc for NFA graphs
2015-12-07 09:38:32 +11:00
Justin Viiret
748d46c124
CastleProto: track next top explicitly
...
Repeats may be removed (e.g. by pruning in role aliasing passes)
leaving "holes" in the top map. Track the next top to use explicitly,
rather than using repeats.size().
2015-12-07 09:38:32 +11:00
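A small sketch of why an explicit counter is safer than repeats.size(), as described in the CastleProto commit above. The struct `CastleProtoSketch`, its members, and the use of strings as repeats are assumptions made for illustration only.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a prototype that maps tops to repeats.
struct CastleProtoSketch {
    std::map<std::uint32_t, std::string> repeats;
    std::uint32_t next_top = 0; // next unused top, tracked explicitly

    // Assigns a fresh top that has never been handed out before.
    std::uint32_t add(const std::string &repeat) {
        const std::uint32_t top = next_top++;
        repeats.emplace(top, repeat);
        return top;
    }

    // Removing a repeat leaves a hole in the top map.
    void erase(std::uint32_t top) { repeats.erase(top); }
};
```

After adding three repeats and erasing top 1, repeats.size() is 2, which is still in use as a top; the explicit counter hands out 3 instead.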
Justin Viiret
8427d83780
CastleProto: track mapping of reports to tops
...
This allows us to speed up report-based queries, like dedupe checking.
2015-12-07 09:38:32 +11:00
Justin Viiret
da23e8306a
assignDkeys: use flat_set<ReportID>, not set
2015-12-07 09:38:32 +11:00
Justin Viiret
8dac64d1dc
findMinWidth, findMaxWidth: width for a given top
...
Currently only implemented for Castle suffixes.
2015-12-07 09:38:32 +11:00
Justin Viiret
03953f34b1
RoseDedupeAuxImpl: collect unique suffixes first
2015-12-07 09:38:32 +11:00
Justin Viiret
1267922ca7
role aliasing: simplify hashRightRoleProperties
...
Using the full report set for a suffix as an input to this hash was very
slow at scale.
2015-12-07 09:38:32 +11:00
Justin Viiret
b87590ce9d
castle: simplify find_next_top
...
Tops are no longer sparse in CastleProto, so the linear scan for holes
isn't necessary.
2015-12-07 09:38:32 +11:00
Justin Viiret
15c2980948
Make key 64 bits where large shifts may be used.
...
This fixes a long-standing issue with large multibit structures.
2015-12-07 09:38:32 +11:00
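The class of bug addressed by the 64-bit key commit above can be sketched like this. Shifting a 32-bit value by 32 or more bits is undefined behaviour in C and C++, so the operand is widened first. The helper name `key_bit_sketch` is hypothetical, and the commit does not show how the key is actually built.

```cpp
#include <cstdint>

// Hypothetical helper: returns a key with only bit n set, for n < 64.
// The literal is widened to 64 bits before the shift, so shift counts of
// 32..63 are well defined.
static inline std::uint64_t key_bit_sketch(unsigned n) {
    return std::uint64_t{1} << n;
}
```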
Justin Viiret
205bc1af7f
PCRE includes U+180E in /[:print:]/8W
2015-12-07 09:10:12 +11:00
Justin Viiret
f9b7e806b1
Update defn of class [:punct:] for PCRE 8.38
2015-12-07 09:08:46 +11:00
Justin Viiret
25a01e1c3c
Unify handling of caseless flag in class parser
...
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separate ucp and non-ucp working data).
Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
2015-12-07 09:07:37 +11:00
Justin Viiret
bdb7a10034
Fix defn of POSIX graph, print, punct classes
...
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
2015-12-07 09:06:23 +11:00
Mohammad Abdul Awal
313822c157
FDR runtime simplification
...
Removed static specialisation of domains.
2015-11-20 14:44:43 +11:00
Justin Viiret
abbd548899
ng_execute: update interface to use flat_set
...
This changes all the execute_graph() interfaces so that instead of
mutating a std::set of vertices, they accept an initial flat_set of
states and return a resultant flat_set of states after execution.
(Note that internally execute_graph() still uses bitsets)
This is both faster and more flexible.
2015-11-18 15:27:17 +11:00
Justin Viiret
fd19168025
Restore \Q..\E support in character classes
2015-11-18 15:27:05 +11:00
Justin Viiret
2a2576e907
Introduce copy_bytes for writing into bytecode
...
Protects memcpy from nullptr sources, which triggers failures in GCC's
UB sanitizer.
2015-11-18 15:26:16 +11:00
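The kind of wrapper the copy_bytes commit above describes might look like the sketch below. Passing a null pointer to memcpy is undefined behaviour even when the length is zero, which is what sanitizers flag. The name `copy_bytes_sketch` and its signature are assumptions; the real copy_bytes interface is not shown in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper, not the Hyperscan implementation.
// An empty source (for example, the data pointer of an empty buffer) may be
// nullptr, so memcpy is skipped entirely when there is nothing to copy.
static inline void copy_bytes_sketch(void *dst, const void *src,
                                     std::size_t len) {
    if (len == 0) {
        return; // nothing to copy; src and dst are never dereferenced
    }
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, len);
}
```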
Justin Viiret
cf3ddd9e88
repeatStoreSparseOptimalP: make diff a u32
...
As delta is a u32, we know diff will always fit within a u32 as well.
Silences a warning from Coverity.
2015-11-18 15:26:11 +11:00
Matthew Barr
f65170da5b
cmake: improve build paths for nested builds
...
If Hyperscan is built as a subproject of another cmake project, it helps to
refer to PROJECT_xx_DIR instead of CMAKE_xx_DIR, etc.
2015-11-10 14:36:39 +11:00
Justin Viiret
9cffa7666f
Refine ComponentClass::class_empty
...
ComponentClass::class_empty should only be used on finalized classes to
determine whether a given class contains any elements; it should not
take the cr_ucp or cps_ucp into account, as they have been folded in by
the finalize call.
Fixes our failure to identify that the pattern /[^\D\d]/8W can never
match.
2015-11-10 14:36:39 +11:00
Justin Viiret
c68bfe05d8
Don't use class_empty in early class parsing
...
Instead, explicitly track whether we're still in the early class parsing
machine.
2015-11-10 14:36:39 +11:00
Justin Viiret
b1f6a539c7
Remove dead ComponentClass::{get,set}FirstChar
2015-11-10 14:36:39 +11:00
Justin Viiret
9a7b912a5d
Rework parser rejection for POSIX collating elems
...
Implement rejection of POSIX collating elements ("[.ch.]" and "[=ch=]")
entirely in the Ragel parser, using the same approach both inside and
outside character classes.
Fix buggy rejection of [^.ch.], which we should accept as a character
class.
2015-11-10 14:36:39 +11:00
Justin Viiret
d9efe07125
depth: correct sign in printf format
2015-11-10 14:36:38 +11:00
Justin Viiret
62776b615b
nfa_api_queue: debug printf format fix
2015-11-10 14:36:38 +11:00
Justin Viiret
863ea1b2b2
mpv_dump: correct hex escapes in printf format
2015-11-10 14:36:38 +11:00
Justin Viiret
ed4a3cdcf1
compare: always use braces for for/if blocks
2015-11-10 14:36:38 +11:00
Justin Viiret
fb834114e5
limex_dump: use 'override' keyword in subclass
2015-11-10 14:36:38 +11:00
Justin Viiret
5805ac193c
NGWrapper: mark dtor with override
2015-11-10 14:36:38 +11:00
Justin Viiret
4c53bd4641
parser: use 'override' keyword in subclasses
2015-11-10 14:36:38 +11:00