Matthew Barr
a5944067d4
Bump version number
2015-12-18 14:37:29 +11:00
Justin Viiret
0f2cbb9ffd
Small updates to documentation for 4.1
2015-12-18 14:36:53 +11:00
Justin Viiret
2aa6830c88
Add ChangeLog
2015-12-18 14:36:53 +11:00
Xiang Wang
7bcd2b07c9
simplify max clique analysis
2015-12-07 09:38:33 +11:00
Justin Viiret
8c09d054c9
Add per-top findMinWidth etc for NFA graphs
2015-12-07 09:38:32 +11:00
Justin Viiret
748d46c124
CastleProto: track next top explicitly
...
Repeats may be removed (e.g. by pruning in role aliasing passes)
leaving "holes" in the top map. Track the next top to use explicitly,
rather than using repeats.size().
2015-12-07 09:38:32 +11:00
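A minimal sketch of why an explicit counter is safer than `repeats.size()` once entries can be removed. `TopMapSketch`, its members, and the string payload are hypothetical stand-ins for illustration, not the actual CastleProto definition.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for CastleProto; all names here are illustrative.
struct TopMapSketch {
    std::map<uint32_t, std::string> repeats; // top -> repeat (placeholder payload)
    uint32_t next_top = 0;                   // next unused top, tracked explicitly

    // Allocate a fresh top. Using repeats.size() instead could hand out a
    // top that is still live once an earlier entry has been erased.
    uint32_t add(const std::string &r) {
        uint32_t top = next_top++;
        assert(repeats.find(top) == repeats.end());
        repeats.emplace(top, r);
        return top;
    }

    // Removing a repeat leaves a "hole" in the top map.
    void erase(uint32_t top) { repeats.erase(top); }
};
```

With three repeats added and top 0 erased, `repeats.size()` is 2, which is still a live top; the explicit counter hands out 3 instead.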
Justin Viiret
8427d83780
CastleProto: track mapping of reports to tops
...
This allows us to speed up report-based queries, like dedupe checking.
2015-12-07 09:38:32 +11:00
Justin Viiret
da23e8306a
assignDkeys: use flat_set<ReportID>, not set
2015-12-07 09:38:32 +11:00
Justin Viiret
8dac64d1dc
findMinWidth, findMaxWidth: width for a given top
...
Currently only implemented for Castle suffixes.
2015-12-07 09:38:32 +11:00
Justin Viiret
03953f34b1
RoseDedupeAuxImpl: collect unique suffixes first
2015-12-07 09:38:32 +11:00
Justin Viiret
1267922ca7
role aliasing: simplify hashRightRoleProperties
...
Using the full report set for a suffix as an input to this hash was very
slow at scale.
2015-12-07 09:38:32 +11:00
Justin Viiret
b87590ce9d
castle: simplify find_next_top
...
Tops are no longer sparse in CastleProto, so the linear scan for holes
isn't necessary.
2015-12-07 09:38:32 +11:00
Justin Viiret
15c2980948
Make key 64 bits where large shifts may be used.
...
This fixes a long-standing issue with large multibit structures.
2015-12-07 09:38:32 +11:00
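A short sketch of the general hazard this kind of change guards against, assuming the problem was a shift applied to a 32-bit value. In C++, shifting a 32-bit integer by 32 or more is undefined behaviour, so the operand has to be widened before the shift. `key_bit` is a hypothetical helper, not a function from the multibit code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return a 64-bit key with only bit `shift` set.
// Widening to uint64_t before shifting keeps shifts of 32..63 well defined;
// `1u << shift` would be undefined for shift >= 32.
inline uint64_t key_bit(unsigned shift) {
    assert(shift < 64);
    return uint64_t{1} << shift;
}
```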
Justin Viiret
205bc1af7f
PCRE includes U+180E in /[:print:]/8W
2015-12-07 09:10:12 +11:00
Justin Viiret
f9b7e806b1
Update defn of class [:punct:] for PCRE 8.38
2015-12-07 09:08:46 +11:00
Justin Viiret
25a01e1c3c
Unify handling of caseless flag in class parser
...
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separate ucp and non-ucp working data).
Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
2015-12-07 09:07:37 +11:00
Justin Viiret
bdb7a10034
Fix defn of POSIX graph, print, punct classes
...
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
2015-12-07 09:06:23 +11:00
Mohammad Abdul Awal
313822c157
FDR runtime simplification
...
Removed static specialisation of domains.
2015-11-20 14:44:43 +11:00
Justin Viiret
abbd548899
ng_execute: update interface to use flat_set
...
This changes all the execute_graph() interfaces so that instead of
mutating a std::set of vertices, they accept an initial flat_set of
states and return a resultant flat_set of states after execution.
(Note that internally execute_graph() still uses bitsets.)
This is both faster and more flexible.
2015-11-18 15:27:17 +11:00
Justin Viiret
fd19168025
Restore \Q..\E support in character classes
2015-11-18 15:27:05 +11:00
Justin Viiret
2a2576e907
Introduce copy_bytes for writing into bytecode
...
Protects memcpy from nullptr sources, which triggers failures in GCC's
UB sanitizer.
2015-11-18 15:26:16 +11:00
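A minimal sketch of the idea, assuming the guard is keyed on a zero length. Passing a null pointer to `memcpy` is undefined behaviour even when the length is zero, which is what sanitizers report. The name `copy_bytes_sketch` and its signature are assumptions; the real `copy_bytes` may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch only: the actual copy_bytes signature in the source tree may differ.
inline void copy_bytes_sketch(void *dest, const void *src, std::size_t len) {
    if (len == 0) {
        return; // nothing to copy, so src may legitimately be null
    }
    assert(dest && src); // a non-empty copy needs valid pointers
    std::memcpy(dest, src, len);
}
```

Callers that write optional or empty components into a buffer can then pass a null source with a zero length without a special case at each call site.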
Justin Viiret
cf3ddd9e88
repeatStoreSparseOptimalP: make diff a u32
...
As delta is a u32, we know diff will always fit within a u32 as well.
Silences a warning from Coverity.
2015-11-18 15:26:11 +11:00
Matthew Barr
f65170da5b
cmake: improve build paths for nested builds
...
If Hyperscan is built as a subproject of another cmake project, it helps to
refer to PROJECT_xx_DIR instead of CMAKE_xx_DIR, etc.
2015-11-10 14:36:39 +11:00
Matthew Barr
b9d3b73ab8
Fix includes to meet our usual guidelines
2015-11-10 14:36:39 +11:00
Justin Viiret
9cffa7666f
Refine ComponentClass::class_empty
...
ComponentClass::class_empty should only be used on finalized classes to
determine whether a given class contains any elements; it should not
take the cr_ucp or cps_ucp into account, as they have been folded in by
the finalize call.
Fixes our failure to identify that the pattern /[^\D\d]/8W can never
match.
2015-11-10 14:36:39 +11:00
Justin Viiret
c68bfe05d8
Don't use class_empty in early class parsing
...
Instead, explicitly track whether we're still in the early class parsing
machine.
2015-11-10 14:36:39 +11:00
Justin Viiret
b1f6a539c7
Remove dead ComponentClass::{get,set}FirstChar
2015-11-10 14:36:39 +11:00
Justin Viiret
9a7b912a5d
Rework parser rejection for POSIX collating elems
...
Implement rejection of POSIX collating elements ("[.ch.]" and "[=ch=]")
entirely in the Ragel parser, using the same approach both inside and
outside character classes.
Fix buggy rejection of [^.ch.], which we should accept as a character
class.
2015-11-10 14:36:39 +11:00
Justin Viiret
d9efe07125
depth: correct sign in printf format
2015-11-10 14:36:38 +11:00
Justin Viiret
62776b615b
nfa_api_queue: debug printf format fix
2015-11-10 14:36:38 +11:00
Justin Viiret
863ea1b2b2
mpv_dump: correct hex escapes in printf format
2015-11-10 14:36:38 +11:00
Justin Viiret
51c8020039
simplegrep: use correct sign in printf format
2015-11-10 14:36:38 +11:00
Justin Viiret
ed4a3cdcf1
compare: always use braces for for/if blocks
2015-11-10 14:36:38 +11:00
Justin Viiret
fb834114e5
limex_dump: use 'override' keyword in subclass
2015-11-10 14:36:38 +11:00
Justin Viiret
5805ac193c
NGWrapper: mark dtor with override
2015-11-10 14:36:38 +11:00
Justin Viiret
4c53bd4641
parser: use 'override' keyword in subclasses
2015-11-10 14:36:38 +11:00
Justin Viiret
46ad39f253
Add inlined sparseLastTop
...
This allows the code to be inlined into other sparse optimal repeat
functions.
2015-11-10 14:36:38 +11:00
Justin Viiret
2603be3924
storeInitialRingTopPatch: fix large delta bug
...
Check for staleness up front, so that it is safe to use u32 values to
handle adding more tops.
Adds LargeGap unit tests.
2015-11-10 14:36:38 +11:00
Justin Viiret
a083bcfa8d
repeat: use u32 arithmetic explicitly
...
In some ring-based models, we know that if the ring is not stale, then
all our bounds should fit within 32-bits. This change makes these
explicitly u32 rather than implicitly narrowing later on.
2015-11-10 14:36:38 +11:00
Justin Viiret
ae7dbc2472
repeatRecurTable: no need for u64a return type
2015-11-10 14:36:38 +11:00
Xiang Wang
e8bfe5478b
Optimize max clique analysis
...
Use vectors of state ids to avoid the overhead of subgraph copies
2015-11-10 14:36:38 +11:00
Alex Coyte
1507b3fd36
move oversize graph check out of Automaton_holder ctor
2015-11-10 14:36:14 +11:00
Alex Coyte
89660e30b6
raw_som_dfa: initialize members in constructor
2015-11-10 14:36:14 +11:00
Justin Viiret
4311775b43
LimEx NFA: unify flush br/estate behaviour
...
Make the GPR NFA models only clear cached_estate conditionally based on
cached_br, as per the SIMD models.
2015-11-10 14:36:14 +11:00
Justin Viiret
b5e290e985
LimEx NFA: no need to zero estate cache in STREAM
...
We believe that we have solved the issues that required zeroing of the
exception state in STREAM_FN and REV_STREAM_FN nowadays.
2015-11-10 14:36:14 +11:00
Justin Viiret
01498fa8a5
LimEx NFA: no need to zero init cached_esucc
...
All of the "exception cache" members are guarded by cached_esucc.
2015-11-10 14:25:05 +11:00
Alex Coyte
510e999738
make Automaton_Base ctor protected
...
Makes explicit that Automaton_Base is intended to be used only as a base class
2015-11-10 14:25:05 +11:00
Alex Coyte
a255e6b678
add asserts to make bounds on alphaShift clear
2015-11-10 14:25:05 +11:00
Alex Coyte
7b6ad2a01a
doComponent: make it obvious that a is never null
2015-11-10 14:25:05 +11:00
Justin Viiret
c7bebf8836
RoseBuildImpl: init base_id members
...
These are set late in the Rose build process, when final IDs are
allocated.
2015-11-10 14:25:04 +11:00