1373 Commits

Author SHA1 Message Date
Matthew Barr
a5944067d4 Bump version number 2015-12-18 14:37:29 +11:00
Justin Viiret
0f2cbb9ffd Small updates to documentation for 4.1 2015-12-18 14:36:53 +11:00
Justin Viiret
2aa6830c88 Add ChangeLog 2015-12-18 14:36:53 +11:00
Xiang Wang
7bcd2b07c9 simplify max clique analysis 2015-12-07 09:38:33 +11:00
Justin Viiret
8c09d054c9 Add per-top findMinWidth etc for NFA graphs 2015-12-07 09:38:32 +11:00
Justin Viiret
748d46c124 CastleProto: track next top explicitly
Repeats may be removed (e.g. by pruning in role aliasing passes)
leaving "holes" in the top map. Track the next top to use explicitly,
rather than using repeats.size().
2015-12-07 09:38:32 +11:00
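A minimal sketch of the idea in this commit, assuming a map from top to repeat: once entries can be erased, the container size stops being a safe source of fresh tops, so a separate counter is kept. The type `TopAllocatorSketch`, its members, and the string payload are hypothetical illustrations, not the actual CastleProto code.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a proto that maps tops to repeats.
struct TopAllocatorSketch {
    std::map<std::uint32_t, std::string> repeats; // top -> repeat (placeholder payload)
    std::uint32_t next_top = 0;                   // tracked explicitly

    // Allocate a fresh top. Using repeats.size() here instead could
    // collide with a live top after an earlier erase left a hole.
    std::uint32_t add(const std::string &repeat) {
        std::uint32_t top = next_top++;
        repeats.emplace(top, repeat);
        return top;
    }

    // Remove a repeat, e.g. after pruning; leaves a hole in the top map.
    void erase(std::uint32_t top) { repeats.erase(top); }
};
```

In this sketch, erasing top 0 and then adding a new repeat yields top 2, whereas `repeats.size()` would have returned 1, which is still in use.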
Justin Viiret
8427d83780 CastleProto: track mapping of reports to tops
This allows us to speed up report-based queries, like dedupe checking.
2015-12-07 09:38:32 +11:00
Justin Viiret
da23e8306a assignDkeys: use flat_set<ReportID>, not set 2015-12-07 09:38:32 +11:00
Justin Viiret
8dac64d1dc findMinWidth, findMaxWidth: width for a given top
Currently only implemented for Castle suffixes.
2015-12-07 09:38:32 +11:00
Justin Viiret
03953f34b1 RoseDedupeAuxImpl: collect unique suffixes first 2015-12-07 09:38:32 +11:00
Justin Viiret
1267922ca7 role aliasing: simplify hashRightRoleProperties
Using the full report set for a suffix as an input to this hash was very
slow at scale.
2015-12-07 09:38:32 +11:00
Justin Viiret
b87590ce9d castle: simplify find_next_top
Tops are no longer sparse in CastleProto, so the linear scan for holes
isn't necessary.
2015-12-07 09:38:32 +11:00
Justin Viiret
15c2980948 Make key 64 bits where large shifts may be used.
This fixes a long-standing issue with large multibit structures.
2015-12-07 09:38:32 +11:00
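The class of bug described here can be illustrated with a short sketch, assuming a 32-bit key that is shifted by an amount that may reach 32 or more. The function name `shifted_key` is hypothetical and is not taken from the multibit code.

```cpp
#include <cstdint>

// Hypothetical helper: shift a key left by `shift` bits (shift must be < 64).
static inline std::uint64_t shifted_key(std::uint32_t key, unsigned shift) {
    // Widen before shifting. Shifting a 32-bit operand by 32 or more bits
    // is undefined behaviour in C and C++, so the cast must come first.
    return static_cast<std::uint64_t>(key) << shift;
}
```

With the cast in place, `shifted_key(1, 36)` is well defined; without it, the shift would be performed on a 32-bit operand.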
Justin Viiret
205bc1af7f PCRE includes U+180E in /[:print:]/8W 2015-12-07 09:10:12 +11:00
Justin Viiret
f9b7e806b1 Update defn of class [:punct:] for PCRE 8.38 2015-12-07 09:08:46 +11:00
Justin Viiret
25a01e1c3c Unify handling of caseless flag in class parser
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separate ucp and non-ucp working data).

Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
2015-12-07 09:07:37 +11:00
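As a rough illustration of applying caselessness per element rather than at finalize time, here is a sketch for an ASCII-only class. `AsciiClassSketch` and its `add` method are hypothetical and much simpler than the parser's real AsciiComponentClass.

```cpp
#include <bitset>
#include <cctype>

// Hypothetical ASCII character class: a 256-bit membership set.
struct AsciiClassSketch {
    std::bitset<256> members;

    // Add one element. If the caseless flag is in effect when the element
    // is added, both cases are inserted immediately, so no separate pass
    // is needed when the class is finalized.
    void add(unsigned char c, bool caseless) {
        members.set(c);
        if (caseless && std::isalpha(c)) {
            members.set(static_cast<unsigned char>(std::tolower(c)));
            members.set(static_cast<unsigned char>(std::toupper(c)));
        }
    }
};
```

Because the flag is consulted per element, a class can mix elements added with and without caselessness.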
Justin Viiret
bdb7a10034 Fix defn of POSIX graph, print, punct classes
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
2015-12-07 09:06:23 +11:00
Mohammad Abdul Awal
313822c157 FDR runtime simplification
Removed static specialisation of domains.
2015-11-20 14:44:43 +11:00
Justin Viiret
abbd548899 ng_execute: update interface to use flat_set
This changes all the execute_graph() interfaces so that instead of
mutating a std::set of vertices, they accept an initial flat_set of
states and return a resultant flat_set of states after execution.

(Note that internally execute_graph() still uses bitsets)

This is both faster and more flexible.
2015-11-18 15:27:17 +11:00
Justin Viiret
fd19168025 Restore \Q..\E support in character classes 2015-11-18 15:27:05 +11:00
Justin Viiret
2a2576e907 Introduce copy_bytes for writing into bytecode
Protects memcpy from nullptr sources, which triggers failures in GCC's
UB sanitizer.
2015-11-18 15:26:16 +11:00
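A minimal sketch of such a guard, assuming a pointer-and-length interface: `copy_bytes_sketch` is a hypothetical name and signature, not the helper introduced by this commit. Passing a null pointer to memcpy is undefined behaviour even when the length is zero, which is what the sanitizer reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper around memcpy that tolerates a null source
// when there is nothing to copy.
static inline void *copy_bytes_sketch(void *dest, const void *src,
                                      std::size_t n) {
    if (n == 0) {
        // Nothing to copy: never hand a possibly-null pointer to memcpy.
        return dest;
    }
    assert(dest != nullptr && src != nullptr);
    std::memcpy(dest, src, n);
    return dest;
}
```

A caller writing an empty buffer (for example, the data pointer of an empty container, which may be null) can then use the same call path as for non-empty data.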
Justin Viiret
cf3ddd9e88 repeatStoreSparseOptimalP: make diff a u32
As delta is a u32, we know diff will always fit within a u32 as well.
Silences a warning from Coverity.
2015-11-18 15:26:11 +11:00
Matthew Barr
f65170da5b cmake: improve build paths for nested builds
If Hyperscan is built as a subproject of another cmake project, it helps to
refer to PROJECT_xx_DIR instead of CMAKE_xx_DIR, etc.
2015-11-10 14:36:39 +11:00
Matthew Barr
b9d3b73ab8 Fix includes to meet our usual guidelines 2015-11-10 14:36:39 +11:00
Justin Viiret
9cffa7666f Refine ComponentClass::class_empty
ComponentClass::class_empty should only be used on finalized classes to
determine whether a given class contains any elements; it should not
take the cr_ucp or cps_ucp into account, as they have been folded in by
the finalize call.

Fixes our failure to identify that the pattern /[^\D\d]/8W can never
match.
2015-11-10 14:36:39 +11:00
Justin Viiret
c68bfe05d8 Don't use class_empty in early class parsing
Instead, explicitly track whether we're still in the early class parsing
machine.
2015-11-10 14:36:39 +11:00
Justin Viiret
b1f6a539c7 Remove dead ComponentClass::{get,set}FirstChar 2015-11-10 14:36:39 +11:00
Justin Viiret
9a7b912a5d Rework parser rejection for POSIX collating elems
Implement rejection of POSIX collating elements ("[.ch.]" and "[=ch=]")
entirely in the Ragel parser, using the same approach both inside and
outside character classes.

Fix buggy rejection of [^.ch.], which we should accept as a character
class.
2015-11-10 14:36:39 +11:00
Justin Viiret
d9efe07125 depth: correct sign in printf format 2015-11-10 14:36:38 +11:00
Justin Viiret
62776b615b nfa_api_queue: debug printf format fix 2015-11-10 14:36:38 +11:00
Justin Viiret
863ea1b2b2 mpv_dump: correct hex escapes in printf format 2015-11-10 14:36:38 +11:00
Justin Viiret
51c8020039 simplegrep: use correct sign in printf format 2015-11-10 14:36:38 +11:00
Justin Viiret
ed4a3cdcf1 compare: always use braces for for/if blocks 2015-11-10 14:36:38 +11:00
Justin Viiret
fb834114e5 limex_dump: use 'override' keyword in subclass 2015-11-10 14:36:38 +11:00
Justin Viiret
5805ac193c NGWrapper: mark dtor with override 2015-11-10 14:36:38 +11:00
Justin Viiret
4c53bd4641 parser: use 'override' keyword in subclasses 2015-11-10 14:36:38 +11:00
Justin Viiret
46ad39f253 Add inlined sparseLastTop
This allows the code to be inlined into other sparse optimal repeat
functions.
2015-11-10 14:36:38 +11:00
Justin Viiret
2603be3924 storeInitialRingTopPatch: fix large delta bug
Check for staleness up front, so that it is safe to use u32 values to
handle adding more tops.

Adds LargeGap unit tests.
2015-11-10 14:36:38 +11:00
Justin Viiret
a083bcfa8d repeat: use u32 arithmetic explicitly
In some ring-based models, we know that if the ring is not stale, then
all our bounds should fit within 32-bits. This change makes these
explicitly u32 rather than implicitly narrowing later on.
2015-11-10 14:36:38 +11:00
Justin Viiret
ae7dbc2472 repeatRecurTable: no need for u64a return type 2015-11-10 14:36:38 +11:00
Xiang Wang
e8bfe5478b Optimize max clique analysis
Use vectors of state ids to avoid the overhead of subgraph copies
2015-11-10 14:36:38 +11:00
Alex Coyte
1507b3fd36 move oversize graph check out of Automaton_holder ctor 2015-11-10 14:36:14 +11:00
Alex Coyte
89660e30b6 raw_som_dfa: initialize members in constructor 2015-11-10 14:36:14 +11:00
Justin Viiret
4311775b43 LimEx NFA: unify flush br/estate behaviour
Make the GPR NFA models only clear cached_estate conditionally based on
cached_br, as per the SIMD models.
2015-11-10 14:36:14 +11:00
Justin Viiret
b5e290e985 LimEx NFA: no need to zero estate cache in STREAM
We believe that the issues that required zeroing of the exception state
in STREAM_FN and REV_STREAM_FN have since been solved.
2015-11-10 14:36:14 +11:00
Justin Viiret
01498fa8a5 LimEx NFA: no need to zero init cached_esucc
All of the "exception cache" members are guarded by cached_esucc.
2015-11-10 14:25:05 +11:00
Alex Coyte
510e999738 make Automaton_Base ctor protected
Makes explicit that Automaton_Base is intended to be used only as a base class
2015-11-10 14:25:05 +11:00
Alex Coyte
a255e6b678 add asserts to make bounds on alphaShift clear 2015-11-10 14:25:05 +11:00
Alex Coyte
7b6ad2a01a doComponent: make it obvious that a is never null 2015-11-10 14:25:05 +11:00
Justin Viiret
c7bebf8836 RoseBuildImpl: init base_id members
These are set late in the Rose build process, when final IDs are
allocated.
2015-11-10 14:25:04 +11:00