65 Commits

Author SHA1 Message Date
Justin Viiret
8dac64d1dc findMinWidth, findMaxWidth: width for a given top
Currently only implemented for Castle suffixes.
2015-12-07 09:38:32 +11:00
Justin Viiret
03953f34b1 RoseDedupeAuxImpl: collect unique suffixes first 2015-12-07 09:38:32 +11:00
Justin Viiret
1267922ca7 role aliasing: simplify hashRightRoleProperties
Using the full report set for a suffix as an input to this hash was very
slow at scale.
2015-12-07 09:38:32 +11:00
Justin Viiret
b87590ce9d castle: simplify find_next_top
Tops are no longer sparse in CastleProto, so the linear scan for holes
isn't necessary.
2015-12-07 09:38:32 +11:00
Justin Viiret
15c2980948 Make key 64 bits where large shifts may be used.
This fixes a long-standing issue with large multibit structures.
2015-12-07 09:38:32 +11:00
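The class of problem behind a fix like this can be illustrated with a minimal sketch. It assumes the issue is a left shift whose count can reach or exceed the width of a 32-bit operand, which is undefined behaviour in C++. The helper name `bit_key` is hypothetical and is not taken from the Hyperscan source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Hyperscan code): build a key with one bit set.
// Widening the operand to 64 bits *before* shifting keeps shift counts in
// the range 32..63 well defined; shifting a 32-bit value by that much
// would be undefined behaviour.
inline std::uint64_t bit_key(unsigned bit) {
    assert(bit < 64); // shifts of 64 or more are still undefined
    return std::uint64_t{1} << bit;
}
```

The important detail is that the widening happens on the operand being shifted, not on the result of a 32-bit shift.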
Justin Viiret
205bc1af7f PCRE includes U+180E in /[:print:]/8W 2015-12-07 09:10:12 +11:00
Justin Viiret
f9b7e806b1 Update defn of class [:punct:] for PCRE 8.38 2015-12-07 09:08:46 +11:00
Justin Viiret
25a01e1c3c Unify handling of caseless flag in class parser
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separate ucp and non-ucp working data).

Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
2015-12-07 09:07:37 +11:00
Justin Viiret
bdb7a10034 Fix defn of POSIX graph, print, punct classes
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
2015-12-07 09:06:23 +11:00
Mohammad Abdul Awal
313822c157 FDR runtime simplification
Removed static specialisation of domains.
2015-11-20 14:44:43 +11:00
Justin Viiret
abbd548899 ng_execute: update interface to use flat_set
This changes all the execute_graph() interfaces so that instead of
mutating a std::set of vertices, they accept an initial flat_set of
states and return a resultant flat_set of states after execution.

(Note that internally execute_graph() still uses bitsets)

This is both faster and more flexible.
2015-11-18 15:27:17 +11:00
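The interface style described in this commit, taking an initial state set and returning a resultant one instead of mutating a caller-owned `std::set`, can be sketched roughly as follows. This is an illustration only: `StateSet` (a sorted, deduplicated `std::vector`) stands in for a flat set, and `step_states` and its successor map are hypothetical names, not the real `execute_graph()` signature.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Stand-in for a flat set: a sorted vector with no duplicates.
using StateSet = std::vector<unsigned>;

// Hypothetical one-step executor: accepts the initial states by const
// reference and returns the states reachable in one step, rather than
// mutating a container owned by the caller.
inline StateSet step_states(const std::map<unsigned, StateSet> &succ,
                            const StateSet &initial) {
    StateSet out;
    for (unsigned s : initial) {
        auto it = succ.find(s);
        if (it != succ.end()) {
            out.insert(out.end(), it->second.begin(), it->second.end());
        }
    }
    // Restore the flat-set invariant: sorted and unique.
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```

Returning a value lets callers chain steps or keep the initial set around for comparison, which is one plausible reading of "more flexible" in the message above.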
Justin Viiret
fd19168025 Restore \Q..\E support in character classes 2015-11-18 15:27:05 +11:00
Justin Viiret
2a2576e907 Introduce copy_bytes for writing into bytecode
Protects memcpy from nullptr sources, which triggers failures in GCC's
UB sanitizer.
2015-11-18 15:26:16 +11:00
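A guard of this kind might look like the sketch below. Calling `memcpy` with a null source pointer is undefined behaviour even when the length is zero, which is what sanitizers flag. The name `copy_bytes_sketch` and its signature are assumptions made for illustration; the actual `copy_bytes` helper in Hyperscan may be shaped differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper (not the real Hyperscan helper): skip the memcpy
// call entirely when there is nothing to copy, so that a nullptr source
// paired with a zero length never reaches memcpy.
inline void copy_bytes_sketch(void *dest, const void *src, std::size_t len) {
    if (len == 0) {
        return; // nothing to copy; src may legitimately be nullptr here
    }
    assert(dest != nullptr && src != nullptr);
    std::memcpy(dest, src, len);
}
```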
Justin Viiret
cf3ddd9e88 repeatStoreSparseOptimalP: make diff a u32
As delta is a u32, we know diff will always fit within a u32 as well.
Silences a warning from Coverity.
2015-11-18 15:26:11 +11:00
Matthew Barr
f65170da5b cmake: improve build paths for nested builds
If Hyperscan is built as a subproject of another cmake project, it helps to
refer to PROJECT_xx_DIR instead of CMAKE_xx_DIR, etc.
2015-11-10 14:36:39 +11:00
Matthew Barr
b9d3b73ab8 Fix includes to meet our usual guidelines 2015-11-10 14:36:39 +11:00
Justin Viiret
9cffa7666f Refine ComponentClass::class_empty
ComponentClass::class_empty should only be used on finalized classes to
determine whether a given class contains any elements; it should not
take the cr_ucp or cps_ucp into account, as they have been folded in by
the finalize call.

Fixes our failure to identify that the pattern /[^\D\d]/8W can never
match.
2015-11-10 14:36:39 +11:00
Justin Viiret
c68bfe05d8 Don't use class_empty in early class parsing
Instead, explicitly track whether we're still in the early class parsing
machine.
2015-11-10 14:36:39 +11:00
Justin Viiret
b1f6a539c7 Remove dead ComponentClass::{get,set}FirstChar 2015-11-10 14:36:39 +11:00
Justin Viiret
9a7b912a5d Rework parser rejection for POSIX collating elems
Implement rejection of POSIX collating elements ("[.ch.]" and "[=ch=]")
entirely in the Ragel parser, using the same approach both inside and
outside character classes.

Fix buggy rejection of [^.ch.], which we should accept as a character
class.
2015-11-10 14:36:39 +11:00
Justin Viiret
d9efe07125 depth: correct sign in printf format 2015-11-10 14:36:38 +11:00
Justin Viiret
62776b615b nfa_api_queue: debug printf format fix 2015-11-10 14:36:38 +11:00
Justin Viiret
863ea1b2b2 mpv_dump: correct hex escapes in printf format 2015-11-10 14:36:38 +11:00
Justin Viiret
51c8020039 simplegrep: use correct sign in printf format 2015-11-10 14:36:38 +11:00
Justin Viiret
ed4a3cdcf1 compare: always use braces for for/if blocks 2015-11-10 14:36:38 +11:00
Justin Viiret
fb834114e5 limex_dump: use 'override' keyword in subclass 2015-11-10 14:36:38 +11:00
Justin Viiret
5805ac193c NGWrapper: mark dtor with override 2015-11-10 14:36:38 +11:00
Justin Viiret
4c53bd4641 parser: use 'override' keyword in subclasses 2015-11-10 14:36:38 +11:00
Justin Viiret
46ad39f253 Add inlined sparseLastTop
This allows the code to be inlined into other sparse optimal repeat
functions.
2015-11-10 14:36:38 +11:00
Justin Viiret
2603be3924 storeInitialRingTopPatch: fix large delta bug
Check for staleness up front, so that it is safe to use u32 values to
handle adding more tops.

Adds LargeGap unit tests.
2015-11-10 14:36:38 +11:00
Justin Viiret
a083bcfa8d repeat: use u32 arithmetic explicitly
In some ring-based models, we know that if the ring is not stale, then
all our bounds should fit within 32-bits. This change makes these
explicitly u32 rather than implicitly narrowing later on.
2015-11-10 14:36:38 +11:00
Justin Viiret
ae7dbc2472 repeatRecurTable: no need for u64a return type 2015-11-10 14:36:38 +11:00
Xiang Wang
e8bfe5478b Optimize max clique analysis
Use vectors of state ids to avoid the overhead of subgraph copies
2015-11-10 14:36:38 +11:00
Alex Coyte
1507b3fd36 move oversize graph check out of Automaton_holder ctor 2015-11-10 14:36:14 +11:00
Alex Coyte
89660e30b6 raw_som_dfa: initialize members in constructor 2015-11-10 14:36:14 +11:00
Justin Viiret
4311775b43 LimEx NFA: unify flush br/estate behaviour
Make the GPR NFA models only clear cached_estate conditionally based on
cached_br, as per the SIMD models.
2015-11-10 14:36:14 +11:00
Justin Viiret
b5e290e985 LimEx NFA: no need to zero estate cache in STREAM
We believe that we have solved the issues that required zeroing of the
exception state in STREAM_FN and REV_STREAM_FN nowadays.
2015-11-10 14:36:14 +11:00
Justin Viiret
01498fa8a5 LimEx NFA: no need to zero init cached_esucc
All of the "exception cache" members are guarded by cached_esucc.
2015-11-10 14:25:05 +11:00
Alex Coyte
510e999738 make Automaton_Base ctor protected
Makes explicit that Automaton_Base is intended to be used only as a base class
2015-11-10 14:25:05 +11:00
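The idiom named in this commit can be sketched as follows. The class names `AutomatonBaseSketch` and `DerivedAutomaton` are hypothetical and only illustrate the general C++ technique, not the real `Automaton_Base` definition.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical base class: the protected constructor means it can only be
// constructed as part of a derived object.
class AutomatonBaseSketch {
protected:
    AutomatonBaseSketch() = default;

public:
    int start() const { return 1; }
};

// A derived class may call the protected base constructor.
class DerivedAutomaton : public AutomatonBaseSketch {
public:
    DerivedAutomaton() = default;
};

// Code outside the hierarchy cannot default-construct the base directly.
static_assert(!std::is_default_constructible<AutomatonBaseSketch>::value,
              "base is not directly constructible");
static_assert(std::is_default_constructible<DerivedAutomaton>::value,
              "derived is constructible");
```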
Alex Coyte
a255e6b678 add asserts to make bounds on alphaShift clear 2015-11-10 14:25:05 +11:00
Alex Coyte
7b6ad2a01a doComponent: make it obvious that a is never null 2015-11-10 14:25:05 +11:00
Justin Viiret
c7bebf8836 RoseBuildImpl: init base_id members
These are set late in the Rose build process, when final IDs are
allocated.
2015-11-10 14:25:04 +11:00
Justin Viiret
447753f148 FDR compiler: assert that all models are < 32 bits 2015-11-10 14:25:04 +11:00
Justin Viiret
da2386585d Init filter members to nullptr
Note that BGL filters must be default-constructible.
2015-11-10 14:25:04 +11:00
Justin Viiret
cea914e18e Add q_last_type() queue function
Analogous to q_cur_type(), asserts that queue indices are within a valid
range.
2015-11-10 14:25:04 +11:00
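A rough sketch of a pair of accessors in this style is shown below. The queue layout (`cur` as the index of the first valid item, `end` as one past the last), the capacity constant, and all `*Sketch` names are assumptions made for illustration, not the definitions used by Hyperscan's queue API.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMaxQueueLen = 10; // hypothetical capacity

struct QueueItemSketch {
    std::uint32_t type;
};

struct QueueSketch {
    std::uint32_t cur; // index of the first valid item
    std::uint32_t end; // one past the last valid item
    QueueItemSketch items[kMaxQueueLen];
};

// Type of the item at the front of the queue.
inline std::uint32_t q_cur_type_sketch(const QueueSketch &q) {
    assert(q.cur < q.end);
    assert(q.cur < kMaxQueueLen);
    return q.items[q.cur].type;
}

// Type of the item at the back of the queue, with the same bounds checks.
inline std::uint32_t q_last_type_sketch(const QueueSketch &q) {
    assert(q.cur < q.end);
    assert(q.end > 0 && q.end <= kMaxQueueLen);
    return q.items[q.end - 1].type;
}
```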
Justin Viiret
a6383a54a4 assignStringsToBuckets: assert that there are lits 2015-11-10 14:25:04 +11:00
Matthew Barr
fe31630221 Merge develop into master v4.0.1 2015-10-30 11:29:20 +11:00
Matthew Barr
91343b00e9 Bump version number 2015-10-30 11:28:38 +11:00
Matthew Barr
aa674e4e47 unit: Don't run unit-internal in release build 2015-10-30 11:28:38 +11:00
Matthew Barr
1f47b82106 Remove unneeded code at preproc stage
If we know we have BMI2 we shouldn't produce the fallback code.
2015-10-30 11:28:38 +11:00