Compare commits

...

898 Commits

Author SHA1 Message Date
ypicchi-arm
9e9a10ad01
Fix double shufti's vector end false positive (#325)
* Add regression test for double shufti

It tests for false positives at vector edges.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

* Fix double shufti reporting false positives

Double shufti used to offset one vector, losing one character at the end of
every vector; the lost character was replaced by a magic value indicating a
match. This meant that if the first char of a pattern fell on the last char
of a vector, double shufti would assume the second character was present and
report a match.
This patch fixes it by keeping the previous vector and feeding its data into
the new one when we shift it, preventing any loss of data (see the sketch
after this entry).

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

* vshl() will call the correct implementation

* implement missing vshr_512_imm(), simplifying the caller's x86 code

* Fix x86 case, use alignr instead

* it's the reverse: the AVX512 alignr is incorrect and needs fixing

* Make shufti's OR-reduce size-agnostic

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

* Fix test's array size

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

* Fix AVX2/AVX512 alignr implementations and unit tests

* Fix Power VSX alignr

---------

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
Co-authored-by: Konstantinos Margaritis <konstantinos@vectorcamp.gr>
2025-06-11 18:55:10 +03:00
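A minimal sketch of the vector-carry fix described in #325, assuming SSSE3 and illustrative names (shift_with_carry is not Vectorscan's actual function):

    #include <tmmintrin.h>  // SSSE3: _mm_alignr_epi8

    // Instead of padding the shifted vector with a magic "match" byte,
    // pull the missing byte in from the previous iteration's vector.
    static inline __m128i shift_with_carry(__m128i prev, __m128i cur) {
        // Concatenate prev:cur and take bytes 15..30: the last byte of
        // prev followed by the first 15 bytes of cur, so no character
        // is lost at the vector boundary.
        return _mm_alignr_epi8(cur, prev, 15);
    }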
Chrysovalantis - Michail Liakopoulos
c057c7f0f0
Fix/fbsd gcc13 error (#338)
* added static libraries in CMake to fix a unit-internal segfault on FreeBSD/ppc64le with gcc13
* Moved gcc13 flags for freebsd-gcc13 into cmake/cflags-ppc64le.make
2025-05-31 11:46:22 +03:00
Konstantinos Margaritis
f7d5546fe5
Bugfix/fix avx512vbmi regressions (#335)
Multiple AVX512VBMI-related fixes:

src/nfa/mcsheng_compile.cpp: No need for an assert here, impl_id can be set to 0
src/nfa/nfa_api_queue.h: Make sure this compiles on both C++ and C
src/nfagraph/ng_fuzzy.cpp: Fix compilation error when DEBUG_OUTPUT=on
src/runtime.c: Fix crash when data == NULL
unit/internal/sheng.cpp: Unit test has to enable AVX512VBMI manually as autodetection does not get triggered, which caused the test to fail
src/fdr/teddy_fat.cpp: AVX512 loads need to be 64-byte aligned; this caused a crash on clang-18
2025-05-30 21:08:55 +03:00
Konstantinos Margaritis
689556d5f9
Various cppcheck fixes (#337) 2025-05-30 12:27:28 +03:00
Rafał Dowgird
d90ab3ac1c
Fixed out of bounds read in AVX512VBMI version of fdr_exec_fat_teddy … (#333)
Fixed out of bounds read in AVX512VBMI version of fdr_exec_fat_teddy (#322)

  * Replaced the 32-byte read with a properly truncated mapped read
  * Added a unit test

Co-authored-by: Rafał Dowgird <rafal.dowgird@rtbhouse.com>
2025-05-18 11:01:10 +03:00
Konstantinos Margaritis
7e0503c3b8
partial_load_u64 will fail if buf == NULL/c_len == 0 (#331) 2025-05-16 13:44:36 +03:00
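A hedged sketch of the guard this fix implies; the function name comes from the commit, the body is illustrative rather than Vectorscan's actual code:

    #include <cstdint>
    #include <cstring>

    // Load up to 8 bytes without reading past the buffer, returning 0
    // for the buf == NULL / c_len == 0 cases the commit refers to.
    static inline uint64_t partial_load_u64(const uint8_t *buf, size_t c_len) {
        uint64_t v = 0;
        if (!buf || c_len == 0) {
            return 0;  // nothing to load; avoid memcpy from NULL
        }
        std::memcpy(&v, buf, c_len < 8 ? c_len : 8);
        return v;
    }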
Konstantinos Margaritis
5e62255667
Clang 17+ is more restrictive on rebind<T> on MacOS/Boost, remove warning (#332)
* Clang 17+ is more restrictive on rebind<T> on MacOS/Boost, remove warning

* More clang/boost warnings on MacOS, disable for now
2025-05-16 13:44:20 +03:00
ypicchi-arm
55a05e41a0
Fix 5.4.11's config step regression (#327)
An old commit (24ae1670d) had the side effect of moving cmake defines after
they were being used. This patch moves them back so they are defined before
being used. Speeds hsbench back up by ~0.8%.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2025-05-15 00:58:01 +03:00
HelixHexagon
4951b6186d
Fix typo in build instructions (#315) 2025-05-14 23:49:07 +03:00
gtsoul-tech
4f09e785c0
Fix regression error #317 and add unit test (#318)
Reverts the code that produced the regression error in #317.
Adds the regression as a unit test in regressions.cpp alongside the rebar tests.

---------

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-11-13 10:43:23 +02:00
gtsoul-tech
9a3268b047
Cppcheck errors fixed and suppressed (#319)
* suppress knownConditionTrueFalse

* cppcheck suppress redundantInitialization

* cppcheck solve stlcstrStream

* cppcheck suppress useStlAlgorithm

* cppcheck-suppress derefInvalidIteratorRedundantCheck

* cppcheck solve constParameterReference

* const parameter reference cppcheck

* removed wrong fix

* cppcheck-suppress memsetClassFloat

* cppcheck fix memsetClassFloat

* cppcheck fix unsignedLessThanZero

* suppressing all errors on simde gitmodule

* fix typo (unsignedLessThanZero)

* fix cppcheck suppress simde gitmodule

* cppcheck-suppress unsignedLessThanZero

---------

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-11-12 10:01:11 +02:00
ypicchi-arm
5145b6d2ab
Fix noodle sve2 off by one (#313)
* Revert "Fix noodle SVE2 off by one bug"

This patch was fixing the bug when it happened at the end of the buffer,
but it wasn't fixing it when we do scanDoubleOnce before the main loop.

The next patch fixes this bug in both cases instead.

This reverts commit 48dd0e5ff0bc1995d62461c92cfb76d44d1d0105.

* Fix noodle spurious match with \0 chars for SVE2

When SVE2's noodle processes a non-full vector (before the main loop or
at the end of it), a fake \0 was being parsed, triggering a match for
patterns that ended with \0. This patch fixes this.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

---------

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-08-29 13:49:29 +03:00
Michael Tremer
de15179c57
hs_valid_platform: Fix check for SSE4.2 (#310)
Vectorscan requires SSE4.2 as a minimum on x86_64. For Hyperscan this
used to be SSSE3.

Applications that use the library call hs_valid_platform() to check if
the CPU fulfils this minimum requirement. However, when Vectorscan
upgraded to SSE4.2, the check was not updated. This leads to the library
trying to execute instructions that are not supported, resulting in the
application to crash.

This might not have been noticed as the CPUs that do not support SSE4.2
are rather old and unlikely to run any load where performance is an
issue. However, I believe that the library should not let the
application crash.

Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
2024-08-22 10:34:05 +03:00
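A minimal stand-in for the corrected check, assuming GCC/Clang's __builtin_cpu_supports; the function and return values are illustrative, not hs_valid_platform()'s actual implementation:

    // hs_valid_platform() must test for SSE4.2 (Vectorscan's x86_64
    // baseline), not the SSSE3 minimum inherited from Hyperscan.
    int valid_platform_sketch(void) {
        return __builtin_cpu_supports("sse4.2") ? 0    /* HS_SUCCESS */
                                                : -11  /* HS_ARCH_ERROR */;
    }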
ypicchi-arm
e4c49f2aa2
Make vectorscan accept \0 starting pattern (#312)
Vectorscan used to reject such patterns because they were being compared
to "" and found to be empty strings. We now check the pattern length
instead.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-08-22 10:32:53 +03:00
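Illustrative of the bug class, not the actual compiler code: a pattern beginning with '\0' compares equal to the empty string, so the check must use the explicit length:

    #include <cstring>

    // "\0abc" has length 4, but strcmp sees an empty string.
    bool is_empty_wrong(const char *pat) { return std::strcmp(pat, "") == 0; }
    bool is_empty_right(const char *, size_t len) { return len == 0; }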
ypicchi-arm
aa4bc24439
Fix noodle SVE2 off by one bug (#309)
By using svmatch on 16-bit lanes with an 8-bit predicate, we end up
including an undefined character in the pattern checks. The inactive
lane after load contains an undefined value, usually \0. Patterns
using \0 as the last character would then match this spurious
character, returning a match beyond the buffer's end. The fix checks
for such matches and rejects them.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-08-05 09:42:56 +03:00
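A generic sketch of the guard in plain C++ (not the actual SVE2 kernel): a match produced by an undefined inactive lane lands at or past the buffer's end, so it can be filtered with a bounds check:

    #include <cstddef>
    #include <cstdint>

    // Reject matches reported beyond the scanned buffer.
    static inline const uint8_t *filter_match(const uint8_t *match,
                                              const uint8_t *buf, size_t len) {
        if (match != nullptr && match >= buf + len) {
            return nullptr;  // spurious match from an undefined lane
        }
        return match;
    }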
gtsoul-tech
6c8e33e597
Bug fix/rebar tests (#307)
* fixed paths and utf8-lossy=true

* revert to maskz (it's the bug)

* cppcheck fix

---------

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-07-29 11:49:25 +03:00
gtsoul-tech
1dc0600156
Rebar based Unit tests (#305)
* rebar based unit tests

* fixing paths

---------

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-07-24 10:39:24 +03:00
g. economou
dd43c86658
maybe fix the hsbench issue (check_ssse3 again) in sse2/simde env (#306)
* maybe fix the hsbench issue (check_ssse3 again) in sse2/simde env

* fix the last failing unit test with fat

---------

Co-authored-by: G.E. <gregory.economou@vectorcamp.gr>
2024-07-12 15:23:07 +03:00
g. economou
cc11a3d738
build/run on machines that only have SSE2 with SIMDE (#303)
This allows the use of the SIMDe library to emulate SSSE3/SSE4.2 instructions on SSE2-only (x86-64-v2) hardware.

---------

Co-authored-by: G.E <gregory.economou@vectorcamp.gr>
Co-authored-by: Konstantinos Margaritis <konstantinos@vectorcamp.gr>
2024-07-10 21:20:17 +03:00
g. economou
aa832db892
Teddy macros unrolling - initial PR to test in CI (#294)
Major refactoring of teddy and teddy_avx2, unrolling macros into C++ templated functions

---------

Co-authored-by: G.E <gregory.economou@vectorcamp.gr>
2024-06-26 22:35:33 +03:00
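A hypothetical sketch of the technique, with illustrative names: a macro body repeated N times becomes a function template whose constant trip count the compiler still unrolls, but which is type-checked and debuggable:

    template <int NumIters, typename Body>
    static inline void unrolled(Body body) {
        for (int i = 0; i < NumIters; ++i) {  // constant trip count:
            body(i);                          // the compiler unrolls it
        }
    }
    // usage: unrolled<4>([&](int i) { /* process block i */ });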
gtsoul-tech
0f4369bf22
Bug fix/clang-tidy-performance (#300)
Various clang-tidy-performance fixes:
* noexcept
* performance-noexcept-swap
* performance
* performance-move-const-arg
* performance-unnecessary-value-param
* performance-inefficient-vector-operation
* performance-no-int-to-ptr
* add performance
* performance-inefficient-string-concatenation
* clang-analyzer-deadcode.DeadStores
* performance-inefficient-vector-operation
* clang-analyzer-core.NullDereference
* clang-analyzer-core.UndefinedBinaryOperatorResult
* clang-analyzer-core.CallAndMessage

---------

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-06-20 14:57:19 +03:00
gtsoul-tech
0e0c9f8c63
Script for the clang-tidy CI (#299)
script to clang-tidy CI

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-06-11 15:05:52 +03:00
gtsoul-tech
a68845c82b
Bug fix/clang tidy warnings part3 (#298)
* clang-analyzer-deadcode.DeadStores

* clang-analyzer-optin.performance.Padding

---------

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-06-10 10:08:54 +03:00
gtsoul-tech
834a329daa
Clang-tidy config (#297)
clang-tidy config

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-06-07 17:10:32 +03:00
gtsoul-tech
8fc1a7efff
Bug fix/clang tidy warnings part2 (#296)
* core.StackAddressEscape

* cplusplus.VirtualCall

* clang-analyzer-deadcode.DeadStores

* clang-analyzer-core.NullDereference

* clang-analyzer-core.NonNullParamChecker

* change to nolint

---------

Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>
2024-06-04 16:18:17 +03:00
Konstantinos Margaritis
4113a1f150
Fix Clang Tidy warnings (#295)
Fixes some of the clang-tidy warnings
clang-analyzer-deadcode.DeadStores
clang-analyzer-cplusplus.NewDelete
clang-analyzer-core.uninitialized.UndefReturn

closes some: #253

Ignored in this PR:
/usr/include/boost/smart_ptr/detail/shared_count.hpp:432:24
/usr/include/boost/smart_ptr/detail/shared_count.hpp:443:24
51 in build/src/parser
gtest ones
src/fdr/teddy_compile.cpp:600:5 refactoring on the way
src/fdr/fdr_compile.cpp:209:5 refactoring on the way
2024-05-31 18:23:16 +03:00
gtsoul-tech
9987ecd4a0 clang-analyzer-cplusplus.Move 2024-05-31 10:34:55 +03:00
gtsoul-tech
00a5ff1c67 clang-analyzer-core.uninitialized.UndefReturn 2024-05-31 10:24:44 +03:00
gtsoul-tech
e36203c323 remove comment 2024-05-31 09:47:45 +03:00
Konstantinos Margaritis
85ffb2b2f1
Fix Clang Tidy warning optin.performance.Padding (#293)
Fixes some optin.performance.Padding

closes some: #253
2024-05-30 17:24:14 +03:00
gtsoul-tech
c5c4c5d5f5 clang-analyzer-cplusplus.NewDelete 2024-05-30 16:40:55 +03:00
gtsoul-tech
9c0beb57f8 deadcode.DeadStores 2024-05-30 16:40:47 +03:00
gtsoul-tech
de1697b467 deadcode.DeadStores 2024-05-30 16:40:18 +03:00
gtsoul-tech
faa9e7549f uninitialized.UndefReturn 2024-05-29 11:51:06 +03:00
gtsoul-tech
d23e4a12e7 cplusplus.Move 2024-05-28 14:15:03 +03:00
gtsoul-tech
aa6acaec84 optin.performance.Padding 2024-05-27 15:41:57 +03:00
Konstantinos Margaritis
c837925087
Fix/Suppress remaining Cppcheck warnings (#291)
Fix/suppress the following cppcheck warnings:

* arithOperationsOnVoidPointer
* uninitMember
* const*
* shadowVariable
* assignmentIntegerToAddress
* containerOutOfBounds
* pointer-related warnings in Ragel source
* missingOverride
* memleak
* knownConditionTrueFalse
* noExplicitConstructor
* invalidPrintfArgType_sint
* useStlAlgorithm
* cstyleCast
* clarifyCondition
* VSX-related cstyleCast
* unsignedLessThanZero 

Furthermore, we added a suppression list to be used, which also includes the following:
* missingIncludeSystem
* missingInclude
* unmatchedSuppression
2024-05-27 12:23:02 +03:00
Konstantinos Margaritis
cebc6541c1
Part 5 of C-style cast cppcheck (#289)
Fixes some cstyleCasts part 5

closes some: #252
2024-05-24 23:24:58 +03:00
gtsoul-tech
16467faf2b revert uniform cstyle suppress 2024-05-24 10:12:15 +03:00
Konstantinos Margaritis
0e271ccf9a
Speed up truffle with 256b TBL instructions (#290)
256b-wide SVE vectors allow some simplification of truffle. Up to 40%
speedup on graviton3, going from 12500 MB/s to 17000 MB/s on the
microbenchmark.
SVE2 also offers this capability for 128b vectors, with a speedup of
around 25% compared to normal SVE.

Add unit tests and benchmark for this wide variant
2024-05-23 09:38:24 +03:00
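For reference, a scalar form of the nibble-table lookup that TBL vectorizes; this shows the general shufti/truffle scheme rather than the exact truffle kernel:

    #include <cstdint>

    // A character class is encoded in two 16-byte tables; a byte is in
    // the class when the bits selected by its two nibbles intersect.
    static inline bool in_class(uint8_t c, const uint8_t lo_tbl[16],
                                const uint8_t hi_tbl[16]) {
        return (lo_tbl[c & 0xf] & hi_tbl[c >> 4]) != 0;
    }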
Yoan Picchi
938c026256 Speed up truffle with 256b TBL instructions
256b-wide SVE vectors allow some simplification of truffle.
Up to 40% speedup on graviton3, going from 12500 MB/s to 17000 MB/s
on the microbenchmark.
SVE2 also offers this capability for 128b vectors, with a speedup of
around 25% compared to normal SVE.

Add unit tests and benchmark for this wide variant

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-05-22 16:13:53 +00:00
gtsoul-tech
af39f77461 cstylecast parser 2024-05-22 11:11:13 +03:00
gtsoul-tech
94eff4aa60 cstylecasts and suppressions 2024-05-22 10:16:56 +03:00
Konstantinos Margaritis
b312112e87
Merge pull request #288 from isildur-g/bugfix-assert
revert a change to assert
2024-05-21 23:43:04 +03:00
Konstantinos Margaritis
fd46b72a18
Merge pull request #287 from gtsoul-tech/bugFix/cppcheck-cStylecasts-Part4
Part 4 of C-style cast cppcheck
2024-05-21 15:59:51 +03:00
Konstantinos Margaritis
2ec64b6f07
Merge pull request #283 from isildur-g/wip-cppcheck271-part2
Wip cppcheck271 useStlAlgorithm part2
2024-05-21 15:52:15 +03:00
G.E.
82067fd526 revert a change to assert; the original logic might have been
subtly clever (or else totally useless all these years). When we see
which of the two it is, we might delete that assert entirely; for now,
put it back as it was.
2024-05-20 18:03:56 +03:00
gtsoul-tech
dfa72ffd50 cstylecasts suppress, fixes 2024-05-20 17:09:30 +03:00
G.E
7a5f271abe undo that one, it breaks 2024-05-20 16:35:58 +03:00
Konstantinos Margaritis
cd39f71e80
Merge pull request #286 from VectorCamp/bugfix/cppcheck-fix-wrong-casts
Fix remaining C style casts and suppressions
2024-05-20 15:17:01 +03:00
gtsoul-tech
e111684bc2 fix cStyleCasts 2024-05-20 14:54:35 +03:00
Konstantinos Margaritis
dc18a14663 Fix typo 2024-05-18 22:52:17 +03:00
Konstantinos Margaritis
40da067b4f Add more C style casts fixes and suppressions 2024-05-18 21:49:54 +03:00
Konstantinos Margaritis
92d5db503e remove unneeded suppression 2024-05-18 21:47:47 +03:00
Konstantinos Margaritis
28adc07824 Fix more C-style casts 2024-05-18 15:10:34 +03:00
Konstantinos Margaritis
06339e65ad Fix casts 2024-05-18 14:58:11 +03:00
Konstantinos Margaritis
14deb313c1 Fix wrong comparison 2024-05-18 14:57:44 +03:00
Konstantinos Margaritis
80812ee5b3 Fix wrong casts in pcapscan.cc 2024-05-18 14:56:30 +03:00
Konstantinos Margaritis
a8373df48b
Merge pull request #285 from gtsoul-tech/bugFix/cppcheck-cStylecasts-Part3
Part 3 of C-style cast cppcheck
2024-05-18 09:43:25 +03:00
Konstantinos Margaritis
cffe5095da
Merge pull request #284 from gtsoul-tech/bigFix/cppcheck-cStylecasts-Part2
Part 2 of C-style cast cppcheck
2024-05-18 09:41:50 +03:00
G.E
fa358535be stl'ed another. 2024-05-18 00:02:43 +03:00
G.E
97a8519084 stl'ed another one 2024-05-17 23:27:42 +03:00
G.E
84dd8de656 stl'ed one more instance 2024-05-17 23:18:55 +03:00
gtsoul-tech
e261f286da cStyleCasts 2024-05-17 16:58:08 +03:00
gtsoul-tech
2fa06dd9ed cStyleCasts 2024-05-17 13:57:12 +03:00
G.E
3b01effaf7 clean up comments 2024-05-17 11:27:43 +03:00
g. economou
22c3e3da6e
Merge branch 'develop' into wip-cppcheck271-part2 2024-05-17 11:08:09 +03:00
G.E
c038f776b1 last batch 2024-05-17 11:02:00 +03:00
G.E
f2cecfd0e2 next batch 2024-05-17 10:44:28 +03:00
G.E
c482c05fa8 next batch 2024-05-17 00:08:37 +03:00
Konstantinos Margaritis
4abe31cbce
Merge pull request #281 from VectorCamp/bugfix/cppcheck-cstyleCasts-part1
Part 1 of C-style cast cppcheck fixes
2024-05-16 23:14:45 +03:00
G.E
da4f563a24 first batch of cppcheck disables and a few more stl-ifications,
involving use of accumulate().
2024-05-16 23:01:17 +03:00
Konstantinos Margaritis
59b4e082a8 add alternative macro without C casts to avoid Cppcheck warnings 2024-05-16 15:58:02 +03:00
Konstantinos Margaritis
1290733c89 Fix C style casts in mcsheng_compile.cpp 2024-05-16 15:57:39 +03:00
Konstantinos Margaritis
f9df732284
Merge pull request #280 from isildur-g/bugfix-rose-segfault
the segfault we ran into, included_frag_id should be
2024-05-16 15:41:22 +03:00
Konstantinos Margaritis
8339534a44 Fix wrong cast in aligned_free() 2024-05-16 13:06:28 +03:00
Konstantinos Margaritis
e819cb1100 Fix C-style casts 2024-05-16 12:03:42 +03:00
g. economou
b03699fade
Merge branch 'develop' into bugfix-rose-segfault 2024-05-16 09:57:58 +03:00
G.E
d653822a82 the segfault we ran into, included_frag_id should be
included_delay_frag_id.
2024-05-16 09:53:03 +03:00
Konstantinos Margaritis
db92a42681
Merge pull request #279 from VectorCamp/bugfix/cppcheck-unreadVariable-others
Fix marked as done cppcheck warnings unreadVariable & others
2024-05-15 23:18:02 +03:00
Konstantinos Margaritis
6970f8f7f4
Merge pull request #278 from VectorCamp/bugfix/cppcheck-remaining-const-warnings
Fix remaining marked as done const* cppcheck warnings
2024-05-15 17:54:27 +03:00
Konstantinos Margaritis
59a098504e remove unused variables 2024-05-15 17:18:53 +03:00
Konstantinos Margaritis
8260d7c906 another duplicateExpression false positive 2024-05-15 17:11:07 +03:00
Konstantinos Margaritis
6d6d4e1013 Fix unreadVariable warning 2024-05-15 17:05:50 +03:00
Konstantinos Margaritis
0cf72ef474 Fix missingOverride 2024-05-15 17:05:22 +03:00
Konstantinos Margaritis
4a56b9c2e9 Fix unknownMacro 2024-05-15 17:04:50 +03:00
Konstantinos Margaritis
1d4a2b2b60 Fix variableScope 2024-05-15 17:04:11 +03:00
Konstantinos Margaritis
2dc4da7f2e False positive knownConditionTrueFalse 2024-05-15 17:01:02 +03:00
Konstantinos Margaritis
9577fdc474 False positives duplicateExpression 2024-05-15 17:00:28 +03:00
Konstantinos Margaritis
22166ed948 Fix remaining marked as done const* cppcheck warnings 2024-05-15 10:52:31 +03:00
Konstantinos Margaritis
a255600773
Merge pull request #277 from isildur-g/wip-cppcheck271
phase 1 of addressing cppcheck useStlAlgorithm warnings for fill and copy operations
2024-05-15 10:44:15 +03:00
Konstantinos Margaritis
bf794fafa0
Merge pull request #276 from gtsoul-tech/bugFix/cppcheck-noExplicitConstructor-part2
Bug fix/cppcheck no explicit constructor part2
2024-05-14 19:52:01 +03:00
G.E
4cefba5ced phase 1 of addressing cppcheck useStlAlgorithm warnings,
this set only includes fill and copy operations.
2024-05-14 17:37:38 +03:00
gtsoul-tech
f8c576db15 supervector conversion 2024-05-14 14:15:15 +03:00
gtsoul-tech
3aa9c18e34 forgot some conversions SuperVector<32> 2024-05-14 14:10:56 +03:00
gtsoul-tech
0258606df3 explicit constructor Supervector 2024-05-14 13:32:50 +03:00
gtsoul-tech
9070447260 ppc64el supervector explicit constructor 2024-05-14 10:11:52 +03:00
gtsoul-tech
3d60d4f3be arm supervector explicit constructor 2024-05-14 09:53:08 +03:00
gtsoul-tech
ee8bc59ee0 x86 explicit constructor supervector 2024-05-14 09:28:13 +03:00
Konstantinos Margaritis
96aca187bd
Merge pull request #274 from gtsoul-tech/bugFix/cppcheckError-noexplicitConstructor
Bug fix/cppcheck error noexplicitconstructor
2024-05-13 21:52:55 +03:00
gtsoul-tech
9798b57f9e most ptr.get() conversion 2024-05-13 14:24:16 +03:00
gtsoul-tech
fc272868b8 fixes left_id and suffix_id 2024-05-13 13:04:57 +03:00
gtsoul-tech
0b8e863282 more fixes 2024-05-13 12:41:40 +03:00
gtsoul-tech
6830ce21ef fix conversions 2024-05-13 11:03:27 +03:00
Konstantinos Margaritis
d2d8a0bbde
Merge pull request #267 from VectorCamp/bugfix/cppcheck-rose_build_scatter-uselessAssignmentArg
Fix marked as false positive uselessAssignmentArg cppcheck warning
2024-05-13 10:42:23 +03:00
Konstantinos Margaritis
3a2ec524bd
Merge pull request #268 from VectorCamp/bugfix/cppcheck-constStatement
Fix marked as false positive constStatement cppcheck warnings
2024-05-13 10:26:38 +03:00
Konstantinos Margaritis
017acc4265
Merge pull request #269 from VectorCamp/bugfix/cppcheck-duplicateAssignment
Fix marked as false positive duplicateAssignExpression cppcheck warnings
2024-05-13 10:26:25 +03:00
Konstantinos Margaritis
98bd92ddbc
Merge pull request #270 from VectorCamp/bugfix/cppcheck-redundantAssignment
Fix marked as false positive redundantAssignment cppcheck warnings
2024-05-13 10:25:53 +03:00
Konstantinos Margaritis
77cdcec359
Merge pull request #271 from VectorCamp/bugfix/cppcheck-identicalConditionAfterEarlyExit
Fix marked as false positive identicalConditionAfterEarlyExit cppcheck warnings
2024-05-13 10:25:34 +03:00
Konstantinos Margaritis
cbd831e92a
Merge pull request #272 from VectorCamp/bugfix/cppcheck-truncLongCastAssignment
Fix marked as false positive truncLongCastAssignment cppcheck warnings
2024-05-13 10:13:45 +03:00
Konstantinos Margaritis
86645e9bee
Merge pull request #273 from VectorCamp/bugfix/cppcheck-knownConditionTrueFalse
Fix marked as false positive knownConditionTrueFalse cppcheck warnings
2024-05-13 10:13:22 +03:00
gtsoul-tech
6989314295 revert supervector 2024-05-13 09:52:42 +03:00
gtsoul-tech
2fa7eaded4 fix 2024-05-13 09:29:40 +03:00
gtsoul-tech
5affdf3a11 Merge branch 'develop' into bugFix/cppcheckError-noexplicitConstructor 2024-05-13 09:13:28 +03:00
Konstantinos Margaritis
879cc0a183 nodes is never empty at this stage, as emplace_back() is called just previously 2024-05-12 20:25:29 +03:00
Konstantinos Margaritis
a1fbe84660 Fix marked as false positive knownConditionTrueFalse cppcheck warnings
std::make_shared<> does not return null, it throws std::bad_alloc.
2024-05-12 20:24:00 +03:00
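Why these null checks are dead code, in brief; a standalone illustration, not the patched source:

    #include <memory>
    #include <new>

    void sketch() {
        auto p = std::make_shared<int>(42);
        // if (!p) { ... }  // always false: make_shared never returns null
        try {
            auto q = std::make_shared<int>(42);
            (void)q;
        } catch (const std::bad_alloc &) {
            // allocation failure surfaces here, not as a null return
        }
        (void)p;
    }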
Konstantinos Margaritis
c54acf0e04 Fix false positive truncLongCastAssignment warnings 2024-05-12 17:22:12 +03:00
Konstantinos Margaritis
05d86f0c3e Fix false positive /identicalConditionAfterEarlyExit warnings 2024-05-12 00:05:45 +03:00
Konstantinos Margaritis
5f2f343b8f Fix false positive redundantAssignment warnings 2024-05-11 23:45:01 +03:00
Konstantinos Margaritis
9ee4c07964 Fix false positive duplicateAssignExpression warnings 2024-05-11 23:13:45 +03:00
Konstantinos Margaritis
e6595c72aa Fix false positive constStatement warnings 2024-05-11 23:11:29 +03:00
Konstantinos Margaritis
a2e66e31d2 replace memcpy with std::copy 2024-05-11 18:47:00 +03:00
Konstantinos Margaritis
0f61853a37 Fix false positive uselessAssignmentArg cppcheck warning
(also at the same time some cstyleCast warnings)
2024-05-11 17:59:46 +03:00
Konstantinos Margaritis
cd1e13d4d2
Merge pull request #265 from isildur-g/wip-isildur-g-cppcheck66
addressing cppcheck shadowFunction warnings
2024-05-10 22:43:05 +03:00
gtsoul-tech
753c7de002 Merge branch 'develop' into test-noExplicitConstructor 2024-05-10 12:46:44 +03:00
gtsoul-tech
c70c09c961 supervector 2024-05-10 12:43:45 +03:00
gtsoul-tech
bdffbde80f noExplicitConstructor 1 more 2024-05-10 10:08:14 +03:00
gtsoul-tech
94b17ecaf2 noExplicitConstructor 2024-05-10 10:07:47 +03:00
G.E
4ff92f87d9 remove test comment 2024-05-09 15:12:02 +03:00
g. economou
cc63087d06
Merge branch 'develop' into wip-isildur-g-cppcheck66 2024-05-09 10:28:25 +03:00
G.E
8499055605 test comment 2024-05-09 00:12:30 +03:00
G.E
c38bb8bc1d one more fixed that had been missed. 2024-05-08 12:33:22 +03:00
Konstantinos Margaritis
7dd2135b80
Merge pull request #264 from gtsoul-tech/bugFix/cppcheck-constVariablePointer
Cppcheck constVariablePointer error
2024-05-08 10:28:24 +03:00
G.E
82cf36724e maybe fix broken merge 2024-05-05 18:26:20 +03:00
G.E
bb056ab63f just a dummy commit to test something 2024-05-05 12:11:17 +03:00
g. economou
13e5183be2
Merge branch 'develop' into wip-isildur-g-cppcheck66 2024-05-02 18:37:46 +03:00
G.E
c7f7d17ebc addressing cppcheck shadowFunction warnings 2024-05-02 18:00:03 +03:00
Konstantinos Margaritis
692a63c8ca
Merge pull request #263 from gtsoul-tech/bug/cppcheck-61
Cppcheck knownConditionTrueFalse error
2024-05-02 16:50:16 +03:00
gtsoul-tech
5ad1f2127f constVariablePointer 2024-05-02 14:30:18 +03:00
gtsoul-tech
a634d57b2d knownConditionTrueFalse fixes previously fp 2024-05-02 10:13:55 +03:00
Konstantinos Margaritis
9fdbdac06c
Merge pull request #262 from isildur-g/wip-isildur-g-cppcheck-47-48-58
addressing 47,48,58
2024-05-01 17:39:42 +03:00
gtsoul-tech
389b55c647 refactor bool to void: setDistinctTops, setDistinctRoseTops,
setDistinctSuffixTops
2024-05-01 15:21:36 +03:00
gtsoul-tech
ea420114a7 knownConditionTrueFalse 2024-05-01 13:04:51 +03:00
G.E
d3dd448641 the merge got screwed up, this should fix it 2024-05-01 11:22:32 +03:00
g. economou
727cff3621
Merge branch 'develop' into wip-isildur-g-cppcheck-47-48-58 2024-05-01 10:59:59 +03:00
G.E
9902ca0e34 addressing 47 [constParameterReference], 48 [constVariableReference],
58 [constVariable]
2024-05-01 10:54:15 +03:00
Konstantinos Margaritis
27bb2b9134
Merge pull request #239 from ypicchi-arm/feature/add-sheng-unit-tests
Feature/add sheng unit tests
2024-05-01 00:07:14 +03:00
Konstantinos Margaritis
d6e185cd75
Merge pull request #261 from gtsoul-tech/bug/cppCheck-41
Cppcheck unreadVariable error
2024-04-30 21:10:08 +03:00
Yoan Picchi
f2d8d63793 Add sheng tests
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-04-30 14:34:14 +00:00
gtsoul-tech
b5bf3d8d31 unreadVariable 2024-04-30 13:36:39 +03:00
Konstantinos Margaritis
2921e50ecc
Merge pull request #259 from gtsoul-tech/bug/cppcheckErrors
Bug/cppcheck errors (32,35) WIP
2024-04-30 10:27:47 +03:00
Konstantinos Margaritis
aa1955a368
Merge pull request #258 from isildur-g/wip-isildur-g-cppcheck1220
Wip cppcheck1220
2024-04-29 21:58:13 +03:00
gtsoul-tech
bb6464431f new variableScope 2024-04-29 15:09:55 +03:00
G.E
2a476df2c5 fixed const adjustments. 2024-04-29 13:38:35 +03:00
gtsoul-tech
ec8cda3f49 variableScopeFix 2024-04-29 13:28:16 +03:00
gtsoul-tech
987cd17160 variableScope 2024-04-29 13:13:07 +03:00
G.E
f463357a38 Some of the consts couldn't be propagated and had to be reverted 2024-04-29 12:39:28 +03:00
gtsoul-tech
62e3450eae missingOverride 2024-04-29 10:26:39 +03:00
Yoan Picchi
49fd4f0047 Enable sheng32 and sheng64 on Arm
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-04-25 13:00:54 +00:00
Konstantinos Margaritis
131672d175
Merge pull request #257 from isildur-g/wip-isildur-g-cppcheck56
cppcheck warnings 5,6
2024-04-25 10:18:07 +03:00
Konstantinos Margaritis
71fcade2ac
Merge pull request #256 from gtsoul-tech/bug/cppcheckErrors
Bug/cppcheck errors (3 ,4 ,7 ,10,11) WIP
2024-04-25 10:17:23 +03:00
G.E
7fd45f864c next batch for cppcheck, addressing syntaxError and
constParameterPointer
2024-04-24 17:32:09 +03:00
G.E
e291d498fa fixed merge mixup 2024-04-24 16:26:38 +03:00
g. economou
2e68780fb5
Merge branch 'develop' into wip-isildur-g-cppcheck56 2024-04-24 16:13:39 +03:00
G.E
11e4968367 cppcheck invalidPrintfArgType_uint warnings 2024-04-24 15:55:57 +03:00
gtsoul-tech
5dab841cea badBitmaskCheck 2024-04-24 15:50:55 +03:00
gtsoul-tech
72371a05dc derefInvalidIteratorRedundantCheck 2024-04-24 13:15:17 +03:00
gtsoul-tech
fd3e251afa redundantInitialization 2024-04-24 12:40:55 +03:00
Konstantinos Margaritis
db07cddaec
Merge pull request #255 from isildur-g/wip-isildur-g
addressing some cppcheck warnings.
2024-04-24 12:39:17 +03:00
gtsoul-tech
adda613f51 shiftTooManyBitsSigned 2024-04-24 11:13:28 +03:00
gtsoul-tech
e6c884358e uninitvar 2024-04-24 11:13:02 +03:00
gtsoul-tech
9b9df1b397 invalidPrintfArgType_sint 2024-04-24 11:07:23 +03:00
Konstantinos Margaritis
08942348c6
Merge pull request #254 from gtsoul-tech/bug/cppcheckErrors
Bug/cppcheck errors WIP
2024-04-24 10:55:51 +03:00
G.E
d7006a7c85 removed another commented line 2024-04-24 00:06:08 +03:00
G.E
91d9631c97 fixed some const issues 2024-04-24 00:04:59 +03:00
G.E
8dabc86a69 removed commented lines. 2024-04-23 23:46:08 +03:00
G.E
01dee390a9 Addressing some cppcheck warnings. Yes, this will be cleaned up in a
following commit. Tests pass.
2024-04-23 19:08:24 +03:00
gtsoul-tech
52b0076f4f accessMoved 2024-04-23 14:49:10 +03:00
gtsoul-tech
8d3a5d7cf1 legacyUninitvar 2024-04-23 14:48:58 +03:00
gtsoul-tech
73e00e2abc funcArgOrderDifferent 2024-04-23 14:48:51 +03:00
gtsoul-tech
9316d65022 redundantContinue 2024-04-23 14:48:35 +03:00
gtsoul-tech
9d5753b215 comparisonOfBoolWithBoolError 2024-04-23 14:48:12 +03:00
gtsoul-tech
182f7ddb47 useInitializationList 2024-04-23 14:47:21 +03:00
gtsoul-tech
06fc35321d unsignedLessThanZero cppcheck 2024-04-23 12:27:43 +03:00
gtsoul-tech
c4bffd7cef accessMoved cppcheck error 2024-04-23 12:15:12 +03:00
Konstantinos Margaritis
53015b2bbf
Merge pull request #250 from isildur-g/static-fat-dispatch
static dispatch for fat runtimes. eliminates the need for ifunc.
2024-04-23 11:11:21 +03:00
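A hedged sketch of the static-dispatch idea, with illustrative names rather than Vectorscan's actual symbols: the best implementation is resolved on first call through a plain function pointer, so no GNU ifunc relocations are needed:

    #include <cstddef>

    typedef int (*scan_fn_t)(const char *, size_t);

    static int scan_sse42(const char *, size_t) { return 0; }  // stand-ins
    static int scan_avx2(const char *, size_t) { return 0; }

    static int scan_resolver(const char *buf, size_t len);
    static scan_fn_t scan_impl = scan_resolver;  // patched on first call

    static int scan_resolver(const char *buf, size_t len) {
        // Pick the widest implementation the CPU supports, then retry.
        scan_impl = __builtin_cpu_supports("avx2") ? scan_avx2 : scan_sse42;
        return scan_impl(buf, len);
    }

    int scan(const char *buf, size_t len) { return scan_impl(buf, len); }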
g. economou
7590fdc8c7
Merge branch 'develop' into static-fat-dispatch 2024-04-23 10:40:47 +03:00
Gregory Economou
c7439a605e removed LD_LIBRARY_PATH comment from readme 2024-04-22 12:07:15 +03:00
Konstantinos Margaritis
9c52d3d6c1
Merge pull request #246 from isildur-g/bugfix-242
the rpath hack is only needed on arm
2024-04-19 16:26:37 +03:00
Gregory Economou
cfa8397e97 static dispatch for fat runtimes. eliminates the need for ifunc. 2024-04-19 12:32:00 +03:00
G.E.
acbef47c74 tiny change to readme 2024-04-18 16:16:06 +03:00
Konstantinos Margaritis
6a7574501d
Merge pull request #248 from VectorCamp/feature/update-simde
Update SIMDe
2024-04-18 11:26:45 +03:00
G.E.
50fdcaf357 readme edit 2024-04-17 23:03:09 +03:00
Konstantinos Margaritis
6546dd856a
Merge pull request #247 from gtsoul-tech/bugfix/#245
gcc-14 compilation fix Closes:#245
2024-04-17 21:32:21 +03:00
Konstantinos Margaritis
cdc0d47cde Update SIMDe 2024-04-17 17:23:11 +03:00
G.E.
1e614dc861 enable the rpath hack on all gcc13, and on arm/gcc12 2024-04-17 15:40:52 +03:00
gtsoul-tech
51ac3a2287 clang-format revert 2024-04-17 13:55:42 +03:00
gtsoul-tech
f2db0cdf01 gcc-14 compilation fix Closes:#245 2024-04-17 13:33:48 +03:00
G.E.
3b37add4d8 the rpath hack is only needed on arm 2024-04-17 11:33:00 +03:00
Konstantinos Margaritis
3c04ec25bb
Merge pull request #232 from isildur-g/develop
minor changes to build in BSD (Net and Free)
2024-04-17 08:59:49 +03:00
G.E.
6e306a508e removing the dispatcher.c changes from this branch/PR 2024-04-16 17:43:11 +03:00
Gregory Economou
b6b0ab1a9b added a fixme for the clunky rpath setting logic 2024-04-16 15:18:58 +03:00
Gregory Economou
81722040f8 CMake adds newlines to variables set from command output, trashing the
usefulness of the flexible method of setting the rpath var; back to
the clunky manual setting.
2024-04-16 15:04:20 +03:00
Gregory Economou
00b1a50977 made the rpath finding a bit more flexible than just hardcoded gcc12 and gcc13 2024-04-16 13:09:05 +03:00
Gregory Economou
12e95b2c5c bit hacky but it works for setting rpath in freebsd 2024-04-16 10:17:53 +03:00
Gregory Economou
b312438929 probably fixed the bit about not finding the right libs in some bsd installs 2024-04-15 16:50:58 +03:00
Gregory Economou
d96206a12f united the static fat runtime dispatcher with the BSD support. 2024-04-15 14:59:08 +03:00
G.E.
2e86b8524d first try at getting CMake to leave custom shared lib paths with the binary;
with package-added compilers we need to find the right std libs from the
compiler we added, not the base install's libs.
2024-04-15 11:52:33 +03:00
G.E.
b0916df825 one more place to fix where clang in bsd is more picky than gcc in linux 2024-04-15 11:44:22 +03:00
G.E.
0376319a93 tiny edit to readme 2024-04-15 11:07:45 +03:00
G.E.
d0b39914df added ccache for freebsd install recommendations 2024-04-11 17:47:16 +03:00
G.E.
372053ba4f adding libpcap to the bsd packages to install for building/running 2024-04-11 15:06:18 +03:00
G.E.
773d57d890 added copyright info for modified src files 2024-04-11 09:49:43 +03:00
Konstantinos Margaritis
c2700cafd9
Merge pull request #237 from gtsoul-tech/feature/microbenchmarkingCSV
Microbenchmarking tool changed color output to csv output
2024-04-02 17:33:28 +03:00
gtsoul-tech
3670e52c87 output tabulated and csv 2024-04-02 14:56:27 +03:00
gtsoul-tech
62a275e576 change first column name csv 2024-04-02 13:32:51 +03:00
gtsoul-tech
b5a29155e4 removed color output code 2024-04-02 11:28:00 +03:00
gtsoul-tech
50a62a17ff changed color output to csv output 2024-04-01 16:05:13 +03:00
isildur-g
e20ba37208 slightly clearer comments in netbsd section 2024-03-27 17:33:25 +02:00
isildur-g
42653b8a31 also sqlite info for bsd 2024-03-27 17:28:24 +02:00
isildur-g
e2ce866462 some more bsd detail 2024-03-27 17:24:08 +02:00
isildur-g
e239f482fd more system prep info for bsd 2024-03-27 17:19:56 +02:00
isildur-g
dc371fb682 also added note for CC/CXX vars in fbsd/ppc which are different. 2024-03-27 16:35:21 +02:00
isildur-g
17c78ff23c more verbose instructions for preparing BSD systems. 2024-03-27 16:31:02 +02:00
Konstantinos Margaritis
d7fb5f437a
Merge pull request #235 from VectorCamp/revert-234-feature/add-wider-sheng-implementation-on-arm
Revert "RFC Enable sheng32/64 for SVE"
2024-03-19 13:24:27 +02:00
Konstantinos Margaritis
f5412b3509
Revert "RFC Enable sheng32/64 for SVE" 2024-03-19 11:40:23 +02:00
Konstantinos Margaritis
c9b3a86908
Merge pull request #234 from ypicchi-arm/feature/add-wider-sheng-implementation-on-arm
RFC Enable sheng32/64 for SVE
2024-03-14 21:58:42 +02:00
G.E.
1ea53768a6 whitespace editing in readme 2024-03-13 17:33:23 +02:00
G.E.
b006d7f620 shortened freebsd text 2024-03-12 16:50:57 +02:00
isildur-g
d0498f942d typo fix 2024-03-12 13:58:50 +01:00
G.E.
0045a2bdc7 moved HAVE_BUILTIN_POPCOUNT def to cmake 2024-03-12 14:22:39 +02:00
G.E.
226645eaf1 another small cleanup in readme 2024-03-12 12:59:16 +02:00
G.E.
d6d7a96c44 incremental improvement in cleanliness 2024-03-12 12:37:08 +02:00
Konstantinos Margaritis
6b459843e6
Merge pull request #231 from jlinton/develop-add-man-pages
Add man page generation, change man section, update docs to reflect name change, and couple other tweaks
2024-03-12 10:34:09 +02:00
Konstantinos Margaritis
9db7b529e2
Merge pull request #233 from bradlarsen/develop
Add CMake options for more build granularity
2024-03-12 10:05:52 +02:00
Yoan Picchi
f9e254ab41 Enable sheng32/64 for SVE
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-03-11 12:46:08 +00:00
Jeremy Linton
6bbd4821f0 hsbench: Update test program output
While fixing the documentation, it was noticed that the hsbench
output was still referring to the project as Hyperscan.
Let's correct it.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
2024-03-07 17:55:47 -06:00
Jeremy Linton
0c57b6c894 pkgconfig: Correct library description
Correct the description in the pkgconfig file, but
leave the name alone as we want to remain compatible
with projects utilizing hyperscan.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
2024-03-07 17:55:47 -06:00
Jeremy Linton
943f198ebf documentation: Replace project name with Vectorscan and general updates
The generated documentation continues to refer to Hyperscan
despite the project now being Vectorscan. Let's replace many
of the Hyperscan references with Vectorscan.

At the same time, let's resync the documentation here with the
vectorscan readme. This updates the supported platforms/compilers
and build options.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
2024-03-07 17:55:47 -06:00
Jeremy Linton
2d23d24b67 documentation: Update project name and copyright
The project name in the documentation should probably
be updated to reflect that this is vectorscan. Update
the copyright too.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
2024-03-07 17:55:47 -06:00
Jeremy Linton
d9a75dc3b9 documentation: Add cmake option to build man pages
Man pages tend to be preferred in some circles; let's add an
option to build the vectorscan documentation that way.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
2024-03-07 17:55:47 -06:00
Brad Larsen
bd7423f4f0 Add CMake options for more build granularity
This adds three new CMake options, all defaulting to true, making it
possible to opt-out of building parts of Vectorscan that are not
essential for deployment of the matching runtime.

These new options:

- `BUILD_UNIT`: control whether the `unit` directory is included
- `BUILD_DOC`: control whether the `doc` directory is included
- `BUILD_TOOLS`: control whether the `tools` directory is included
2024-03-06 16:32:12 -05:00
isildur-g
523db6051d maybe netbsd is more pedantic about this? 2024-03-06 17:05:55 +01:00
isildur-g
afcbd28d3b lets not disable warnings 2024-03-06 16:39:34 +01:00
isildur-g
f7a4d41c63 more readme format tinkering 2024-03-06 15:53:03 +01:00
isildur-g
4739c76a11 formatting in readme 2024-03-06 15:50:26 +01:00
G.E.
a6a35e044c minor change for more certain success in netbsd 2024-03-06 15:28:20 +01:00
G.E.
30ae8505c3 lets rather not add lines of code not yet used anywhere 2024-03-06 14:57:28 +01:00
G.E.
a05c891146 updated readme to reflect FreeBSD build 2024-03-06 12:31:35 +01:00
G.E.
12f61d15ed support building on NetBSD 2024-03-06 10:48:56 +01:00
Konstantinos Margaritis
71f3e7d994
Merge pull request #229 from ProBrian/bugfix/pcre_check_error
fix the pcre version check error on clang 16
2024-01-29 20:26:57 +02:00
Jingbo Chen
24786ae332 fix the pcre version check error on clang 16 2024-01-29 10:50:16 +08:00
Konstantinos Margaritis
98eb459ac2
Merge pull request #225 from VectorCamp/feature/cleanup-compiler-warnings
According to https://buildbot-ci.vectorcamp.gr/#/changes/93

most builds succeeded with no compiler warnings. The build failures were only on x86 and Arm for SIMDe builds: x86 because of a bug in SIMDe's emulation of its own x86 intrinsics in non-native mode, and Arm due to clang (unsure if this is actually a bug in SIMDe or clang itself). All the remaining compiler warnings were suppressed because they were not possible to fix within the scope of this project.

This PR will close #170; code quality improvements, however, will continue with the integration of #222 or a similar static code analyzer into CI, and with continuous refactoring.
2024-01-20 22:41:00 +02:00
Konstantinos Margaritis
01658be05d remove unused warning exceptions 2024-01-20 19:52:31 +02:00
Konstantinos Margaritis
8cd365121d Revert "fix more unused-variable warnings"
This reverts commit afb1a1705f8073ba43b38845d3aa1329634083ed.
2024-01-20 17:46:29 +02:00
Konstantinos Margaritis
68dab83799 Revert "fix unused-variable warning"
This reverts commit ac02b589beebad99820a8b42e6b96e598e7da929.
2024-01-20 17:46:29 +02:00
Konstantinos Margaritis
5fa5e142b9 add -Wno-deprecate-lax-vec-conv-all on clang 15 for Power only 2024-01-20 17:45:56 +02:00
Konstantinos Margaritis
528a165c20 Revert "don't demand 32/64-byte alignment if there is no 256/512-bit SIMD engine"
This reverts commit 719e1c9be6fd6fd316889ac7625253d0ad9c5fd5.
2024-01-19 17:41:40 +02:00
Konstantinos Margaritis
1a4e878abe Revert "if we don't have a 256/512-bit SIMD engine, there is no need to have 32/64-byte alignment and gcc complains anyway"
This reverts commit 9134cd6250f47034e15ef42981a3257ae4e3d506.
2024-01-19 15:23:11 +02:00
Konstantinos Margaritis
4fb8cee35f -Wno-pass-failed is only for ppc64le 2024-01-19 11:23:17 +02:00
Konstantinos Margaritis
3d0df318b8 use snprintf() instead 2024-01-18 23:40:38 +02:00
Konstantinos Margaritis
5d38d0d0a5 add needed deprecation warning exceptions for SIMDe on Power VSX 2024-01-18 23:37:59 +02:00
Konstantinos Margaritis
a1258680ac remove unused constants 2024-01-18 22:08:19 +02:00
Konstantinos Margaritis
ffa6926608 set x86-64-v2 as baseline arch for fat runtime 2024-01-18 22:08:05 +02:00
Konstantinos Margaritis
634c884204 use x86-64-v2 as default x86 arch for SIMDe 2024-01-18 21:24:38 +02:00
Konstantinos Margaritis
b5ae828e61
Merge pull request #226 from ypicchi-arm/opti/remove_unused_instruction_truffle_sve
Make the match component of SVE truffle constant time
2024-01-18 21:20:47 +02:00
Konstantinos Margaritis
719e1c9be6 don't demand 32/64-byte alignment if there is no 256/512-bit SIMD engine 2024-01-18 18:37:27 +02:00
Yoan Picchi
01d8a2d768 Make the match component of SVE truffle constant time
There is no significant speedup for 128b vectors, but we expect some speedup
for wider vectors compared to the previous linear-time implementation of the
match.hpp component.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-01-18 11:53:45 +00:00
Konstantinos Margaritis
b6e3c66015 WIP: after cleaning up the code, remove the warnings from compilation flags 2024-01-18 00:47:44 +02:00
Konstantinos Margaritis
b106c10b4d use arch set in cflags-x86.cmake 2024-01-18 00:43:32 +02:00
Konstantinos Margaritis
9e4789d374 fix some build misconfigurations on x86 2024-01-18 00:43:11 +02:00
Konstantinos Margaritis
9134cd6250 if we don't have a 256/512-bit SIMD engine, there is no need to have 32/64-byte alignment and gcc complains anyway 2024-01-18 00:42:36 +02:00
Konstantinos Margaritis
593299e7bb check the correct define 2024-01-18 00:41:56 +02:00
Konstantinos Margaritis
f6387e34da add info message 2024-01-18 00:41:23 +02:00
Konstantinos Margaritis
73f70e3d2e WIP: only keep the absolutely necessary warning exceptions 2024-01-17 17:18:12 +02:00
Konstantinos Margaritis
6b9068db0f initialize variable 2024-01-17 17:16:02 +02:00
Konstantinos Margaritis
5e1972efce remove redundant moves 2024-01-17 17:15:32 +02:00
Konstantinos Margaritis
9fac2bf78d remove unused constant 2024-01-17 17:13:51 +02:00
Konstantinos Margaritis
afb1a1705f fix more unused-variable warnings 2024-01-17 17:03:19 +02:00
Konstantinos Margaritis
4d2bcff7b4 remove unused variable 2024-01-17 17:02:32 +02:00
Konstantinos Margaritis
ac02b589be fix unused-variable warning 2024-01-17 17:02:08 +02:00
Konstantinos Margaritis
f8fdd979f1 set default x86 architecture to baseline 2024-01-17 17:00:47 +02:00
Konstantinos Margaritis
46488b8097
Merge pull request #219 from VectorCamp/bugfix/make-sqlite-optional
Make sqlite optional, use OS installed
2024-01-15 16:57:11 +02:00
Konstantinos Margaritis
8b2ebeb06b make pkgconf not a hard requirement 2024-01-15 13:17:20 +02:00
Konstantinos Margaritis
e2439685a9
Merge pull request #221 from VectorCamp/bugfix/bug202-unit-internal
Do not assume unit-internal is built for unit target
2024-01-11 16:26:01 +02:00
Konstantinos Margaritis
1988ff5a6d Do not assume unit-internal is built for unit target 2024-01-11 13:23:37 +02:00
Konstantinos Margaritis
6e1c3a10fa
Merge pull request #220 from VectorCamp/feature/fatruntime-enabled-on-x86
Feature/fatruntime enabled on x86
2024-01-10 22:47:23 +02:00
Konstantinos Margaritis
6f4409365a enable AVX2,AVX512,AVX512 for fat runtimes on x86 2024-01-10 18:26:12 +02:00
Konstantinos Margaritis
f68a1e526c Enable Fat runtime on x86 by default to help migration from hyperscan 2024-01-10 18:25:31 +02:00
Konstantinos Margaritis
d9ebb20010 Make sqlite optional, use OS installed 2024-01-10 14:28:06 +02:00
Konstantinos Margaritis
eca4049ce4
Merge pull request #217 from ypicchi-arm/feature/Add-truffle-SVE-implementation
Add truffle SVE implementation
2024-01-09 22:53:09 +02:00
Yoan Picchi
8e5abfebf0 Add truffle SVE implementation
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-01-09 16:50:03 +00:00
Konstantinos Margaritis
17fb9f41f6
Merge pull request #215 from VectorCamp/feature/use-ccache
use ccache if available
2023-12-22 01:18:22 +02:00
Konstantinos Margaritis
ad70693999 use ccache if available 2023-12-21 12:59:56 +00:00
Konstantinos Margaritis
3113d1ca30
Merge pull request #212 from VectorCamp/bugfix/fix-simde-build
SIMDe on Clang needs SIMDE_NO_CHECK_IMMEDIATE_CONSTANT defined, plus other SIMDe-related fixes now that SIMDe is part of the CI pipeline.

Some issue with SIMDe on x86 still remains because of an upstream bug:

https://github.com/simd-everywhere/simde/issues/1119

Similarly SIMDe native with clang on Arm also poses a non-high priority build failure:

https://buildbot-ci.vectorcamp.gr/#/builders/129/builds/11

Possibly a SIMDe issue as well, need to investigate but will merge this PR as these are non-blockers.
2023-12-21 11:04:32 +02:00
Konstantinos Margaritis
10d957477a fix typo in baseline x86 arch definition 2023-12-20 22:21:00 +02:00
Konstantinos Margaritis
ef37e6015a native CPU on SIMDe will enable all sorts of features in an unpredictable manner, set sane defaults 2023-12-20 16:43:38 +00:00
Konstantinos Margaritis
306e8612be GREATER_EQUAL 2023-12-20 15:27:56 +00:00
Konstantinos Margaritis
a7a12844e7 reorganize OS detection 2023-12-20 17:16:45 +02:00
Konstantinos Margaritis
44f19c1006 fix submodule headers detection 2023-12-20 17:16:23 +02:00
Konstantinos Margaritis
2aa5e1c710 fix arch=native on arm+clang 2023-12-20 15:15:38 +00:00
Konstantinos Margaritis
1b915cfb93 add fallback pdep64 for x86 if no HAVE_BMI2 2023-12-20 08:25:30 +02:00
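A typical scalar fallback for BMI2's PDEP, of the kind this commit adds when HAVE_BMI2 is absent (the exact Vectorscan code may differ): deposit the low bits of x into the set-bit positions of mask:

    #include <cstdint>

    static inline uint64_t pdep64_fallback(uint64_t x, uint64_t mask) {
        uint64_t res = 0;
        for (uint64_t bit = 1; mask != 0; bit <<= 1) {
            if (x & bit) {
                res |= mask & -mask;  // lowest remaining mask bit
            }
            mask &= mask - 1;         // clear that bit
        }
        return res;
    }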
Konstantinos Margaritis
49e6fe15a2 add missing pdep64 for x86 bitutils 2023-12-20 00:12:15 +02:00
Konstantinos Margaritis
8cba258e7f add missing pdep64 for arm and ppc64le 2023-12-19 23:15:27 +02:00
Konstantinos Margaritis
c8ba7fa1d3 add missing pdep64 for common bitutils 2023-12-19 23:09:03 +02:00
Konstantinos Margaritis
e15ad9308a SIMDe on Clang needs SIMDE_NO_CHECK_IMMEDIATE_CONSTANT defined 2023-12-19 17:31:43 +02:00
Konstantinos Margaritis
a26bed96bc
Merge pull request #203 from VectorCamp/feature/enable-simde-backend
Feature/enable simde backend
2023-11-29 11:22:08 +02:00
Konstantinos Margaritis
519bd64c65 fix failing allbits test for ppc64le on clang15 2023-11-29 01:39:05 +02:00
Konstantinos Margaritis
d3f6d2ad06 updates to the Readme 2023-11-28 18:27:08 +02:00
Konstantinos Margaritis
9fd0ce5d44 search for SIMDE sse4.2.h header 2023-11-28 17:39:55 +02:00
Konstantinos Margaritis
6332cb91f5 separate ARCH_FLAG logic 2023-11-28 17:28:48 +02:00
Konstantinos Margaritis
3beda7e5e0 add missing else 2023-11-28 14:09:26 +02:00
Konstantinos Margaritis
be9ce68767 make diffrich384 available on all arches 2023-11-28 12:06:46 +00:00
Konstantinos Margaritis
f5e508b13f fix compilation for SIMDe 2023-11-27 20:52:52 +00:00
Konstantinos Margaritis
23aeaecf53 use pkg-config for SIMDe 2023-11-27 20:51:47 +00:00
Konstantinos Margaritis
8c7b503ac4 fix TUNE_FLAG for SIMDE_BACKEND 2023-11-27 20:51:29 +00:00
Konstantinos Margaritis
f57928ea08 fix SIMDe emulation builds on Arm, add native translation from x86 for comparison 2023-11-27 12:21:58 +00:00
Konstantinos Margaritis
dfacf75855 existing scalar implementations were incorrect (but never tested); ported from arm/ppc64le 2023-11-23 16:09:10 +00:00
Konstantinos Margaritis
20f4f542a5 add missing intrinsics for SIMDe backend 2023-11-23 16:08:26 +00:00
Konstantinos Margaritis
62cb8d6c2d fix test for SIMDe 2023-11-23 16:07:58 +00:00
Konstantinos Margaritis
b32ca719d9 SIMDE is a valid platform 2023-11-23 13:07:28 +00:00
Konstantinos Margaritis
7c53b4e608 add include dirs 2023-11-21 17:14:21 +00:00
Konstantinos Margaritis
14c9222a48 add generic tune flags 2023-11-21 17:13:54 +00:00
Konstantinos Margaritis
a8e9b9069e enable SIMDe backend 2023-11-21 17:13:33 +00:00
Konstantinos Margaritis
b068087240 add SIMDe ports of simd_utils and supervector 2023-11-21 17:12:04 +00:00
Konstantinos Margaritis
b5cde5ebf7 modified .gitmodules 2023-11-21 17:11:09 +00:00
Konstantinos Margaritis
8455cba03d add SIMDe cmake file 2023-11-21 17:09:48 +00:00
Konstantinos Margaritis
129015afc6 add SIMDe git submodule 2023-11-21 17:09:24 +00:00
Konstantinos Margaritis
d24d67c28b Add SIMDe backend to CMake 2023-11-21 17:06:22 +00:00
Konstantinos Margaritis
44b893abfc
Merge pull request #200 from VectorCamp/bugfix/install-static-libs
fix missing installation of static libs
2023-11-21 11:46:29 +02:00
Konstantinos Margaritis
c3a6bb3cb3
Merge pull request #199 from gliwka/fix-missing-hs-version-header
Fix missing hs_version.h header (closes #198)
2023-11-21 11:46:00 +02:00
Konstantinos Margaritis
d611fcbaa8 fix missing installation of static libs 2023-11-20 22:39:12 +02:00
Matthias Gliwka
343e523763 fix missing hs_version.h header (closes #198) 2023-11-20 21:52:42 +02:00
Konstantinos Margaritis
574e525c46
Merge pull request #196 from VectorCamp/feature/prepare-5.4.11
Feature/prepare 5.4.11
2023-11-20 07:37:14 +02:00
Konstantinos Margaritis
41fb015616 expand on build-deps installation 2023-11-19 20:00:06 +02:00
Konstantinos Margaritis
393dee3697 add sanitizer flags 2023-11-19 19:53:02 +02:00
Konstantinos Margaritis
08b904b31c more changes to readme 2023-11-19 19:37:06 +02:00
Konstantinos Margaritis
a97d576ac8 cross-compiling is not tested, removed 2023-11-19 19:24:59 +02:00
Konstantinos Margaritis
aecd920b57 if none are set build static 2023-11-19 19:18:23 +02:00
Konstantinos Margaritis
d5cd29b333 additions to readme 2023-11-19 17:57:08 +02:00
Konstantinos Margaritis
9c92c7b081 add contributors file 2023-11-19 15:32:45 +02:00
Konstantinos Margaritis
8d1c7c49f0 add changelog entry 2023-11-19 15:32:36 +02:00
Konstantinos Margaritis
5e5d6d2c17 Update Readme file 2023-11-19 10:24:51 +02:00
Konstantinos Margaritis
b1522860d5 bump version 2023-11-19 10:24:32 +02:00
Konstantinos Margaritis
35acf49d5f Don't build fat runtime with native CPU detection 2023-11-19 10:24:13 +02:00
Konstantinos Margaritis
44b026a8c9 remove Jenkinsfile 2023-11-19 10:23:39 +02:00
Konstantinos Margaritis
80f84a1be5
Merge pull request #191 from VectorCamp/bugfix/fix-segfault-arm-sve2
Move VERM16 enums to the end of the list
2023-11-17 14:38:01 +02:00
Konstantinos Margaritis
b5f1a82258 Move VERM16 enums to the end of the list
This was causing a hard-to-track segfault with Fat Runtime on SVE2 hw,
because of the macro-based hard-coded way to calculate offsets for each
implementation. This needs a rewrite.
2023-11-17 03:50:30 +08:00
Konstantinos Margaritis
9d0599a85e
Merge pull request #189 from mlmitch/develop
Correct set_source_files_properties usage
2023-11-01 11:35:36 +02:00
Konstantinos Margaritis
21c45f325c
Merge pull request #188 from VectorCamp/bugfix/require-pkg-config
make pkgconfig a requirement
2023-10-31 19:09:35 +02:00
Mitchell Wasson
9c139c3a6d Correct set_source_files_properties usage
The use of `CMAKE_BINARY_DIR` and `CMAKE_CURRENT_BINARY_DIR` when
specifying files to set_source_files_properties caused problems
when this project is used from another CMake project.

More specifically, these variables aren't set to the expected path,
and the properties are attempted to be set for non-existent files.

This was benign before vectorscan 5.4.8 as the only properties
set were warning suppression flags.

Starting with 5.4.9, `-funsigned-char` was applied to Ragel outputs
using this method. The result is that projects depending on Vectorscan
through CMake do not have this compile flag properly applied.
2023-10-31 09:35:50 -06:00
Konstantinos Margaritis
71bbf97b90 make pkgconfig a requirement 2023-10-31 10:38:07 +00:00
Konstantinos Margaritis
de94286fed
Merge pull request #186 from VectorCamp/bugfix/fix-compilation-arm-ubuntu-20.04
Ubuntu 20.04 gcc does not define HWCAP2_SVE2 #180
2023-10-25 13:53:44 +03:00
Konstantinos Margaritis
02474c4f52
Merge pull request #185 from VectorCamp/bugfix/fix-inconsistent-version-header
Fix version getting out of sync #175
2023-10-11 19:52:22 +03:00
Konstantinos Margaritis
a659555781 Ubuntu 20.04 gcc does not define HWCAP2_SVE2 #180 2023-10-10 18:30:12 +08:00
Konstantinos Margaritis
aa8af2621b
Merge pull request #181 from VectorCamp/bugfix/fix-clang15-compilation-errors
Fix clang 15,16 compilation errors on all platforms, refactor CMake build system
2023-10-10 13:14:10 +03:00
Konstantinos Margaritis
5a4d900675 fix default arch definition for non fat builds on arm 2023-10-10 00:55:02 +08:00
Konstantinos Margaritis
1fdeedf151 set default value 2023-10-09 20:38:19 +08:00
Konstantinos Margaritis
c4b7a44cac SVE2 is armv9-a but gcc 11 does not recognize that 2023-10-09 20:02:37 +08:00
Konstantinos Margaritis
7909b91ba4 remove vermicelli_simd.cpp to fix redefinition build failure on SVE2 builds 2023-10-09 20:01:26 +08:00
Konstantinos Margaritis
1619dbaf35 remove unneeded option 2023-10-09 10:26:08 +00:00
Konstantinos Margaritis
9445f49172 is not known at that stage 2023-10-09 10:16:40 +00:00
Konstantinos Margaritis
4d539f2c87 fix cmake refactor for arm builds 2023-10-09 10:03:53 +00:00
Konstantinos Margaritis
981576a5fe fix default arch/tune flags for ppc64le 2023-10-09 00:44:12 +03:00
Konstantinos Margaritis
5e4a1edb0c fix x86 fat binary build 2023-10-09 00:42:39 +03:00
Konstantinos Margaritis
e85f7cc9c9 fix sqlite3 version detection 2023-10-09 00:23:29 +03:00
Konstantinos Margaritis
ee8a3c29cc fix cflags detection for x86 2023-10-09 00:23:08 +03:00
Konstantinos Margaritis
0d5ce27df4 fix defaults for -march for x86 2023-10-09 00:22:52 +03:00
Konstantinos Margaritis
6beeb372bc increase cmake_minimum_version, basically the one in Debian 11 2023-10-09 00:22:31 +03:00
Konstantinos Margaritis
3884f597d3 add missing file 2023-10-08 23:54:06 +03:00
Konstantinos Margaritis
24ae1670d6 WIP: Refactor CMake build system to more modular 2023-10-08 23:27:24 +03:00
Konstantinos Margaritis
0e403103d6 SVE2 needs armv9-a, fix build 2023-10-08 00:00:42 +08:00
Konstantinos Margaritis
9e1c43b9ec add src/nfa/vermicelli_simd.cpp to ppc64le 2023-10-07 18:02:00 +03:00
Konstantinos Margaritis
983a3a52bd include extra sources for Arm on non-fat builds 2023-10-07 22:27:26 +08:00
Konstantinos Margaritis
1320d01035 add missing file 2023-10-07 12:10:42 +03:00
Konstantinos Margaritis
6900806127 add cpuid_flags to ppc64le as well 2023-10-07 12:07:06 +03:00
Konstantinos Margaritis
e8e2957344 re-add missing file to x86 builds 2023-10-07 11:45:10 +03:00
Konstantinos Margaritis
7a2ccd7773 fix fat & normal build errors on arm 2023-10-07 06:17:18 +08:00
Konstantinos Margaritis
55cae8c807 detect arm_sve.h when using clang on fat runtime builds 2023-10-06 20:46:24 +08:00
Konstantinos Margaritis
a26661c849 remove extra print 2023-10-06 12:08:36 +03:00
Konstantinos Margaritis
98d7434cfd __builtin_constant_p is true in the wrong case on gcc 13.2. Exclude for now 2023-10-06 11:44:41 +03:00
Konstantinos Margaritis
22a24f12ea Reduce debug unit tests runtime even more
In single.cpp, the featuremask with AVX512 features is not relevant to non-x86 platforms,
and just extends the runtime for no reason.
2023-10-05 19:12:58 +03:00
Konstantinos Margaritis
e369681ce2 Don't run regression UE_2595 on debug, it times out CI 2023-10-05 10:40:30 +03:00
Konstantinos Margaritis
35c0711689 use the right type of cast 2023-10-04 23:35:10 +03:00
Konstantinos Margaritis
72afe16452 clang 16 as well 2023-10-04 22:07:34 +03:00
Konstantinos Margaritis
da88abfa39 missed one pragma 2023-10-04 20:54:57 +03:00
Konstantinos Margaritis
2e88df1a89 use the conditional in the right way 2023-10-04 20:35:58 +03:00
Konstantinos Margaritis
354fda48fb add conditional for __clang__ 2023-10-04 20:28:35 +03:00
Konstantinos Margaritis
b7d1bc0298 clang 15 (but not 16) fails on ppc64le with -Wdeprecate-lax-vec-conv-all 2023-10-04 20:09:45 +03:00
Konstantinos Margaritis
9aa61440ea Reduce unit test runtimes dramatically for debug builds 2023-10-04 19:21:30 +03:00
Konstantinos Margaritis
93d3e7eb30 fix -Wunused warnings on debug 2023-10-04 07:16:45 +00:00
Konstantinos Margaritis
9a174745e4 more std::move fixes 2023-10-03 21:01:51 +03:00
Konstantinos Margaritis
db7b23a468 move definition of RAGEL_C_FLAGS earlier to catch tools/hscollider 2023-10-03 21:01:35 +03:00
Konstantinos Margaritis
0d2f9ccbaa Fix 'unqualified call to std::move' errors in clang 15+ 2023-10-03 20:24:39 +03:00
Konstantinos Margaritis
16604f9539 Fix version getting out of sync #175 2023-10-03 09:57:10 +03:00
Konstantinos Margaritis
4918f81ea3
Merge pull request #174 from VectorCamp/develop
Minor bugfix release 5.4.10.1
2023-09-08 13:42:33 +03:00
Konstantinos Margaritis
adedf2acf3
Merge pull request #173 from VectorCamp/bugfix/disable-fat-macos-arm
Bugfix/disable fat macos arm
2023-09-08 12:18:11 +03:00
Konstantinos Margaritis
d85d306ff9 HWCAP is only available on Linux 2023-09-08 10:08:44 +03:00
Konstantinos Margaritis
1d25f9b8f5 force disable FAT_RUNTIME on MacOS on Arm 2023-09-08 10:08:18 +03:00
Konstantinos Margaritis
ad42abe7b4 forgot to update changelog for latest entry 2023-09-07 20:10:20 +03:00
Konstantinos Margaritis
85a6973060
Merge pull request #167 from VectorCamp/develop
Prepare for 5.4.10
2023-09-07 20:03:49 +03:00
Konstantinos Margaritis
a344cd30f7 minor fix 2023-09-07 17:53:25 +03:00
Konstantinos Margaritis
68346631c7 bump version, add Vectorscan Changelog 2023-09-07 17:51:07 +03:00
Konstantinos Margaritis
e843ac80c9
Merge pull request #169 from VectorCamp/feature/backport-hyperscan-2023Q3
Feature/backport hyperscan 2023 q3
2023-09-05 20:11:30 +03:00
Hong, Yang A
4344d2fce7 changelog: updates for 5.4.2 release 2023-09-05 13:58:33 +03:00
Hong, Yang A
5209c7978a remove invalid nfa dump info 2023-09-05 13:58:24 +03:00
Hong, Yang A
c6523453d7 scratch: remove quick validity check
Roll back fix for github issue #350

About Scratch Usage:
For compile time, scratch space is strongly recommended to be
allocated immediately after database generation.
For runtime, besides using the scratch for its corresponding database,
Hyperscan also allows the user to use a larger scratch space allocated
for another database.
When multiple concurrent threads need to use the same databases
and a new scratch space is required, cloning the largest one is
always safe. This is realized via the hs_scratch_size() and
hs_clone_scratch() APIs (see the sketch after this entry).
Behavior beyond the above is discouraged and results are undefined.
2023-09-05 13:58:17 +03:00
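For illustration, a minimal sketch of the scratch workflow described above, assuming db1 and db2 are already-compiled databases (error handling elided; this is not taken from the commit itself):

```cpp
#include <hs.h>

// Hedged sketch of the scratch rules above; db1/db2 are assumed to be
// already-compiled hs_database_t pointers, error handling elided.
static hs_scratch_t *make_per_thread_scratch(const hs_database_t *db1,
                                             const hs_database_t *db2) {
    hs_scratch_t *scratch = nullptr;
    hs_alloc_scratch(db1, &scratch); // allocate right after compiling db1
    hs_alloc_scratch(db2, &scratch); // the same call grows scratch for db2

    size_t sz = 0;
    hs_scratch_size(scratch, &sz);   // size query, e.g. to pick the largest

    // One scratch per concurrent thread: clone rather than share.
    hs_scratch_t *per_thread = nullptr;
    hs_clone_scratch(scratch, &per_thread);
    return per_thread;
}
```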
Chang, Harry
ab4f837607 changelog: updates for 5.4.1 release 2023-09-05 13:57:42 +03:00
Hong, Yang A
dc78dc1633 sanitiser bugfix 2023-09-05 13:56:24 +03:00
Hong, Yang A
4fb3a48dfd bugfix: add vbmi case for test in database.cpp 2023-09-05 13:52:10 +03:00
Hong, Yang A
6765b35d48 bugfix: add vbmi platform parameter for tests in single.cpp 2023-09-05 13:52:03 +03:00
Hong, Yang A
91f0cb6cea fix nfa dump error 2023-09-05 13:51:22 +03:00
Hong, Yang A
7f2f7d2a1e scratch: add quick validity check
fix github issue #350
2023-09-05 13:51:15 +03:00
Hong, Yang A
659525480c stream close: free stream to avoid memory leak
fix github issue #303
2023-09-05 13:50:56 +03:00
Hong, Yang A
941cc7144b Silence clang-14 warnings 2023-09-05 13:50:45 +03:00
Hong, Yang A
fc5a423c7e Fix cmake CMP0115 warning for CMake 3.20 and above 2023-09-05 13:50:30 +03:00
Hong, Yang A
b7ee9102ee update year 2022 2023-09-05 13:49:52 +03:00
Hong, Yang A
684f0ce2cb UTF-8 validation: fix one codec check corner issue
fix github issue #362
2023-09-05 13:49:41 +03:00
Hong, Yang A
7c1835c0e7 stringop-overflow compatible fix 2023-09-05 13:48:12 +03:00
Hong, Yang A
762f4050a0 gcc-10(and above): fix compile issue caused by stringop-overflow 2023-09-05 13:47:59 +03:00
Hong, Yang A
978105a4c0 klocwork: fix risk issues 2023-09-05 13:45:33 +03:00
Konstantinos Margaritis
75dbedeebe
Merge pull request #164 from jeffplaisance/develop
adding ifndef around HS_PUBLIC_API definition so that vectorscan can be statically linked into another shared library without exporting symbols
2023-09-04 23:11:34 +03:00
Konstantinos Margaritis
b6b69a41eb
Merge pull request #165 from VectorCamp/feature/enable-fat-runtime-arm
Feature/enable fat runtime arm
2023-09-04 22:50:11 +03:00
jplaisance
4bc70b37a7 adding ifndef around HS_PUBLIC_API definition so that vectorscan can be statically linked into another shared library without exporting symbols 2023-08-23 11:07:56 -05:00
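A hedged sketch of the guard pattern this describes (the real per-platform attribute logic in the header differs): wrapping the definition in #ifndef lets an embedding project predefine HS_PUBLIC_API, e.g. to nothing, so the symbols are not re-exported from its own shared library.

```cpp
// Sketch only; the actual definition in the vectorscan headers is
// platform- and build-dependent.
#ifndef HS_PUBLIC_API
  #if defined(_WIN32)
    #define HS_PUBLIC_API __declspec(dllexport)
  #else
    #define HS_PUBLIC_API __attribute__((visibility("default")))
  #endif
#endif
```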
Konstantinos Margaritis
0ec7b4e77b fix SVE flags detection order #145 2023-08-23 10:21:02 +00:00
Konstantinos Margaritis
68db36f4c4 initial attempt for fat binary on Aarch64 2023-08-23 09:42:00 +00:00
Konstantinos Margaritis
38431d1117
Merge pull request #149 from azat-ch/small-vector-msan
Use std::vector instead of boost::container::small_vector under MSan
2023-05-23 18:45:10 +03:00
Konstantinos Margaritis
4fbabb66d8
Merge pull request #148 from azat-ch/getData128-msan
Fix use-of-uninitialized-value due to getData128()
2023-05-23 18:44:47 +03:00
Azat Khuzhin
07305d18ae Fix use-of-uninitialized-value due to getData128()
When a temporary buffer is used in getData128(), it may return
uninitialized data.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-12 23:13:34 +02:00
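A hedged sketch of the bug class being fixed, with hypothetical names (not the actual getData128() code): when fewer than 16 bytes are available and a temporary buffer is used, its tail must be initialized before all 16 bytes are loaded.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper illustrating the fix: zero the temporary first so
// every byte subsequently read from it is defined.
static void load128_safe(uint8_t out[16], const uint8_t *buf, size_t len) {
    uint8_t tmp[16] = {0};                      // the fix: initialize the temporary
    std::memcpy(tmp, buf, len < 16 ? len : 16); // copy only what exists
    std::memcpy(out, tmp, 16);                  // the 16-byte load is now safe
}
```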
Azat Khuzhin
8a54576861 Use std::vector instead of boost::container::small_vector under MSan
There are some issues with dtors in boost::container::small_vector
and/or vector, which are reported by MSan as errors.

The suppression __attribute__((no_sanitize_memory)) works up to
clang-15, but since clang-16 it does not. It looks like before clang-16
no_sanitize_memory applied to all child functions, while since clang-16
it applies only to the annotated function. I've tried to add a few more
annotations, but a) it looks icky and b) I haven't managed to finish the
process.

Also, I've measured the performance and it hasn't changed. Although
boost::small_vector should be faster than std::vector, apparently my
particular case isn't affected much.

And one more thing: MSan reports this only with -O0; with -O3 it is
not reproduced.

<details>

<summary>MSan report:</summary>

_Note: it was slightly trimmed_

```
==11364==WARNING: MemorySanitizer: use-of-uninitialized-value
    0 0x55558d13289f in boost::container::vector_alloc_holder<boost::container::small_vector_allocator<std::__1::pair<unsigned char, unsigned char>, std::__1::allocator<void>, void>, unsigned long, boost::move_detail::integral_constant<unsigned int, 1u>>::deallocate(std::__1::pair<unsigned char, unsigned char>* const&, unsigned long) .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:455:7
    1 0x55558d139e8e in boost::container::vector_alloc_holder<boost::container::small_vector_allocator<std::__1::pair<unsigned char, unsigned char>, std::__1::allocator<void>, void>, unsigned long, boost::move_detail::integral_constant<unsigned int, 1u>>::~vector_alloc_holder() .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:420:16
    2 0x55558d139e0b in boost::container::vector<std::__1::pair<unsigned char, unsigned char>, boost::container::small_vector_allocator<std::__1::pair<unsigned char, unsigned char>, std::__1::allocator<void>, void>, void>::~vector() .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:1141:4
    3 0x55558d12a4fa in boost::container::small_vector_base<std::__1::pair<unsigned char, unsigned char>, std::__1::allocator<std::__1::pair<unsigned char, unsigned char>>, void>::~small_vector_base() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:445:80
    4 0x55558d12a4fa in boost::container::small_vector<std::__1::pair<unsigned char, unsigned char>, 1ul, std::__1::allocator<std::__1::pair<unsigned char, unsigned char>>, void>::~small_vector() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:564:7
    5 0x55558d13a21b in std::__1::__tuple_leaf<0ul, boost::container::small_vector<std::__1::pair<unsigned char, unsigned char>, 1ul, std::__1::allocator<std::__1::pair<unsigned char, unsigned char>>, void>, false>::~__tuple_leaf() .cmake-llvm16-msan/./contrib/llvm-project/libcxx/include/tuple:265:7
    6 0x55558d13a13a in std::__1::__tuple_impl<>::~__tuple_impl() .cmake-llvm16-msan/./contrib/llvm-project/libcxx/include/tuple:451:37
    7 0x55558d13a05b in std::__1::tuple<>::~tuple() .cmake-llvm16-msan/./contrib/llvm-project/libcxx/include/tuple:538:28
    8 0x55558d139f7b in ue2::flat_detail::flat_base<>::~flat_base() .cmake-llvm16-msan/./contrib/vectorscan/src/util/flat_containers.h:89:7
    9 0x55558d1299da in ue2::flat_set<>::~flat_set() .cmake-llvm16-msan/./contrib/vectorscan/src/util/flat_containers.h:152:7
    10 0x55558d4e4dda in ue2::(anonymous namespace)::DAccelScheme::~DAccelScheme() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:301:8
    11 0x55558d4ff6cf in void boost::container::allocator_traits<>::priv_destroy<ue2::(anonymous namespace)::DAccelScheme>(boost::move_detail::integral_constant<bool, false>, boost::container::small_vector_allocator<ue2::(anonymous namespace)::DAccelScheme, boost::container::new_allocator<void>, void>&, ue2::(anonymous namespace)::DAccelScheme*) .cmake-llvm16-msan/./contrib/boost/boost/container/allocator_traits.hpp:403:11
    12 0x55558d4fefde in void boost::container::allocator_traits<>::destroy<ue2::(anonymous namespace)::DAccelScheme>(boost::container::small_vector_allocator<ue2::(anonymous namespace)::DAccelScheme, boost::container::new_allocator<void>, void>&, ue2::(anonymous namespace)::DAccelScheme*) .cmake-llvm16-msan/./contrib/boost/boost/container/allocator_traits.hpp:331:7
    13 0x55558d4fc364 in boost::container::dtl::disable_if_trivially_destructible<>::type boost::container::destroy_alloc_n<>(boost::container::small_vector_allocator<ue2::(anonymous namespace)::DAccelScheme, boost::container::new_allocator<void>, void>&, ue2::(anonymous namespace)::DAccelScheme*, unsigned long) .cmake-llvm16-msan/./contrib/boost/boost/container/detail/copy_move_algo.hpp:988:7
    14 0x55558d517962 in boost::container::vector<>::~vector() .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:1138:7
    15 0x55558d4f724d in boost::container::small_vector_base<>::~small_vector_base() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:445:80
    16 0x55558d4f724d in boost::container::small_vector<>::~small_vector() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:564:7
    17 0x55558d4f2ff3 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:444:1
    18 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    19 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    20 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    21 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    22 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    23 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    24 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    25 0x55558d4e4af5 in ue2::findBestDoubleAccelScheme() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:556:5
    26 0x55558d4e2659 in ue2::findBestAccelScheme() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:569:27
    27 0x55558d3aa8ff in ue2::look_for_offset_accel(ue2::raw_dfa const&, unsigned short, unsigned int) .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:197:22
    28 0x55558d3a9727 in ue2::accel_dfa_build_strat::find_escape_strings(unsigned short) const .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:414:13
    29 0x55558d3b2119 in ue2::accel_dfa_build_strat::getAccelInfo(ue2::Grey const&)::$_0::operator()(unsigned long) const .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:606:26
    30 0x55558d3aefd4 in ue2::accel_dfa_build_strat::getAccelInfo(ue2::Grey const&) .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:627:13
    31 0x55558d2fc61f in ue2::mcclellanCompile8() .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/mcclellancompile.cpp:935:22
    32 0x55558d2e89ec in ue2::mcclellanCompile_i() .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/mcclellancompile.cpp:1510:15
    33 0x55558d2ff502 in ue2::mcclellanCompile() .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/mcclellancompile.cpp:1527:12
    34 0x55558fb13b52 in ue2::getDfa() .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_bytecode.cpp:646:15
    35 0x55558fb7e8c8 in ue2::makeLeftNfa() .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_bytecode.cpp:854:22
    36 0x55558fb6bd36 in ue2::buildLeftfix() .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_bytecode.cpp:1123:15
    37 0x55558fb21020 in ue2::buildLeftfixes() .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_bytecode.cpp:1579:9
    38 0x55558fad972c in ue2::buildNfas() .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_bytecode.cpp:2063:10
    39 0x55558fac9843 in ue2::RoseBuildImpl::buildFinalEngine(unsigned int) .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_bytecode.cpp:3660:10
    40 0x55558f2b2d86 in ue2::RoseBuildImpl::buildRose(unsigned int) .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_compile.cpp:1796:12

  Uninitialized value was stored to memory at
    0 0x55558d132898 in boost::container::vector_alloc_holder<boost::container::small_vector_allocator<std::__1::pair<unsigned char, unsigned char>, std::__1::allocator<void>, void>, unsigned long, boost::move_detail::integral_constant<unsigned int, 1u>>::deallocate(std::__1::pair<unsigned char, unsigned char>* const&, unsigned long) .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:455:56
    1 0x55558d139e8e in boost::container::vector_alloc_holder<>::~vector_alloc_holder() .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:420:16
    2 0x55558d139e0b in boost::container::vector<>::~vector() .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:1141:4
    3 0x55558d12a4fa in boost::container::small_vector_base<>::~small_vector_base() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:445:80
    4 0x55558d12a4fa in boost::container::small_vector<std::__1::pair<unsigned char, unsigned char>, 1ul, std::__1::allocator<std::__1::pair<unsigned char, unsigned char>>, void>::~small_vector() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:564:7
    5 0x55558d13a21b in std::__1::__tuple_leaf<>::~__tuple_leaf() .cmake-llvm16-msan/./contrib/llvm-project/libcxx/include/tuple:265:7
    6 0x55558d13a13a in std::__1::__tuple_impl<>::~__tuple_impl() .cmake-llvm16-msan/./contrib/llvm-project/libcxx/include/tuple:451:37
    7 0x55558d13a05b in std::__1::tuple<>::~tuple() .cmake-llvm16-msan/./contrib/llvm-project/libcxx/include/tuple:538:28
    8 0x55558d139f7b in ue2::flat_detail::flat_base<>::~flat_base() .cmake-llvm16-msan/./contrib/vectorscan/src/util/flat_containers.h:89:7
    9 0x55558d1299da in ue2::flat_set<>::~flat_set() .cmake-llvm16-msan/./contrib/vectorscan/src/util/flat_containers.h:152:7
    10 0x55558d4e4dda in ue2::(anonymous namespace)::DAccelScheme::~DAccelScheme() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:301:8
    11 0x55558d4ff6cf in void boost::container::allocator_traits<>::priv_destroy<ue2::(anonymous namespace)::DAccelScheme>() .cmake-llvm16-msan/./contrib/boost/boost/container/allocator_traits.hpp:403:11
    12 0x55558d4fefde in void boost::container::allocator_traits<>::destroy<ue2::(anonymous namespace)::DAccelScheme>(boost::container::small_vector_allocator<>&, ue2::(anonymous namespace)::DAccelScheme*) .cmake-llvm16-msan/./contrib/boost/boost/container/allocator_traits.hpp:331:7
    13 0x55558d4fc364 in boost::container::dtl::disable_if_trivially_destructible<>::type boost::container::destroy_alloc_n<boost::container::small_vector_allocator<ue2::(anonymous namespace)::DAccelScheme, boost::container::new_allocator<void>, void>, ue2::(anonymous namespace)::DAccelScheme*, unsigned long>(boost::container::small_vector_allocator<ue2::(anonymous namespace)::DAccelScheme, boost::container::new_allocator<void>, void>&, ue2::(anonymous namespace)::DAccelScheme*, unsigned long) .cmake-llvm16-msan/./contrib/boost/boost/container/detail/copy_move_algo.hpp:988:7
    14 0x55558d517962 in boost::container::vector<ue2::(anonymous namespace)::DAccelScheme, boost::container::small_vector_allocator<ue2::(anonymous namespace)::DAccelScheme, boost::container::new_allocator<void>, void>, void>::~vector() .cmake-llvm16-msan/./contrib/boost/boost/container/vector.hpp:1138:7
    15 0x55558d4f724d in boost::container::small_vector_base<>::~small_vector_base() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:445:80
    16 0x55558d4f724d in boost::container::small_vector<>::~small_vector() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:564:7
    17 0x55558d4f2ff3 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:444:1
    18 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    19 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    20 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    21 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9

  Member fields were destroyed
    0 0x5555652e08dd in __sanitizer_dtor_callback_fields /src/llvm/worktrees/llvm-16/compiler-rt/lib/msan/msan_interceptors.cpp:961:5
    1 0x55558d4f71a6 in boost::container::small_vector<>::~small_vector() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:528:8
    2 0x55558d4f71a6 in boost::container::small_vector<>::~small_vector() .cmake-llvm16-msan/./contrib/boost/boost/container/small_vector.hpp:564:7
    3 0x55558d4f2ff3 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:444:1
    4 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    5 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    6 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    7 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    8 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    9 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    10 0x55558d4f2f41 in ue2::findDoubleBest() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:442:9
    11 0x55558d4e4af5 in ue2::findBestDoubleAccelScheme() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:556:5
    12 0x55558d4e2659 in ue2::findBestAccelScheme() .cmake-llvm16-msan/./contrib/vectorscan/src/nfagraph/ng_limex_accel.cpp:569:27
    13 0x55558d3aa8ff in ue2::look_for_offset_accel(ue2::raw_dfa const&, unsigned short, unsigned int) .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:197:22
    14 0x55558d3a9727 in ue2::accel_dfa_build_strat::find_escape_strings(unsigned short) const .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:414:13
    15 0x55558d3b2119 in ue2::accel_dfa_build_strat::getAccelInfo(ue2::Grey const&)::$_0::operator()(unsigned long) const .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:606:26
    16 0x55558d3aefd4 in ue2::accel_dfa_build_strat::getAccelInfo(ue2::Grey const&) .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/accel_dfa_build_strat.cpp:627:13
    17 0x55558d2fc61f in ue2::mcclellanCompile8(ue2::(anonymous namespace)::dfa_info&, ue2::CompileContext const&, std::__1::set<unsigned short, std::__1::less<unsigned short>, std::__1::allocator<unsigned short>>*) .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/mcclellancompile.cpp:935:22
    18 0x55558d2e89ec in ue2::mcclellanCompile_i(ue2::raw_dfa&, ue2::accel_dfa_build_strat&, ue2::CompileContext const&, bool, std::__1::set<unsigned short, std::__1::less<unsigned short>, std::__1::allocator<unsigned short>>*) .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/mcclellancompile.cpp:1510:15
    19 0x55558d2ff502 in ue2::mcclellanCompile(ue2::raw_dfa&, ue2::CompileContext const&, ue2::ReportManager const&, bool, bool, std::__1::set<unsigned short, std::__1::less<unsigned short>, std::__1::allocator<unsigned short>>*) .cmake-llvm16-msan/./contrib/vectorscan/src/nfa/mcclellancompile.cpp:1527:12
    20 0x55558fb13b52 in ue2::getDfa(ue2::raw_dfa&, bool, ue2::CompileContext const&, ue2::ReportManager const&) .cmake-llvm16-msan/./contrib/vectorscan/src/rose/rose_build_bytecode.cpp:646:15
```

</details>

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-12 23:12:48 +02:00
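A hedged sketch of the workaround described above (macro and alias names hypothetical): detect MSan via __has_feature and fall back to std::vector, keeping boost's small_vector everywhere else.

```cpp
#include <cstddef>
#include <memory>
#include <vector>
#include <boost/container/small_vector.hpp>

#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    define VS_BUILD_WITH_MSAN   // hypothetical macro name
#  endif
#endif

#if defined(VS_BUILD_WITH_MSAN)
// Under MSan: plain std::vector, whose dtors MSan understands.
template <class T, std::size_t N, class Allocator = std::allocator<T>>
using small_vector = std::vector<T, Allocator>;
#else
// Otherwise keep the small-buffer-optimized container.
template <class T, std::size_t N, class Allocator = std::allocator<T>>
using small_vector = boost::container::small_vector<T, N, Allocator>;
#endif
```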
Konstantinos Margaritis
d05491117c
Merge pull request #144 from rschu1ze/rs/fix-libcxx16
Fix compilation with libcxx 16
2023-03-29 12:55:48 +03:00
Robert Schulze
8f26c5e65f
Fix compilation with libcxx 16
After upgrading our (ClickHouse's) libcxx from 15 to 16, the compiler
started to complain about usage of an incomplete type "RoseInstruction"
in this (header) function:

  void RoseProgram::replace(Iter it, std::unique_ptr<RoseInstruction> ri) {
    ...

The reason is that libcxx 16 is the first version which implements C++23
constexpr std::unique_ptr (P2273R3, see (*)). RoseProgram::replace()
happens to be const-evaluatable and the compiler tries to run
std::unique_ptr's ctor + dtor. This fails because at this point
RoseInstruction isn't defined yet.

There are two ways of fixing this:
1. Include rose_build_instruction.h (which contains RoseInstruction)
   into rose_build_program.h. Disadvantage: The new include will
   propagate transitively into all callers.
2. Move the function implementation into the source file which sees
   RoseInstruction's definition already. Disadvantage: Template
   instantiation is no longer automatic, instead there must be either a)
   explicit template instantiation (e.g. in rose_build_program.cpp) or
   b) all callers which instantiate the function must live in the same
   source file and do the instantiations by themselves. Fortunately, the
   latter is the case here, but potential future code outside
   rose_build_program.cpp will require ugly explicit instantiation.

(*) https://en.cppreference.com/w/cpp/23
2023-03-28 21:58:44 +00:00
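A hedged sketch of the shape of fix 2, with hypothetical names (not the actual Rose types): the member taking a unique_ptr is only declared in the header and defined where the pointee type is complete.

```cpp
#include <memory>

// "header": only a forward declaration of the pointee is visible.
struct Instr;
struct Program {
    // Declared but deliberately not defined here: an inline body could be
    // constant-evaluated under libcxx 16, which needs ~unique_ptr<Instr>
    // and therefore a complete Instr.
    void replace(std::unique_ptr<Instr> ri);
};

// "source file": Instr is complete, so the definition is safe.
struct Instr { int opcode = 0; };
void Program::replace(std::unique_ptr<Instr> ri) {
    (void)ri; // destroying a complete type is fine here
}
```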
Konstantinos Margaritis
b4bba94b1a
Merge pull request #143 from VectorCamp/develop
Prepare for new release 5.4.9
2023-03-23 16:11:37 +02:00
Konstantinos Margaritis
8c7fdf1e7a
Merge pull request #142 from VectorCamp/feature/bump-version
Bump version
2023-03-23 10:31:06 +02:00
Konstantinos Margaritis
eef3f06c94 Bump version 2023-03-23 08:29:20 +00:00
Konstantinos Margaritis
e6b97a99b8
Merge pull request #141 from VectorCamp/bugfix/hs-flag-utf8-signed-char-on-arm
Set Ragel.rl char type to unsigned, #135
2023-03-23 10:14:59 +02:00
Konstantinos Margaritis
842e680650 clang 14 makes some test failed because val is uninitialized 2023-03-22 21:39:03 +02:00
Konstantinos Margaritis
66289cdacf fix ExpressionParser.cpp path 2023-03-22 11:36:06 +02:00
Konstantinos Margaritis
101f6083b0 add -funsigned-char to RAGEL_C_FLAGS, move util build after that 2023-03-22 11:36:06 +02:00
Konstantinos Margaritis
9f8758d270 Force -funsigned-char to RAGEL_C_FLAGS 2023-03-22 08:49:19 +00:00
Konstantinos Margaritis
1ce45a31c5 fix typo 2023-03-21 18:11:17 +00:00
Konstantinos Margaritis
dbdbfe9473 Set Ragel.rl char type to unsigned, #135 2023-03-21 18:07:06 +00:00
Konstantinos Margaritis
0f967b9575
Merge pull request #136 from VectorCamp/feature/prefix-assume-aligned
prefix assume_aligned to avoid clash with std::assume_aligned in c++20
2022-11-01 17:41:39 +02:00
Konstantinos Margaritis
e6cfd11948 prefix assume_aligned to avoid clash with std::assume_aligned in c++20 2022-11-01 10:29:22 +00:00
Konstantinos Margaritis
6d8599eece
Merge pull request #131 from VectorCamp/develop
Prepare for new release 5.4.9
2022-09-19 17:59:40 +03:00
Konstantinos Margaritis
00d1807bb4
Merge pull request #125 from abondarev84/master
cmake change for correct placement of autodetected tune & arch flags of GCC and SVE enablement on AARCH64
2022-09-19 12:44:14 +03:00
Alex Bondarev
7133ac5be1 clang SVE build fix 2022-09-18 19:42:45 +03:00
Alex Bondarev
90ac746303 SVE enabled on user input. updated README
tune and arch flags will be applied from autodetection only if they were created by that process; otherwise the old logical flow for the flags remains
2022-09-18 12:04:05 +03:00
Konstantinos Margaritis
9d34941f13
Merge pull request #129 from VectorCamp/bugfix/fix-clang-on-power
Fix compile errors on clang and Power
2022-09-16 19:04:06 +03:00
Konstantinos Margaritis
48105cdd1d move variable 2022-09-16 14:05:31 +03:00
Konstantinos Margaritis
911a98d54f clang 13+ gives wrong -Wunused-but-set-variable error on nfa/mcclellancompile.cpp about total_daddy variable, disabling 2022-09-16 14:04:59 +03:00
Konstantinos Margaritis
a4972aa191 remove leftover debug print 2022-09-16 14:03:17 +03:00
Konstantinos Margaritis
0e0147ec5c clang 14 does not allow bitwise OR for bools 2022-09-16 14:02:53 +03:00
Konstantinos Margaritis
6de45b4648 clang 14 complains about this, needs investigation 2022-09-16 14:02:26 +03:00
Konstantinos Margaritis
3fc6c8a532 [VSX] movemask needs to be explicitly aligned on clang for vec_ste 2022-09-16 12:50:33 +03:00
Konstantinos Margaritis
1a43178eeb env vars have to be in quotes 2022-09-16 12:46:35 +03:00
Konstantinos Margaritis
ef66877e9e [VSX] clang complains about the order of __vector 2022-09-16 12:41:08 +03:00
Konstantinos Margaritis
88b1bec5b7 Declarative Pipeline Jenkins environment 2022-09-16 11:59:36 +03:00
Konstantinos Margaritis
4934852003 Declarative Pipeline Jenkins environment attempt 2022-09-16 11:54:23 +03:00
Konstantinos Margaritis
bf6200ecc8 Jenkins change envVars -> withEnv 2022-09-16 11:46:09 +03:00
Alex Bondarev
4ab0730dbe additional mcpu flag cleanup 2022-09-16 00:03:08 +03:00
Alex Bondarev
d0a017da99 removed cpu reference flags and fixed tune flag 2022-09-15 18:38:01 +03:00
Alex Bondarev
69e6176e09 updated README to reflect CMake changes 2022-09-13 18:29:06 +03:00
Alex Bondarev
ee0c8f763f fix to correctly place the autodetected flags and to activate SVE options 2022-09-13 18:21:10 +03:00
Konstantinos Margaritis
f6250ae3e5 bump version 2022-09-13 12:57:08 +00:00
Konstantinos Margaritis
361feb64e3
Merge pull request #124 from VectorCamp/develop
Merge develop to master
2022-09-13 15:52:20 +03:00
Konstantinos Margaritis
d0ae940261
Merge pull request #123 from VectorCamp/feature/neon-shift-optimizations
[NEON] simplify/optimize shift/align primitives
2022-09-13 09:13:05 +03:00
Konstantinos Margaritis
67b414f2f9 [NEON] simplify/optimize shift/align primitives 2022-09-12 13:09:51 +00:00
Konstantinos Margaritis
db2a6d65f1
Merge pull request #121 from liquidaty/mingw64-develop
fix to enable successful build with mingw64
2022-09-09 13:42:49 +03:00
liquidaty
f4840adf3d fix to enable successful build with mingw64 2022-09-08 09:59:37 -07:00
Konstantinos Margaritis
0c97e5f2c2
Merge pull request #119 from VectorCamp/feature/vsx-optimizations
VSX optimizations
2022-09-08 13:41:13 +03:00
Konstantinos Margaritis
e3c237a7e0 use correct intrinsic for lshiftbyte_m128 2022-09-07 16:00:10 +03:00
Konstantinos Margaritis
756ef409b4 provide non-immediate versions of lshiftbyte/rshiftbyte on x86 2022-09-07 15:07:20 +03:00
Konstantinos Margaritis
1ae0d15181 readd simd_onebit_masks for x86, needs more work 2022-09-07 13:42:25 +03:00
Konstantinos Margaritis
0af2ba8616 [NEON] optimize mask1bit128, get rid of simd_onebit_masks 2022-09-07 10:20:01 +00:00
Konstantinos Margaritis
02ae2a3cad remove simd_onebit_masks from arm/x86 headers, as they moved to common 2022-09-07 12:41:32 +03:00
Konstantinos Margaritis
305a041c73 [VSX] optimize alignr method 2022-09-07 12:35:28 +03:00
Konstantinos Margaritis
a837cf3bee [VSX] optimize shift operators 2022-09-07 12:16:14 +03:00
Konstantinos Margaritis
be20c2c519 [VSX] optimize shifting methods, replace template Unroller 2022-09-07 12:14:15 +03:00
Konstantinos Margaritis
dc6b8ae92d optimize comparemask implementation, clean up code, use union types instead of casts 2022-09-07 02:02:11 +03:00
Konstantinos Margaritis
7295b9c718 [VSX] add algorithm for alignr w/o use of immediates 2022-09-07 00:01:54 +03:00
Konstantinos Margaritis
94fe406f0c [VSX] correct lshiftbyte_m128/rshiftbyte_m128, variable_byte_shift 2022-09-06 23:59:51 +03:00
Konstantinos Margaritis
17467ff21b [VSX] huge optimization of movemask128 2022-09-06 20:08:44 +03:00
Konstantinos Margaritis
0e7874f122 [VSX] optimize and correct lshift_m128/rshift_m128 2022-09-06 18:48:19 +03:00
Konstantinos Margaritis
026f761671 [VSX] optimized mask1bit128(), moved simd_onebit_masks to common 2022-09-06 18:10:55 +03:00
Konstantinos Margaritis
43c053a069 add popcount32x4, popcount64x4 helper functions 2022-09-06 16:55:56 +03:00
Konstantinos Margaritis
c043730675
Merge pull request #118 from VectorCamp/bugfix/hyperscan-backport-202208
Bugfix/hyperscan backport 202208
2022-09-03 09:32:43 +03:00
Konstantinos Margaritis
74ab41897c Add missing <memory> header 2022-08-30 20:40:23 +03:00
Liu Zixian
c597f69c59 fix build with glibc-2.34
SIGSTKSZ is no longer a constant after glibc 2.34
https://sourceware.org/pipermail/libc-alpha/2021-August/129718.html
2022-08-29 15:37:59 +03:00
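A hedged sketch of the pattern behind such fixes (not the commit's exact code): array bounds must be integer constant expressions, so a runtime-valued SIGSTKSZ forces a dynamically sized allocation.

```cpp
#include <csignal>
#include <vector>

// Since glibc 2.34, SIGSTKSZ may expand to sysconf(_SC_SIGSTKSZ).
// char stack_mem[SIGSTKSZ];            // breaks: no longer a constant
std::vector<char> stack_mem(SIGSTKSZ);  // works: size computed at runtime
```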
Hong, Yang A
70b2a28386 literal API: add empty string check.
fixes github issue #302, #304
2022-08-29 15:08:54 +03:00
Hong, Yang A
4f27a70dd7 chimera: fix SKIP flag issue
fix github issue #360
2022-08-29 15:03:34 +03:00
Chang, Harry
31afacc7be Corpus editor: fix random char value of UTF-8. 2022-08-29 15:03:30 +03:00
Chang, Harry
a9ca0e4de3 Corpus generator: fix random char value of UTF-8.
fixes github issue #184
2022-08-29 15:03:26 +03:00
Hong, Yang A
4d4940dfbe bugfix: fix overflow risk of strlen function 2022-08-29 15:03:22 +03:00
hongyang7
2731a3384b Fix segfaults on allocation failure (#4)
Throw std::bad_alloc instead of returning nullptr from
ue2::AlignedAllocator. Allocators for STL containers are expected never
to return an invalid pointer, and instead must throw on failure.
Violating this expectation can lead to invalid pointer dereferences.

Co-authored-by: johanngan <johanngan.us@gmail.com>

fixes github issue #317 (PR #320)
2022-08-29 15:03:18 +03:00
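A hedged sketch of the fix's shape, with hypothetical names (the real ue2::AlignedAllocator has more machinery): allocate() throws std::bad_alloc instead of handing an invalid pointer to the container.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

template <typename T, std::size_t Align>
struct AlignedAllocator {
    using value_type = T;
    T *allocate(std::size_t n) {
        // aligned_alloc requires the size to be a multiple of the alignment.
        std::size_t bytes = ((n * sizeof(T) + Align - 1) / Align) * Align;
        void *p = std::aligned_alloc(Align, bytes);
        if (!p) {
            throw std::bad_alloc(); // the fix: never return nullptr
        }
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t) noexcept { std::free(p); }
};
```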
Chang, Harry
c1659b8544 Logical Combination: bypass combination flag in hs_expression_info.
Fixes github issue #291
2022-08-29 15:03:14 +03:00
Hong, Yang A
decabdfede update year for bugfix #302-#305 2022-08-29 15:03:11 +03:00
Hong, Yang A
a119693a66 mcclellan: improve wide-state checking in Sherman optimization
fixes github issue #305
2022-08-29 15:03:06 +03:00
Hong, Yang A
cafd5248b1 literal API: add instruction support
fixes github issue #303
2022-08-29 15:02:59 +03:00
Konstantinos Margaritis
6259783d79
Merge pull request #116 from pareenaverma/develop
Fixed the PCRE download location
2022-07-20 23:08:11 +03:00
Konstantinos Margaritis
19947f70d2
Merge pull request #113 from danlark1/develop
Optimize vectorscan for aarch64 by using shrn instruction
2022-07-20 16:41:33 +03:00
Ubuntu
b5e1384995 Fixed the PCRE download location 2022-07-20 13:26:52 +00:00
Danila Kutenin
db52ce6f08 Fix avx512 movemask call 2022-07-20 09:03:50 +01:00
Danila Kutenin
7e7f604f7d Fix ppc64el debug 2022-06-26 23:05:17 +00:00
Danila Kutenin
849846700a Minor fix 2022-06-26 23:02:02 +00:00
Danila Kutenin
8a49e20bcd Fix formatting of a couple files 2022-06-26 22:59:58 +00:00
Danila Kutenin
49eb18ee4f Optimize vectorscan for aarch64 by using shrn instruction
This optimization is based on the thread
https://twitter.com/Danlark1/status/1539344279268691970 and uses the
shift-right-and-narrow-by-4 instruction https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/SHRN--SHRN2--Shift-Right-Narrow--immediate--

To achieve that, I needed to redesign movemask a little into comparemask
and add an additional step for mask iteration. Our benchmarks
showed a 10-15% improvement on average for long matches.
2022-06-26 22:55:45 +00:00
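A hedged sketch of the shrn trick this describes (AArch64 NEON): instead of emulating x86 movemask (1 bit per byte), compress a byte-wise comparison result into a 64-bit mask with 4 bits per lane.

```cpp
#include <arm_neon.h>
#include <cstdint>

// eq is a per-byte comparison result (each lane 0x00 or 0xFF).
static inline uint64_t comparemask(uint8x16_t eq) {
    // Shift each 16-bit pair right by 4 and narrow: one nibble per input byte.
    uint8x8_t nibbles = vshrn_n_u16(vreinterpretq_u16_u8(eq), 4);
    return vget_lane_u64(vreinterpret_u64_u8(nibbles), 0);
}
// A match position is then e.g. count-trailing-zeros(mask) / 4, which is
// the extra mask-iteration step the commit mentions.
```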
Konstantinos Margaritis
73695e419c
Merge pull request #108 from jth/cmake-python
CMake: Use non-deprecated method for finding python
2022-05-20 09:05:11 +03:00
Jan Henning
85a77e3eff Bump scripts to python3 2022-05-19 16:25:08 +02:00
Jan Henning
0a35a467e0 Use non-deprecated method of finding python 2022-05-19 10:20:17 +02:00
Konstantinos Margaritis
fc5059aa10
Update CMakeLists.txt 2022-05-05 12:14:53 +03:00
Konstantinos Margaritis
8739a6c2a7
Merge pull request #103 from VectorCamp/develop
Develop
2022-05-05 10:34:56 +03:00
Konstantinos Margaritis
6c24e61572
Update Jenkinsfile 2022-05-04 21:57:38 +03:00
Konstantinos Margaritis
59ffac5721
Update Jenkinsfile 2022-05-04 16:41:10 +03:00
Konstantinos Margaritis
2c78b770ea
Update Jenkinsfile 2022-05-04 16:30:22 +03:00
Konstantinos Margaritis
e71fb5cfeb
Merge pull request #105 from VectorCamp/bugfix/jenkins
fix large pipeline error
2022-05-04 16:27:22 +03:00
Konstantinos Margaritis
b3d7174a93 fix large pipeline error 2022-05-04 16:26:02 +03:00
Konstantinos Margaritis
fce10b53a0
Delete JenkinsFile 2022-05-04 16:14:19 +03:00
Konstantinos Margaritis
630f7b2360
Merge pull request #104 from VectorCamp/bugfix/jenkinsfile
add Jenkinsfile back to master branch
2022-05-04 16:04:00 +03:00
Konstantinos Margaritis
f441213d35 add Jenkinsfile back to master branch 2022-05-04 16:01:53 +03:00
Konstantinos Margaritis
76b2b4b423 add Jenkinsfile back to master branch 2022-04-19 11:36:25 +03:00
Konstantinos Margaritis
bd9113463d
Merge pull request #102 from danlark1/patch-2
Optimized and correct version of movemask128 for ARM
2022-04-18 20:56:26 +03:00
Daniel Kutenin
288491d6d9
Optimized and correct version of movemask128 for ARM
Closes #99

https://gcc.godbolt.org/z/cTjKqzcvn

The previous version was not correct because movemask assumed the input bytes were 0xFF. We can fully match the semantics and do it faster with USRA instructions.

Re-submission to a develop branch
2022-04-18 13:37:53 +01:00
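The linked godbolt shows the USRA approach; a hedged sketch of its shape: shift-right-and-accumulate steps pair up the per-byte top bits until each vector half holds an 8-bit mask.

```cpp
#include <arm_neon.h>
#include <cstdint>

static inline uint16_t movemask128(uint8x16_t input) {
    // Take the top bit of every byte, then fold pairs together with USRA.
    uint16x8_t high_bits = vreinterpretq_u16_u8(vshrq_n_u8(input, 7));
    uint32x4_t paired16  = vreinterpretq_u32_u16(vsraq_n_u16(high_bits, high_bits, 7));
    uint64x2_t paired32  = vreinterpretq_u64_u32(vsraq_n_u32(paired16, paired16, 14));
    uint8x16_t paired64  = vreinterpretq_u8_u64(vsraq_n_u64(paired32, paired32, 28));
    return static_cast<uint16_t>(vgetq_lane_u8(paired64, 0)) |
           (static_cast<uint16_t>(vgetq_lane_u8(paired64, 8)) << 8);
}
```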
Konstantinos Margaritis
edea9d12b1
Merge pull request #94 from a16bitsysop/fat_runtime
change FAT_RUNTIME to a normal option so it can be set to off
2022-04-18 11:08:29 +03:00
Konstantinos Margaritis
5fa22e68ba
Merge pull request #93 from danlark1/master
Fix all ASAN issues in vectorscan
2022-04-18 11:07:18 +03:00
Duncan Bellamy
b34aacdb94 move to original position 2022-02-22 19:21:18 +00:00
Duncan Bellamy
d626381ad0 change FAT_RUNTIME to a normal option so it can be set to off
fixes #89
2022-02-20 13:16:58 +00:00
Danila Kutenin
5f8729a085 Fix a couple of tests 2022-02-18 19:31:03 +00:00
Danila Kutenin
b3e88e480f Add sanitize options 2022-02-18 18:35:26 +00:00
Danila Kutenin
9af996b936 Fix all ASAN issues in vectorscan 2022-02-18 17:14:51 +00:00
Konstantinos Margaritis
2819dc3d1b
Merge pull request #90 from BigRedEye/vectorscan-master
Fix word boundary assertions under C++20
2022-02-08 09:20:34 +02:00
BigRedEye
6d6c291769
fix: Mark operator bool explicit 2022-02-08 00:22:23 +03:00
Konstantinos Margaritis
e6f856407e
Merge pull request #86 from VectorCamp/develop
New release 5.4.6
2022-01-21 12:25:40 +02:00
Konstantinos Margaritis
f9b6526ef8
Merge pull request #87 from VectorCamp/feature/move-debian-package-to-separate-branch
keep debian folder in a separate branch
2022-01-21 12:24:03 +02:00
Konstantinos Margaritis
666e1c455e keep debian folder in a separate branch 2022-01-21 12:07:25 +02:00
Konstantinos Margaritis
6cd6957a23
Merge pull request #85 from VectorCamp/feature/add-debian-package
Feature/add debian package
2022-01-21 10:12:04 +02:00
Konstantinos Margaritis
0949576693 change source format to native, as we include debian folder 2022-01-20 21:03:02 +02:00
Konstantinos Margaritis
2eaf6e5d31 fix description, remove sse4.2-support from b-depends 2022-01-20 21:02:46 +02:00
Konstantinos Margaritis
f5960c81d9 add ITP bug report 2022-01-20 21:02:30 +02:00
Konstantinos Margaritis
312ae895b4 add sse4.2-support package to enforce such dependency 2022-01-19 15:08:52 +02:00
Konstantinos Margaritis
4c32b36f53 remove preinst script, not needed as we bumped our deps 2022-01-19 15:08:04 +02:00
Konstantinos Margaritis
1155a9219c add our copyrights, minor fixes 2022-01-19 14:31:59 +02:00
Konstantinos Margaritis
f304c3e7e1 defer setting arch/tune flags for FAT_RUNTIME 2022-01-18 20:34:45 +02:00
Konstantinos Margaritis
4fdfb8c7f4 enable FAT_RUNTIME 2022-01-18 20:32:22 +02:00
Konstantinos Margaritis
a315fae243 fix DEB_CMAKE_FLAGS depending on DEB_HOST_ARCH 2021-12-22 13:25:29 +02:00
Konstantinos Margaritis
8c71238d60 Initial attempt at debian packaging, modified hyperscan packaging 2021-12-22 13:13:12 +02:00
Konstantinos Margaritis
90018f927f
Merge pull request #82 from VectorCamp/feature/add-macos-support
Minor changes to enable compilation on Mac M1
2021-12-12 01:13:14 +02:00
Konstantinos Margaritis
467db4a268 Minor changes to enable compilation on Mac M1 2021-12-11 15:43:55 +02:00
Konstantinos Margaritis
1718e33544
Merge pull request #81 from VectorCamp/feature/add-clang-support
Feature/add clang support
2021-12-07 22:16:38 +02:00
Konstantinos Margaritis
4589f1742e minor fixes 2021-12-07 08:49:59 +00:00
Konstantinos Margaritis
fd2eabd071 fix clang-release-arm compilation 2021-12-07 08:43:52 +00:00
Konstantinos Margaritis
fec557c1f9 fix wrong castings for NEON 2021-12-06 21:35:51 +00:00
Konstantinos Margaritis
deeb113977 lower gcc minver to 9 to enable building on Ubuntu 20 LTS 2021-12-06 21:35:37 +00:00
Konstantinos Margaritis
d3f0d8dd70 update Jenkinsfile for all configurations 2021-12-06 18:38:01 +00:00
Konstantinos Margaritis
1b6f37d626 fix typo 2021-12-06 20:33:37 +02:00
Konstantinos Margaritis
290eabbca0 fix compilation with clang and some incomplete/wrong implementations for arm this time 2021-12-06 18:22:58 +00:00
Konstantinos Margaritis
58bfe5423e use Jenkinsfile in git 2021-12-03 18:27:21 +02:00
Konstantinos Margaritis
07ce6d8e7f fix build failures with clang on x86, make sure compilation works on other Power as well 2021-12-03 16:24:58 +02:00
Konstantinos Margaritis
7cad514366 clang is more strict 2021-12-02 23:09:53 +02:00
Konstantinos Margaritis
6b364021d1 don't fail if mtune does not return a valid configuration 2021-12-02 23:09:34 +02:00
Konstantinos Margaritis
451d539f1d Power does not use -march 2021-12-02 18:01:26 +02:00
Konstantinos Margaritis
5aae719ecd fix build with clang; in particular, VSX uses long long instead of int64_t, which gcc allows but clang does not 2021-12-02 18:01:00 +02:00
Konstantinos Margaritis
4aa32275f1 use same definition of the union for all types 2021-12-02 18:00:02 +02:00
Konstantinos Margaritis
5d23e6dab6 set -msse4.2 only on Intel 2021-12-01 21:45:31 +00:00
Konstantinos Margaritis
1f4143de81 rework CMakeLists.txt to ensure it works with clang 2021-12-01 23:23:37 +02:00
Konstantinos Margaritis
0221dc1771 fix miscompilations with clang++, as it is more strict 2021-12-01 23:22:15 +02:00
Konstantinos Margaritis
7d600c4fcb bump base requirements to SSE4.2 2021-12-01 23:20:02 +02:00
Konstantinos Margaritis
404a0ab0f4 fix miscompilation with clang 2021-12-01 23:18:57 +02:00
Konstantinos Margaritis
6f20276b2f
Merge pull request #80 from VectorCamp/bugfix/fix-SVE2-build
fix SVE2 build after the changes
2021-11-25 22:19:12 +02:00
Konstantinos Margaritis
81fba99f3a fix SVE2 build after the changes 2021-11-25 18:48:24 +02:00
Konstantinos Margaritis
8dec4e8d85
Merge pull request #79 from Apostolos00tapsas/feature/complete-power9-VSX-support
Feature/complete power9 vsx support
2021-11-25 18:40:17 +02:00
Konstantinos Margaritis
7ceca78db4 fix unit-internal release builds using __builtin_constant_p() as well 2021-11-25 15:09:01 +02:00
Konstantinos Margaritis
00384c9e37 nit 2021-11-25 06:21:07 +00:00
Konstantinos Margaritis
cd95b1a38c use __builtin_constant_p() instead for arm as well 2021-11-25 06:20:53 +00:00
Apostolos Tapsas
725a8d8f1a Removed duplicates 2021-11-24 15:09:53 +00:00
Apostolos Tapsas
35e5369c70 *fix palignr implementation for VSX Release mode
*add unit test for palignr
*enable unit test building for Release mode
2021-11-24 15:03:49 +00:00
Apostolos Tapsas
bfc8da1102 Removed accidentaly included header file 2021-11-24 12:11:21 +00:00
Apostolos Tapsas
e13bfec734 found and solved a very hard-to-track bug in the intrinsic function palignr, which manifested only in Release builds and not Debug builds in a number of tests 2021-11-24 11:18:18 +00:00
Apostolos Tapsas
0287724413 WIP:tracking last bugs in failing tests for release build 2021-11-16 15:24:22 +00:00
Apostolos Tapsas
54158a1746 vermicelli and match implementations for ppc64el added 2021-11-13 19:36:46 +00:00
apostolos
e09d8674b4 resolving conflicts after merging 2021-11-13 18:58:22 +02:00
Konstantinos Margaritis
7e7c50bdd7
Merge pull request #78 from VectorCamp/feature/refactor-vermicelli
Feature/refactor vermicelli
2021-11-12 23:06:46 +02:00
apostolos
4114b8a480 SuperVector opandnot test enriched 2021-11-10 15:12:25 +02:00
apostolos
942deb7d80 test for load m128 from u64a function added 2021-11-10 09:01:28 +02:00
Konstantinos Margaritis
41b98d7d8f add len parameter to arm matchers as well 2021-11-08 19:45:36 +00:00
Konstantinos Margaritis
dcf6b59e8d split vermicelli block implementations per arch 2021-11-08 19:45:21 +00:00
Apostolos Tapsas
82bea29f4e simd_utils functions fixed 2021-11-08 14:22:58 +00:00
Apostolos Tapsas
ba90cdeb5a SuperVector constructors as well as andnot implementation fixed 2021-11-05 13:34:48 +00:00
Konstantinos Margaritis
24fa54081b add len parameter and mask, fixes corner cases on AVX512 2021-11-05 14:30:22 +02:00
Konstantinos Margaritis
210295a702 remove vermicelli.h and replace it with vermicelli.hpp 2021-11-02 22:30:53 +02:00
Konstantinos Margaritis
869d2bd53b refactor vermicelliDoubleMaskedExec() 2021-11-02 22:30:21 +02:00
Konstantinos Margaritis
16f3cca98b add vermicelli.hpp to includes 2021-11-01 16:40:17 +00:00
Konstantinos Margaritis
59505f98ba remove vermicelli_sse.h 2021-11-01 16:40:01 +00:00
Konstantinos Margaritis
d55c74b6c4 fix arm matchers 2021-11-01 16:31:38 +00:00
Konstantinos Margaritis
f6fd845400 complete refactoring and unification of Vermicelli functions 2021-11-01 16:28:50 +00:00
Konstantinos Margaritis
d47641c2fc remove unneeded header 2021-11-01 16:28:50 +00:00
Konstantinos Margaritis
bc1a1127cf add new include file 2021-11-01 16:28:50 +00:00
Konstantinos Margaritis
5eabceddcf renamed matcher functions, added new ones for Vermicelli 2021-11-01 16:28:50 +00:00
Konstantinos Margaritis
16e5e2ae64 nits 2021-11-01 16:05:43 +00:00
Konstantinos Margaritis
713aaef799 move casemask helper functions to separate header 2021-11-01 16:05:43 +00:00
Konstantinos Margaritis
4a569affbc add to CMake 2021-11-01 16:05:43 +00:00
Konstantinos Margaritis
2fa947af9c added refactored vermicelli_simd.cpp implementation 2021-11-01 16:05:43 +00:00
Konstantinos Margaritis
9abfdcaa84 add Vermicelli/RVermicelli to microbenchmark utility 2021-11-01 16:53:21 +02:00
Konstantinos Margaritis
7b65b298c1 add arm vector types in union, avoid -flax-conversions, fix castings 2021-11-01 16:52:17 +02:00
Konstantinos Margaritis
44dc75a3ea complete refactoring and unification of Vermicelli functions 2021-11-01 16:51:18 +02:00
Konstantinos Margaritis
f4a490ac00 remove unneeded header 2021-11-01 16:50:38 +02:00
apostolos
d9d39d48c5 print comments and formatting fixes 2021-11-01 10:09:15 +02:00
Konstantinos Margaritis
dd45bf0d35 add new include file 2021-10-27 12:32:54 +03:00
Konstantinos Margaritis
8ae6e613cb renamed matcher functions, added new ones for Vermicelli 2021-10-27 12:32:03 +03:00
Konstantinos Margaritis
70414574ee nits 2021-10-27 12:31:04 +03:00
Konstantinos Margaritis
6e5a8353c5 move casemask helper functions to separate header 2021-10-27 12:30:42 +03:00
Konstantinos Margaritis
70ddb11a72 add to CMake 2021-10-27 12:29:59 +03:00
Konstantinos Margaritis
8be8ed309f added refactored vermicelli_simd.cpp implementation 2021-10-27 12:29:39 +03:00
apostolos
3f17750a27 nits 2021-10-26 11:55:02 +03:00
apostolos
bf54aae779 Special case for Shuffle test added as well as comments for respective implementations 2021-10-26 11:48:33 +03:00
Apostolos Tapsas
1eb3b19f63 Shuffle simd and SuperVector implementations as well as their test really fixed 2021-10-25 09:19:30 +03:00
Apostolos Tapsas
d43d6733b6 SuperVector shuffle implementation and test function optimized 2021-10-22 11:55:39 +00:00
apostolos
57301721f1 print functions missing keywords replaced 2021-10-22 12:38:16 +03:00
apostolos
24f149f239 print functions keyword renamed 2021-10-22 12:36:07 +03:00
apostolos
b53b0a0fcd test for movemask and shuffle cases added 2021-10-22 11:17:43 +03:00
Apostolos Tapsas
5abda15c26 expand128 bugs fixed 2021-10-22 07:05:55 +00:00
apostolos
7184ce9870 expand128 implementation was changed to be like arm's 2021-10-22 09:46:04 +03:00
Apostolos Tapsas
2b1db73326 WIP: simd & bitutils files finctions fixes 2021-10-21 13:34:02 +00:00
Apostolos Tapsas
558313a2c2 SuperVector operator fixes, and simd_utils low/high64 function implementations added 2021-10-18 12:26:38 +00:00
Apostolos Tapsas
e084c2d6e4 SuperVector vsh* implementations 2021-10-15 14:07:17 +00:00
apostolos
b1f53f8e49 match file for ARCH_PPC64EL added 2021-10-14 16:26:59 +03:00
apostolos
ba4472a61c truffle and shuffle implementations for ARCH_PPC64EL 2021-10-14 16:01:21 +03:00
apostolos
d0a41252c8 blockSingleMask implementations for ARCH_PPC64 added 2021-10-14 15:56:13 +03:00
apostolos
4d2acd59e2 Supervector vsh* added 2021-10-14 15:08:23 +03:00
Apostolos Tapsas
7888dd4418 WIP: Power VSX support almost completed 2021-10-14 13:53:55 +03:00
Vectorcamp
2231f7c024 compile fixes for VSX port 2021-10-14 13:53:55 +03:00
apostolos
90d3db1776 update powerpc simd util file functions 2021-10-14 13:53:55 +03:00
apostolos
0078c28ee6 implementations for powerpc64el architecture 2021-10-14 13:53:55 +03:00
Vectorcamp
079f3518d7 ppc64el architecture added in CMakeLists file 2021-10-14 13:53:55 +03:00
Vectorcamp
f1d781ffee test commit from VM and CMakeLists: add Power support 2021-10-14 13:53:55 +03:00
Konstantinos Margaritis
1f55d419eb add initial ppc64el support
(cherry picked from commit 63e26a4b2880eda7b6ac7b49271d83ba3e6143c4)
(cherry picked from commit c214ba253327114c16d0724f75c998ab00d44919)
2021-10-14 13:53:55 +03:00
Konstantinos Margaritis
35a25fffd7 link benchmarks against static lib only as some symbols are not exposed in the shared lib 2021-10-12 10:33:40 +00:00
Konstantinos Margaritis
4e044d4142 Add missing copyright info from tampered files 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
b9801478b2 bump version 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
c3baf3d296 fix multiple/undefined symbols when using fat runtimes 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
2d9f52d03e add arm truffle block function 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
9d0c15c448 add simd_onebit_masks as static in arm simd_utils.h as well 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
aea10b8ab0 simplify truffle and provide arch-specific block functions 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
623c64142b simplify shufti and provide arch-specific block functions 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
577e03e0c7 rearrange method declarations 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
9c54412447 remove simd_utils.c 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
8b7ba89cb5 add x86 vsh* implementations 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
eebd6c97bc use movemask 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
6ceab8435d add header define to avoid double inclusion 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
db6354b787 do not include the Supervector impl.cpp files in fat runtime 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
a78f3789a9 atm, do not build the benchmark tool for fat runtime, as the function names are modified; need to rethink this 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
96af3e8613 Improve benchmarks 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
fad39b6058 optimize and simplify Shufti and Truffle to work with a single block method instead 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
456b1c6182 no need to convert to size_t 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
9e6c1c30cf remove asserts, as they are not needed 2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
fa3d509fad firstMatch/lastMatch are now arch-dependent; emulating movemask on non-Intel is very costly, and the alternative is almost twice as fast on Arm 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
9ab18cf419 fix for new pshufb 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
67e0674df8 Changes/Additions to SuperVector class
* added ==,!=,>=,>,<=,< operators
* reworked shift operators to be more uniform and orthogonal, like Arm ISA
* added Unroller class to allow handling of multiple cases but avoid code duplication (see the sketch below)
* pshufb method can now emulate Intel or not (avoids one instruction).
2021-10-12 11:51:34 +03:00
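The Unroller mentioned above is a compile-time construct; a hedged sketch of the general idea (hypothetical shape, not the actual vectorscan class) is to hand each index in [Begin, End) to a callable as a compile-time constant, which is what immediate-operand intrinsics need.

```cpp
#include <type_traits>
#include <utility>

template <int Begin, int End>
struct Unroller {
    template <typename Fn>
    static void run(Fn &&f) {
        f(std::integral_constant<int, Begin>{}); // index usable as an immediate
        Unroller<Begin + 1, End>::run(std::forward<Fn>(f));
    }
};

template <int End>
struct Unroller<End, End> { // recursion terminator
    template <typename Fn>
    static void run(Fn &&) {}
};

// e.g. Unroller<0, 16>::run([&](auto i) { /* use i.value as a shift count */ });
```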
Konstantinos Margaritis
e7161fdfec initial SSE/AVX2 implementation 2021-10-12 11:51:34 +03:00
Duncan Bellamy
e5e2057ca9 remove adding CMAKE_CXX_IMPLICIT_LINK_LIBRARIES to PRIVATE_LIBS
as on Alpine Linux this adds gcc_s, which is a shared library

on alpine:
Libs.private: -lstdc++ -lm -lssp_nonshared -lgcc_s -lgcc -lc -lgcc_s -lgcc
2021-10-12 11:51:34 +03:00
apostolos
bc57891aa0 Unify benchmarks, more accurate measurements 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
b40899966f Unify benchmarks, more accurate measurements
(cherry picked from commit f50d7656bc78c54ec25916b6c8e655c188d79a13)
2021-10-12 11:51:34 +03:00
apostolos
d7e9d2d915 benchmark functions replaced with lambdas 2021-10-12 11:51:34 +03:00
apostolos
cf1d72745c raw pointers replaced with smart pointers 2021-10-12 11:51:34 +03:00
apostolos
c774a76f24 nit 2021-10-12 11:51:34 +03:00
apostolos
a86d6c290d nit 2021-10-12 11:51:34 +03:00
apostolos
ee8fa17351 fix benchmarks outputs 2021-10-12 11:51:34 +03:00
apostolos
53b9034546 bandwidth output fixes 2021-10-12 11:51:34 +03:00
apostolos
0e141ce700 size output for case with match fixed 2021-10-12 11:51:34 +03:00
apostolos
5d4adf267d nits 2021-10-12 11:51:34 +03:00
apostolos
2e6c75c895 size output fixed 2021-10-12 11:51:34 +03:00
apostolos
9901477bcf nits 2021-10-12 11:51:34 +03:00
apostolos
2b9636ccc0 benchmarks output fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
91f58fb1ca add missing header 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
be1551aa94 remove confusing OPTIMISE flag 2021-10-12 11:51:34 +03:00
apostolos
4027319d6c nits 2021-10-12 11:51:34 +03:00
apostolos
1009391d9f reduce code size by using function arrays, and add bandwidth to output 2021-10-12 11:51:34 +03:00
apostolos
904a94fbe5 micro-benchmarks for shufti, truffle and noodle added 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
08357a096c remove Windows/ICC support 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
8cff876962 fix lshift128 test 2021-10-12 11:51:34 +03:00
apostolos
67fa6d2738 alignr methods for avx2 and avx512 added 2021-10-12 11:51:34 +03:00
apostolos
b3a20afbbc limex_shuffle added and it's unit tests 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
de30471edd remove duplicate functions from previous merge 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e5050c9373 add missing compile flags 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
7f5e859019 add accidentally removed lines 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
deae90f947 * add -fno-new-ttp-matching to fix build failures on newer gcc compilers with C++17
* add explicit -mssse3, -mavx2 in compiler flags in respective build profiles
2021-10-12 11:51:34 +03:00
George Wort
a879715953 Move SVE functions into their own files.
Change-Id: I995ba4b7d2b558ee403693ee45d747d414d3b177
2021-10-12 11:51:34 +03:00
George Wort
6c6aee9682 Implement new DoubleVermicelli16 acceleration functions using SVE2
Change-Id: Id4a8ffca840caab930a6e78cc0dfd0fe7d320b4e
2021-10-12 11:51:34 +03:00
George Wort
25183089fd Use SVE shufti for counting miracles.
Change-Id: Idd4aaf5bbc05fc90e9138c6fed385bc6ffa7b0b8
2021-10-12 11:51:34 +03:00
George Wort
00fff3f53c Use SVE for double shufti.
Change-Id: I09e0d57bb8a2f05b613f6225dea79ae823136268
2021-10-12 11:51:34 +03:00
George Wort
c95a4c3dd1 Use SVE for single shufti.
Change-Id: Ic76940c5bb9b81a1c45d39e9ca396a158c50a7dc
2021-10-12 11:51:34 +03:00
George Wort
56ef2d5f72 Use SVE2 for counting miracles.
Change-Id: I048dc182e5f4e726b847b3285ffafef4f538e550
2021-10-12 11:51:34 +03:00
George Wort
ab5d4d9279 Replace USE_ARM_SVE with HAVE_SVE.
Change-Id: I469efaac197cba93201f2ca6eca78ca61be3054d
2021-10-12 11:51:34 +03:00
George Wort
8242f46ed7 Add Licence to state_compress and bitutils.
Change-Id: I958daf82e5aef5bd306424dcfa7812382b266d65
2021-10-12 11:51:34 +03:00
George Wort
df926ef62f Implement new Vermicelli16 acceleration functions using SVE2.
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.

Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
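A hedged sketch of the MATCH-based idea (assumes SVE2; not the actual vermicelli code): svmatch_u8 tests every byte of the data vector against all 16 needle bytes in its 128-bit segment in one instruction.

```cpp
#include <arm_sve.h>

// Per-lane "is data[i] one of chars?" predicate; compile with +sve2.
static inline svbool_t any_of_16(svbool_t pg, svuint8_t data, svuint8_t chars) {
    return svmatch_u8(pg, data, chars);
}
```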
George Wort
c7086cb7f1 Add SVE2 support for dvermicelli
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
a38324a5a3 add arm lshift128/rshift128 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
603bc14cdd fix failing corner case, add pshufb_maskz() 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e35b88f2c8 use STL make_unique, remove wrapper header, breaks C++17 compilation 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f5f37f3f40 change C/C++ standard used to C17/C++17 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6f44a1aa26 remove low4bits from the arguments, fix cases that mostly affect loading large (64) vectors and falling out of bounds 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f2d9784979 fix loadu_maskz, add {l,r}shift128_var(), tab fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
a2e6143ea1 convert to for loops 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f8ce0bb922 minor fixes, add 2 constructors from half size vectors 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
cabd13d18a fix lastMatch<64> 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
ebb1b84ae3 provide an {l,r}shift128_var() to fix immediate value build failure in loadu_maskz 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
825460856f fix arm loadu_maskz() 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
86accf41a3 add arm lshift128/rshift128 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
b67cd7dfd0 use rshift128() instead of vector-wide right shift 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6c51f7f591 add {l,r}shift128()+tests, rename printv_u64() to print64() 2021-10-12 11:51:34 +03:00
George Wort
051ceed0f9 Use SVE2 Bitperm's bdep instruction in bitutils and state_compress
Specifically for pdep64, expand32, and expand64 in bitutils,
as well as all of the loadcompressed functions used in
state_compress.

Change-Id: I92851bd12481dbee6a7e344df0890c4901b56d01
2021-10-12 11:51:34 +03:00
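A hedged sketch of the BDEP use this describes (assumes the SVE2 bitperm extension; not the actual bitutils code): BDEP scatters the low bits of x into the set-bit positions of mask, matching x86 PDEP per 64-bit lane.

```cpp
#include <arm_sve.h>
#include <cstdint>

static inline uint64_t pdep64_sve(uint64_t x, uint64_t mask) {
    svuint64_t r = svbdep_u64(svdup_u64(x), svdup_u64(mask));
    // Every lane holds the same result; extract one of them.
    return svlastb_u64(svptrue_b64(), r);
}
```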
George Wort
4bc28272da Fix CROSS_COMPILE_AARCH64 for SVE issues.
Change-Id: I7b9ba3ccb754d96eee22ca01714c783dae1e4956
2021-10-12 11:51:34 +03:00
George Wort
9fb79ac3ec Add SVE2 support for vermicelli
Change-Id: Ia025de53521fbaefe5fb1e4425aaf75c7d80a14e
2021-10-12 11:51:34 +03:00
George Wort
7162446358 Remove possibly undefined behaviour from Noodle.
Change-Id: I9a7997cea6a48927cb02b00c5dba5009bbf83850
2021-10-12 11:51:34 +03:00
George Wort
b48ea2c1a6 Remove first check from scanDouble Noodle.
Change-Id: I00eabb3cb06ef6a2060df52c26fa8591907a2711
2021-10-12 11:51:34 +03:00
apostolos
89b123d003 Equal mask test fixed with random numbers 2021-10-12 11:51:34 +03:00
apostolos
6f88ecac44 Supervector test fixes 2021-10-12 11:51:34 +03:00
apostolos
ae6bc52076 SuperVector AVX512 implementations 2021-10-12 11:51:34 +03:00
apostolos
32350cf9b1 SuperVector unit tests for AVX2 and AVX512 added 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
7ae636dfe9 really fix lshift for avx2 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
c44fa634d1 disable OPTIMISE by default 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
d04b899c29 fix truffle SIMD for S>16 as well 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
b42b187712 add AVX2 specializations 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
dede600637 lots of fixes to AVX2 implementation 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
c45e72775f convert print helper functions to class methods 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
78e098661f tiny change in vector initialization 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
d453a612dc fix last failing Shufti/Truffle tests 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
ec3f108d71 fix arm SuperVector implementation 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0ed10082b1 fix rtruffle, was failing Lbr and a few ReverseTruffle tests 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f425951b49 fix x86 debug alignr 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
845e533b66 move firstMatch, lastMatch to own header in util 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
41ff0962c4 minor fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6d8f3b9ff8 compilation fixes for debug mode 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
d7b247a949 fix arm implementation of alignr() 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
28b2949396 harmonise syntax of x86 SuperVector impl.cpp like arm, fix alignr, define printv_* functions when on debug mode only 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
9de3065e68 style fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e0a45a354d removed obsolete file 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2753dbb3b0 rename supervector class header, use dup_*() functions names instead of set1_*(), minor fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
9685095379 handle GNUCC_ARCH on non-x86 properly 2021-10-12 11:51:34 +03:00
apostolos
1ce5e17ce9 Truffle simd vectorized 2021-10-12 11:51:34 +03:00
George Wort
d1009e8830 Fix error in initial noodle double final call.
Change-Id: Ie044988f183b47e0b2f1eed3b4bd23de75c3117d
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5297ed5038 syntax fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
8b09ecfe48 nits 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
cceb599fc9 fix typo 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e49fa3a97a fix unit tests, and resp. ARM SuperVector methods based on those unit tests, add print functions for SuperVector 2021-10-12 11:51:34 +03:00
apostolos
1e434a9b3d Supervector Unit Tests 2021-10-12 11:51:34 +03:00
George Wort
d6df8116a5 Add SVE2 support for noodle
Change-Id: Iacb7d1f164bdd0ba50e2e13d26fe548cf9b45a6a
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
acca824dea add missing ARM SuperVector methods, some tests still fail, WIP 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5d9d958e74 disable SuperVector unit tests for now, until ARM support is included 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6fbd18183a rename arm impl.hpp to impl.cpp, add operator|() to SuperVector class 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
23b075cbd4 refactor shufti algorithm to use SuperVector class, WIP 2021-10-12 11:51:34 +03:00
George Wort
3ee7b75ee0 Add SVE, SVE2, and SVE2_BITPERM as targets
Change-Id: I5231e2eb0a31708a16c853dc83ea48db32e0b0a5
2021-10-12 11:51:34 +03:00
George Wort
b6c3ab723b Enable cross compilation to aarch64
Change-Id: Iafc8ac60926f5286990ce63a4ff4f8b6a7c46bef
2021-10-12 11:51:34 +03:00
apostolos
feb2d3ccf7 SuperVector unit tests 2021-10-12 11:51:34 +03:00
apostolos
096fb55faa unit tests for supervector 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6526df81e4 add more functions, move defines here, enable inlining of template specializations only when running optimized code 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
d8b5eb5d17 fix compilation on C++ 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
273b9683ac simplify function 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e215157a21 move definitions elsewhere 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
05c7c8e576 move SuperVector versions of noodleEngine scan functions to _simd.hpp file 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6e63aafbea add arm support for the new SuperVector class 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
c6406bebde simplify scanSingleMain() and scanDoubleMain() 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f77837130d delete separate implementations 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e6c1fa04ce add C++ template SIMD library (WIP) 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
ede2b18564 add generic SIMD implementation 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5213ef579d rename project, change to noodle_engine.cpp 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
7a9a2dd0dc convert to C++ 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2805ff038a revert to push_back() 2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
52661f35e8 add global definitions for CHUNKSIZE/VECTORSIZE, define HAVE_AVX512* only when BUILD_AVX512 is also enabled 2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
831091db9e fix typo 2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
556206f138 replace push_back by emplace_back where possible 2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
9f7088a9e0 use -O3 for C++ code as well, makes a difference 2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
48e9a17f0a merge with master 2021-10-12 11:51:20 +03:00
Konstantinos Margaritis
ec5531a6b1 minor optimizations 2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
d3ff893871 prefetch works best when addresses are 64-byte aligned 2021-10-12 11:50:32 +03:00
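A hedged illustration of the trick this commit refers to (names are mine, not the repo's):

```
/* Hedged sketch: prefetch the whole 64-byte cache line containing p.
 * Rounding the address down to a line boundary makes the hint cover
 * the line the data actually lives in. */
#include <stdint.h>

static inline void prefetch_line(const void *p) {
    const char *line = (const char *)((uintptr_t)p & ~(uintptr_t)63);
    __builtin_prefetch(line, 0 /* read */, 3 /* high temporal locality */);
}
```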
Konstantinos Margaritis
521f233cfd Revert "replace long macro and switch statement with function pointer array and branchless execution"
This reverts commit cc9dfed2494d709aac79051c29adb0a563903ba9.
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
92916e311f replace long macro and switch statement with function pointer array and branchless execution 2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
58cface115 optimise case handling 2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
e3e101b412 simplify and make scanSingle*()/scanDouble*() more uniform 2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
2f13ad0674 optimize caseMask handling 2021-10-12 11:50:32 +03:00
vectorcamp-jenkins
a0abf31a82 added basic Jenkinsfile 2021-04-13 22:52:42 +03:00
Konstantinos Margaritis
f2354537ff change project name in CMakeLists 2021-04-12 21:06:28 +03:00
Robbie Williamson
76bd21e521 Update README.md
Softened some of the wording around the reason for the fork. ;-)
2021-04-01 18:42:38 +03:00
Konstantinos Margaritis
5298333c73 bump version 2021-02-15 20:19:09 +02:00
Konstantinos Margaritis
27bd09454f use correct function names for AVX512, fix build failure 2021-02-15 13:54:19 +02:00
Konstantinos Margaritis
741d8246c5 fix some AVX512 function names, to fix AVX512 build failure, also rename the expand* functions to broadcast*() ones for consistency 2021-02-15 13:54:19 +02:00
Konstantinos Margaritis
c3c68b1c3f fix x86 implementations for compress128/expand128 2021-02-15 13:54:19 +02:00
Konstantinos Margaritis
e21305aa23 align array 2021-02-15 13:54:19 +02:00
Konstantinos Margaritis
04567ab649 use correct include 2021-02-15 13:54:19 +02:00
Konstantinos Margaritis
814045201f add BUILD_AVX2 definition, enable non-AVX2 building selectively 2021-02-15 13:54:19 +02:00
Konstantinos Margaritis
c078d355b6 Merge branch 'develop' 2021-02-15 13:44:30 +02:00
Konstantinos Margaritis
9fd94e0062 use unaligned loads for short scans 2021-02-11 14:21:57 +02:00
Konstantinos Margaritis
d3e03ed88a optimize case mask AND out of the loop 2021-02-10 13:29:45 +02:00
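A hedged sketch of the kind of hoisting this commit describes, with illustrative names (not the repo's code):

```
/* Hedged sketch: the case-folding AND mask is built once outside the
 * loop and applied once per data block, rather than being re-derived
 * on every iteration. */
#include <emmintrin.h>
#include <stddef.h>

static inline int scan_caseless(const __m128i *buf, size_t nblocks,
                                __m128i chars, __m128i case_mask) {
    for (size_t i = 0; i < nblocks; i++) {
        __m128i data = _mm_and_si128(_mm_loadu_si128(buf + i), case_mask);
        __m128i eq   = _mm_cmpeq_epi8(data, chars);
        if (_mm_movemask_epi8(eq)) {
            return (int)i; /* block index of first match */
        }
    }
    return -1;
}
```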
Konstantinos Margaritis
be66cdb51d fixes in shifting primitives 2021-02-08 19:38:20 +02:00
Konstantinos Margaritis
f541f75400 bugfix compress128/expand128, add unit tests 2021-02-08 19:20:37 +02:00
Konstantinos Margaritis
d9874898c7 make const 2021-02-08 19:19:52 +02:00
Konstantinos Margaritis
70c54ef144 Merge branch 'develop' of github.com:VectorCamp/vectorscan into develop 2021-01-26 18:22:28 +02:00
Konstantinos Margaritis
4cc93f5553 add necessary copyright info 2021-01-25 15:42:18 +02:00
Konstantinos Margaritis
dfd39fadb0 add links to Intel PRs 2021-01-25 15:29:41 +02:00
Konstantinos Margaritis
d8cece7cd2 modify README with name change 2021-01-25 15:27:50 +02:00
Wang Xiang W
6a8a7a6c01 Bump version number for release 2021-01-25 14:13:13 +02:00
Wang Xiang W
6377a73b2b changelog: updates for 5.4.0 release 2021-01-25 14:13:13 +02:00
Chang, Harry
52f658ac55 Fix Klocwork scan issues. 2021-01-25 14:13:13 +02:00
Wang Xiang W
18f6aee5c2 chimera: fix return value handling
Fixes github issue #270
2021-01-25 14:13:13 +02:00
Wang Xiang W
5f930b267c Limex: exception handling with AVX512 2021-01-25 14:13:13 +02:00
Chang, Harry
001b7824d2 Logical Combination: use hs_misc_free instead of free.
fixes github issue #284
2021-01-25 14:13:13 +02:00
Hong, Yang A
bb9ed60489 examples: add cmake enabling option BUILD_EXAMPLES. 2021-01-25 14:13:13 +02:00
Piotr Skamruk
6fd77679d9 [dev-reference] Fix minor typo in docs 2021-01-25 14:13:13 +02:00
Walt Stoneburner
345446519b Fixed several typos
Fixed spellings of regular, interpretation, and grammar to improve readability.

Fixes github issue #242
2021-01-25 14:13:13 +02:00
Wang Xiang W
beaca7c7db Adjust sensitive terms 2021-01-25 14:13:13 +02:00
Wang Xiang W
9ea1e4be3d limex: add fast NFA check 2021-01-25 14:13:13 +02:00
Chang, Harry
5ad3d64b4b Discard HAVE_AVX512VBMI checks at Sheng/McSheng compile time. 2021-01-25 14:13:13 +02:00
Chang, Harry
b19a41528a Add cpu feature / target info "AVX512VBMI". 2021-01-25 14:13:13 +02:00
Zhu,Wenjun
d96f1ab505 MCSHENG64: extend to 64-state based on mcsheng 2021-01-25 14:13:13 +02:00
Hong, Yang A
dea7c4dc2e lookaround:
add 64x8 and 64x16 shufti models
add mask64 model
expand entry quantity
2021-01-25 14:13:13 +02:00
Chang, Harry
56cb107005 AVX512VBMI Fat Teddy. 2021-01-25 14:13:13 +02:00
Chang, Harry
f5657ef7b7 Fix find_vertices_in_cycles(): don't check self-loop in SCC. 2021-01-25 14:13:13 +02:00
Chang, Harry
83d03e97c5 Fix cmake error on ICX under release mode. 2021-01-25 14:13:13 +02:00
Chang, Harry
a388a0f193 Fix sheng64 dump compile issue in clang. 2021-01-25 14:13:13 +02:00
Chang, Harry
c41d33c53f Fix sheng64 compile issue in clang and in DEBUG_OUTPUT mode on SKX. 2021-01-25 14:13:13 +02:00
Chang, Harry
ed4b0f713a SHENG64: 64-state 1-byte shuffle based DFA. 2021-01-25 14:13:13 +02:00
Chang, Harry
6a42b37fca SHENG32: Compile priority sheng > mcsheng > sheng32. 2021-01-25 14:13:13 +02:00
Chang, Harry
cc747013c4 SHENG32: 32-state 1-byte shuffle based DFA. 2021-01-25 14:13:13 +02:00
Hong, Yang A
d71515be04 DFA: use sherman economically 2021-01-25 14:13:13 +02:00
Wang Xiang W
7d21fc157c hsbench: add CSV dump support 2021-01-25 14:13:13 +02:00
Konstantinos Margaritis
87413fbff0 optimize get_conf_stride_1() 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
e2f253d8ab remove loads from movemask128, variable_byte_shift, add palignr_imm(), minor fixes 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
a039089888 fix non-const char * write-strings compile error 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
4686ac47b6 replace andn() by explicit bitops and group loads/stores, gives ~1% gain 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
b62247a36e borrow cache prefetching tricks from the Marvell port, seem to improve performance by 5-28% 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
51dcfa8571 fix compilation on non-x86 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
5b85589274 add some useful intrinsics 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
1c581e45e9 add expand128() implementation for NEON 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
c238d627c9 optimize get_conf_stride_1() 2021-01-22 10:13:55 +02:00
Konstantinos Margaritis
f9ef98ce19 remove loads from movemask128, variable_byte_shift, add palignr_imm(), minor fixes 2021-01-22 10:13:19 +02:00
Konstantinos Margaritis
dfba9227e9 fix non-const char * write-strings compile error 2021-01-22 10:11:20 +02:00
Konstantinos Margaritis
9bf5cac782 replace andn() by explicit bitops and group loads/stores, gives ~1% gain 2021-01-18 13:15:15 +02:00
Konstantinos Margaritis
94739756b4 borrow cache prefetching tricks from the Marvell port, seem to improve performance by 5-28% 2021-01-15 17:42:11 +02:00
Konstantinos Margaritis
fc4338eca0 fix compilation on non-x86 2021-01-15 17:35:21 +02:00
Konstantinos Margaritis
ef9bf02d00 add some useful intrinsics 2021-01-15 17:35:01 +02:00
Konstantinos Margaritis
6a11c83630 add expand128() implementation for NEON 2021-01-15 17:33:41 +02:00
Konstantinos Margaritis
644aac5e1b
Merge pull request #5 from VectorCamp/bugfix/fix-ia32-build
fix IA32 build, as we need minimum SSSE3 support for compilation to s…
2020-12-31 09:50:35 +02:00
Konstantinos Margaritis
752a42419b fix IA32 build, as we need minimum SSSE3 support for compilation to succeed 2020-12-30 19:57:44 +02:00
Konstantinos Margaritis
124455a4a8
Merge pull request #2 from VectorCamp/develop
Develop
2020-12-21 20:50:27 +02:00
Konstantinos Margaritis
0372a8120a
Merge pull request #1 from VectorCamp/feature/add-arm-support
Feature/add arm support
2020-12-16 19:01:32 +02:00
Konstantinos Margaritis
61b963a717 fix x86 compilation 2020-12-08 11:42:30 +02:00
Konstantinos Margaritis
e088c6ae2b remove forgotten printf 2020-12-07 23:12:41 +02:00
Konstantinos Margaritis
773dc6fa69 optimize *shiftbyte_m128() functions to use palign instead of variable_byte_shift_m128() 2020-12-07 23:12:26 +02:00
Konstantinos Margaritis
39945b7775 clear zones array 2020-12-03 19:30:50 +02:00
Konstantinos Margaritis
c38722a68b add ARM platform 2020-12-03 19:27:58 +02:00
Konstantinos Margaritis
38477b08bc fix movq and load_m128_from_u64a and resp. test for NEON 2020-12-03 19:27:38 +02:00
Konstantinos Margaritis
259c2572c1 define debug vector print functions to NULL in non-debug mode 2020-12-03 19:27:05 +02:00
Konstantinos Margaritis
17ab42d891 small optimization that was for some reason failing in ARM, should be faster anyway 2020-11-24 17:59:42 +02:00
Konstantinos Margaritis
d76365240b helper functions to print a m128 vector in debug mode 2020-11-24 17:57:16 +02:00
Konstantinos Margaritis
1c26f044a7 when building in debug mode, vgetq_lane_*() and vextq_*() need immediate operands, and we have to use switch()'ed versions 2020-11-24 17:56:40 +02:00
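A hedged sketch of such a switch()'ed wrapper (illustrative, not the repo's exact code):

```
/* Hedged sketch: vextq_u8 requires its lane count to be a compile-time
 * immediate, so in debug builds (where inlining may not constant-fold
 * the argument) a runtime amount is dispatched through immediate cases. */
#include <arm_neon.h>

static inline uint8x16_t vextq_u8_var(uint8x16_t a, uint8x16_t b, int n) {
    switch (n) {
    case 1:  return vextq_u8(a, b, 1);
    case 2:  return vextq_u8(a, b, 2);
    case 3:  return vextq_u8(a, b, 3);
    case 4:  return vextq_u8(a, b, 4);
    case 5:  return vextq_u8(a, b, 5);
    case 6:  return vextq_u8(a, b, 6);
    case 7:  return vextq_u8(a, b, 7);
    case 8:  return vextq_u8(a, b, 8);
    case 9:  return vextq_u8(a, b, 9);
    case 10: return vextq_u8(a, b, 10);
    case 11: return vextq_u8(a, b, 11);
    case 12: return vextq_u8(a, b, 12);
    case 13: return vextq_u8(a, b, 13);
    case 14: return vextq_u8(a, b, 14);
    case 15: return vextq_u8(a, b, 15);
    default: return a; /* n == 0 */
    }
}
```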
Konstantinos Margaritis
606c53a05f fix compiler flag testcase 2020-11-24 17:55:03 +02:00
Konstantinos Margaritis
c4f1372814 remove debug from functions 2020-11-05 20:33:17 +02:00
Konstantinos Margaritis
62fed20ad0 add some debug and minor optimizations in unit test 2020-11-05 19:21:16 +02:00
Konstantinos Margaritis
501f60e930 add some debug info 2020-11-05 19:20:37 +02:00
Konstantinos Margaritis
33904180d8 add compress128 function and implementation 2020-11-05 19:20:06 +02:00
Konstantinos Margaritis
7b8cf97546 add extra instructions (currently arm-only), fix order of elements in set4x32/set2x64 2020-11-05 19:18:53 +02:00
Konstantinos Margaritis
18296eee47 fix 32-bit/64-bit detection 2020-11-05 17:31:20 +02:00
Konstantinos Margaritis
592b1905af needed for ARM vector type conversions 2020-10-30 10:50:24 +02:00
Konstantinos Margaritis
547f79b920 small optimization in storecompress*() 2020-10-30 10:49:50 +02:00
Konstantinos Margaritis
548242981d fix ARM implementations 2020-10-30 10:38:41 +02:00
Konstantinos Margaritis
0bef151437 don't use SSE directly in the tests 2020-10-30 10:38:05 +02:00
Konstantinos Margaritis
149ea938c4 don't redefine function on x86 2020-10-16 13:09:08 +03:00
Konstantinos Margaritis
c4db63665a scalar implementations of diffrich256 and diffrich384 2020-10-16 13:02:40 +03:00
Konstantinos Margaritis
4bce012570 Revert "move x86 popcount.h implementations to util/arch/x86/popcount.h"
This reverts commit 6581aae90e55520353c03edb716de80ecc03521a.
2020-10-16 12:32:44 +03:00
Konstantinos Margaritis
83977db7ab split arch-agnostic simd_utils.h functions into the common file 2020-10-16 12:30:34 +03:00
Konstantinos Margaritis
e7e1308d7f fix compilation paths for cpuid_flags for x86 2020-10-16 12:29:45 +03:00
Konstantinos Margaritis
45bfed9b9d add scalar versions of the vectorized functions for architectures that don't support 256-bit/512-bit SIMD vectors such as ARM 2020-10-15 16:30:18 +03:00
Konstantinos Margaritis
c5a7f4b846 add ARM simd_utils vectorized functions for 128-bit vectors 2020-10-15 16:26:49 +03:00
Konstantinos Margaritis
5b425bd5a6 add arm simple cpuid_flags 2020-10-15 16:26:04 +03:00
Konstantinos Margaritis
31ac6718dd add ARM version of simd_utils.h 2020-10-13 09:19:56 +03:00
Konstantinos Margaritis
a9212174ee add arm bitutils.h header 2020-10-08 20:50:55 +03:00
Konstantinos Margaritis
1c2c73becf add C implementation of pdep64() 2020-10-08 20:50:18 +03:00
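A hedged sketch of what a portable C fallback for pdep64() can look like (illustrative, not necessarily the repo's exact code):

```
/* Hedged sketch of a software bit-deposit (PDEP): walk the mask's set
 * bits from lowest to highest, depositing successive low bits of x. */
#include <stdint.h>

static inline uint64_t pdep64_c(uint64_t x, uint64_t mask) {
    uint64_t res = 0;
    for (uint64_t bb = 1; mask; bb += bb) {
        if (x & bb) {
            res |= mask & -mask; /* lowest set bit of mask */
        }
        mask &= mask - 1;        /* clear that bit */
    }
    return res;
}
```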
Konstantinos Margaritis
d2cf1a7882 move cpuid_flags.h header to common 2020-10-08 20:49:33 +03:00
Konstantinos Margaritis
5d773dd9db use C implementation of popcount for arm 2020-10-07 14:28:45 +03:00
Konstantinos Margaritis
4c924cc920 add arm architecture basic defines 2020-10-07 14:28:12 +03:00
Konstantinos Margaritis
9a0494259e minor fix 2020-10-07 14:26:41 +03:00
Konstantinos Margaritis
e91082d477 use right intrinsic 2020-10-06 13:45:52 +03:00
Konstantinos Margaritis
5952c64066 add necessary modifications to CMake system to enable building on ARM, add arm_neon.h intrinsic header to intrinsics.h 2020-10-06 12:44:23 +03:00
Konstantinos Margaritis
b1170bcc2e add arm checks in platform.cmake 2020-10-06 08:09:18 +03:00
Konstantinos Margaritis
f0e70bc0ad Revert "Revert "move x86 popcount.h implementations to util/arch/x86/popcount.h""
This reverts commit 04fbf2468140cc4d7ccabc62a2bdc4503a3d31c5.
2020-09-24 11:52:59 +03:00
Konstantinos Margaritis
04fbf24681 Revert "move x86 popcount.h implementations to util/arch/x86/popcount.h"
This reverts commit 6581aae90e55520353c03edb716de80ecc03521a.
2020-09-24 10:01:50 +03:00
Konstantinos Margaritis
5333467249 fix names, use own intrinsic instead of explicit _mm* ones 2020-09-23 11:51:21 +03:00
Konstantinos Margaritis
f7a6b8934c add some set*() functions, harmonize names, rename setAxB to set1_AxB when using mm_set1_* internally 2020-09-23 11:49:26 +03:00
Konstantinos Margaritis
e8e188acaf move x86 implementations of simd_utils.h to util/arch/x86/ 2020-09-22 13:12:07 +03:00
Konstantinos Margaritis
e915d84864 no need to check for WIN32* 2020-09-22 13:10:52 +03:00
Konstantinos Margaritis
9f3ad89ed6 move andn helper function to bitutils.h 2020-09-22 12:17:27 +03:00
Konstantinos Margaritis
6581aae90e move x86 popcount.h implementations to util/arch/x86/popcount.h 2020-09-22 11:45:24 +03:00
Konstantinos Margaritis
aac1f0f1dc move x86 bitutils.h implementations to util/arch/x86/bitutils.h 2020-09-22 11:02:07 +03:00
Konstantinos Margaritis
8ed5f4ac75 fix include paths for masked_move 2020-09-18 12:55:57 +03:00
Konstantinos Margaritis
956b001613 move masked_move* AVX2 implementation to util/arch/x86 2020-09-18 12:51:39 +03:00
Konstantinos Margaritis
ea721c908f move crc32 SSE42 implementation to util/arch/x86 2020-09-18 12:48:14 +03:00
Konstantinos Margaritis
6a40793719 move cpuid stuff to util/arch/x86 2020-09-17 20:35:39 +03:00
Konstantinos Margaritis
2d89df44ae move x86 arch and SIMD types to x86 arch folder 2020-09-17 19:00:48 +03:00
515 changed files with 1422667 additions and 13606 deletions

11
.clang-tidy Normal file
View File

@ -0,0 +1,11 @@
#unit/gtest/gtest-all.cc,build/src/parser/Parser.cpp,build/src/parser/control_verbs.cpp
#Don't change the first comment; it ignores specific files from clang-tidy
Checks: 'clang-analyzer-*,-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,performance-*,-performance-unnecessary-value-param,-performance-avoid-endl'
WarningsAsErrors: ''
HeaderFilterRegex: '.*'
SystemHeaders: false
FormatStyle: none
InheritParentConfig: true
User: user

3
.gitmodules vendored Normal file
View File

@ -0,0 +1,3 @@
[submodule "simde"]
path = simde
url = https://github.com/simd-everywhere/simde.git

66
CHANGELOG-vectorscan.md Normal file
View File

@ -0,0 +1,66 @@
# Vectorscan Change Log
This is a list of notable changes to Vectorscan, in reverse chronological order. For the Hyperscan changelog, check CHANGELOG.md
## [5.4.11] 2023-11-19
- Refactor CMake build system to be much more modular.
- version in hs.h fell out of sync again #175
- Fix compile failures with recent compilers, namely clang-15 and gcc-13
- Fix clang 15,16 compilation errors on all platforms, refactor CMake build system #181
- Fix signed/unsigned char issue on Arm with Ragel generated code.
- Correct set_source_files_properties usage #189
- Fix build failure on Ubuntu 20.04
- Support building on Ubuntu 20.04 #180
- Require pkg-config during Cmake
- make pkgconfig a requirement #188
- Fix segfault on Fat runtimes with SVE2 code
- Move VERM16 enums to the end of the list #191
- Update README.md, add CHANGELOG-vectorscan.md and Contributors-vectorscan.md files
## [5.4.10] 2023-09-23
- Fix compilation with libcxx 16 by @rschu1ze in #144
- Fix use-of-uninitialized-value due to getData128() by @azat in #148
- Use std::vector instead of boost::container::small_vector under MSan by @azat in #149
- Feature/enable fat runtime arm by @markos in #165
- adding ifndef around HS_PUBLIC_API definition so that vectorscan can be statically linked into another shared library without exporting symbols by @jeffplaisance in #164
- Feature/backport hyperscan 2023 q3 by @markos in #169
- Prepare for 5.4.10 by @markos in #167
## [5.4.9] 2023-03-23
- Major change: Enable SVE & SVE2 builds and make it a supported architecture! (thanks to @abondarev84)
- Fix various clang-related bugs
- Fix Aarch64 bug in Parser.rl because of char signedness. Make unsigned char the default in the Parser for all architectures.
- Fix Power bug, multiple tests were failing.
- C++20 related change, use prefixed assume_aligned to avoid conflict with C++20 std::assume_aligned.
## [5.4.8] 2022-09-13
- CMake: Use non-deprecated method for finding python by @jth in #108
- Optimize vectorscan for aarch64 by using shrn instruction by @danlark1 in #113
- Fixed the PCRE download location by @pareenaverma in #116
- Bugfix/hyperscan backport 202208 by @markos in #118
- VSX optimizations by @markos in #119
- when compiling with mingw64, use __mingw_aligned_malloc() and __mingw_aligned_free() by @liquidaty in #121
- [NEON] simplify/optimize shift/align primitives by @markos in #123
- Merge develop to master by @markos in #124
## [5.4.7] 2022-05-05
- Fix word boundary assertions under C++20 by @BigRedEye in #90
- Fix all ASAN issues in vectorscan by @danlark1 in #93
- change FAT_RUNTIME to a normal option so it can be set to off by @a16bitsysop in #94
- Optimized and correct version of movemask128 for ARM by @danlark1 in #102
## [5.4.6] 2022-01-21
- Major refactoring of many engines to use internal SuperVector C++ templates library. Code size reduced to 1/3rd with no loss of performance in most cases.
- Microbenchmarking tool added for performance finetuning
- Arm Advanced SIMD/NEON fully ported. Initial work on SVE2 for a couple of engines.
- Power9 VSX ppc64le fully ported. Initial port needs some optimization.
- Clang compiler support added.
- Apple M1 support added.
- CI added, the following configurations are tested on every PR:
gcc-debug, gcc-release, clang-debug, clang-release:
Linux Intel: SSE4.2, AVX2, AVX512, FAT
Linux Arm
Linux Power9
clang-debug, clang-release:
MacOS Apple M1

View File

@ -2,6 +2,55 @@
This is a list of notable changes to Hyperscan, in reverse chronological order.
## [5.4.2] 2023-04-19
- Roll back bugfix for github issue #350: Besides using scratch for
corresponding database, Hyperscan also allows user to use larger scratch
allocated for another database. Users can leverage this property to achieve
safe scratch usage in multi-database scenarios. Behaviors beyond these are
discouraged and results are undefined.
- Fix hsdump issue due to invalid nfa type.
## [5.4.1] 2023-02-20
- The Intel Hyperscan team is pleased to provide a bug fix release to our open source library.
Intel also maintains an upgraded version available through your Intel sales representative.
- Bugfix for issue #184: fix random char value of UTF-8.
- Bugfix for issue #291: bypass logical combination flag in hs_expression_info().
- Bugfix for issue #292: fix build error due to libc symbol parsing.
- Bugfix for issue #302/304: add empty string check for pure literal API.
- Bugfix for issue #303: fix unknown instruction error in pure literal API.
- Bugfix for issue #303: avoid memory leak in stream close stage.
- Bugfix for issue #305: fix assertion failure in DFA construction.
- Bugfix for issue #317: fix aligned allocator segment faults.
- Bugfix for issue #350: add quick validity check for scratch.
- Bugfix for issue #359: fix glibc-2.34 stack size issue.
- Bugfix for issue #360: fix SKIP flag issue in chimera.
- Bugfix for issue #362: fix one cotec check corner issue in UTF-8 validation.
- Fix other compile issues.
## [5.4.0] 2020-12-31
- Improvement on literal matcher "Fat Teddy" performance, including
support for Intel(R) AVX-512 Vector Byte Manipulation Instructions (Intel(R)
AVX-512 VBMI).
- Introduce a new 32-state shuffle-based DFA engine ("Sheng32"). This improves
scanning performance by leveraging AVX-512 VBMI.
- Introduce a new 64-state shuffle-based DFA engine ("Sheng64"). This improves
scanning performance by leveraging AVX-512 VBMI.
- Introduce a new shuffle-based hybrid DFA engine ("McSheng64"). This improves
scanning performance by leveraging AVX-512 VBMI.
- Improvement on exceptional state handling performance for LimEx NFA, including
support for AVX-512 VBMI.
- Improvement on lookaround performance with new models, including support for
AVX-512.
- Improvement on DFA state space efficiency.
- Optimization on decision of NFA/DFA generation.
- hsbench: add CSV dump support for hsbench.
- Bugfix for cmake error on Icelake under release mode.
- Bugfix in find_vertices_in_cycles() to avoid self-loop checking in SCC.
- Bugfix for issue #270: fix return value handling in chimera.
- Bugfix for issue #284: use correct free function in logical combination.
- Add BUILD_EXAMPLES cmake option to enable example code compilation. (#260)
- Some typo fixing. (#242, #259)
## [5.3.0] 2020-05-15
- Improvement on literal matcher "Teddy" performance, including support for
Intel(R) AVX-512 Vector Byte Manipulation Instructions (Intel(R) AVX-512

File diff suppressed because it is too large Load Diff

View File

@ -1,4 +1,5 @@
Copyright (c) 2015, Intel Corporation
Copyright (c) 2019-20, VectorCamp PC
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

View File

@ -0,0 +1,25 @@
394 Konstantinos Margaritis <konstantinos@vectorcamp.gr>
59 apostolos <apostolos.tapsas@vectorcamp.gr>
25 Hong, Yang A <yang.a.hong@intel.com>
19 George Wort <george.wort@arm.com>
16 Chang, Harry <harry.chang@intel.com>
7 Danila Kutenin <danilak@google.com>
7 Wang Xiang W <xiang.w.wang@intel.com>
6 Alex Bondarev <abondarev84@gmail.com>
5 Konstantinos Margaritis <konma@vectorcamp.gr>
3 Duncan Bellamy <dunk@denkimushi.com>
2 Azat Khuzhin <a3at.mail@gmail.com>
2 Jan Henning <jan.thilo.henning@sap.com>
1 BigRedEye <mail@bigredeye.me>
1 Daniel Kutenin <kutdanila@yandex.ru>
1 Danila Kutenin <kutdanila@yandex.ru>
1 Liu Zixian <hdu_sdlzx@163.com>
1 Mitchell Wasson <miwasson@cisco.com>
1 Piotr Skamruk <piotr.skamruk@gmail.com>
1 Robbie Williamson <robbie.williamson@arm.com>
1 Robert Schulze <robert@clickhouse.com>
1 Walt Stoneburner <wls@wwco.com>
1 Zhu,Wenjun <wenjun.zhu@intel.com>
1 hongyang7 <yang.a.hong@intel.com>
1 jplaisance <jeffplaisance@gmail.com>
1 liquidaty <info@liquidaty.com>

View File

@ -2,6 +2,11 @@ Hyperscan is licensed under the BSD License.
Copyright (c) 2015, Intel Corporation
Vectorscan is licensed under the BSD License.
Copyright (c) 2020, VectorCamp PC
Copyright (c) 2021, Arm Limited
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

251
README.md
View File

@ -1,43 +1,252 @@
# Hyperscan
# About Vectorscan
Hyperscan is a high-performance multiple regex matching library. It follows the
A fork of Intel's Hyperscan, modified to run on more platforms. Currently ARM NEON/ASIMD
and Power VSX are 100% functional. ARM SVE2 support is ongoing, now that
we have access to hardware. More platforms will follow in the future.
Furthermore, starting with 5.4.12 there is a [SIMDe](https://github.com/simd-everywhere/simde)
port, which can be used either for platforms without official SIMD support,
as SIMDe can emulate SIMD instructions, or as an alternative backend for existing architectures,
for reference and comparison purposes.
Vectorscan will follow Intel's API and internal algorithms where possible, but will not
hesitate to make code changes where they are thought to give better performance or better
portability. In addition, the code will be gradually simplified and made more uniform, and
all architecture-specific (currently Intel) #ifdefs will be removed and abstracted away.
# Why was there a need for a fork?
Originally, the ARM port was intended to be merged into Intel's own Hyperscan, and relevant
Pull Requests were made to the project for this reason. Unfortunately, the
PRs were rejected for now and the foreseeable future, so we created Vectorscan for
our own multi-architectural and open source collaborative needs.
The recent license change of Hyperscan makes Vectorscan even more relevant for the FLOSS ecosystem.
# What is Vectorscan/Hyperscan?
Hyperscan, and by extension Vectorscan, is a high-performance multiple regex matching library. It follows the
regular expression syntax of the commonly-used libpcre library, but is a
standalone library with its own C API.
Hyperscan uses hybrid automata techniques to allow simultaneous matching of
Hyperscan/Vectorscan uses hybrid automata techniques to allow simultaneous matching of
large numbers (up to tens of thousands) of regular expressions and for the
matching of regular expressions across streams of data.
Hyperscan is typically used in a DPI library stack.
# Documentation
Information on building the Hyperscan library and using its API is available in
the [Developer Reference Guide](http://intel.github.io/hyperscan/dev-reference/).
Vectorscan is typically used in a DPI library stack, just like Hyperscan.
# License
Hyperscan is licensed under the BSD License. See the LICENSE file in the
project repository.
Vectorscan follows a BSD License like the original Hyperscan (up to 5.4).
Vectorscan continues to be an open source project and we are committed to keeping it that way.
See the LICENSE file in the project repository.
## Hyperscan License Change after 5.4
According to
[Accelerate Snort Performance with Hyperscan and Intel Xeon Processors on Public Clouds](https://networkbuilders.intel.com/docs/networkbuilders/accelerate-snort-performance-with-hyperscan-and-intel-xeon-processors-on-public-clouds-1680176363.pdf) versions of Hyperscan later than 5.4 are
going to be closed-source:
> The latest open-source version (BSD-3 license) of Hyperscan on Github is 5.4. Intel conducts continuous internal
> development and delivers new Hyperscan releases under Intel Proprietary License (IPL) beginning from 5.5 for interested
> customers. Please contact authors to learn more about getting new Hyperscan releases.
# Versioning
The `master` branch on Github will always contain the most recent release of
The `master` branch on Github will always contain the most recent stable release of
Hyperscan. Each version released to `master` goes through QA and testing before
it is released; if you're a user, rather than a developer, this is the version
you should be using.
Further development towards the next release takes place on the `develop`
branch.
branch. All PRs are first made against the `develop` branch; if they pass the [Vectorscan CI](https://buildbot-ci.vectorcamp.gr/#/grid), they get merged. The same applies to PRs from `develop` to `master`.
# Get Involved
# Compatibility with Hyperscan
The official homepage for Hyperscan is at [www.hyperscan.io](https://www.hyperscan.io).
Vectorscan aims to be ABI and API compatible with the last open source version of Intel Hyperscan 5.4.
After careful consideration we decided that we will **NOT** aim to achieve compatibility with the later Hyperscan versions 5.5/5.6 that have extended Hyperscan's API.
If you need to keep up with the latest Hyperscan API, you should talk to Intel and get a license for it.
However, we intend to extend Vectorscan's API with user-requested changes, or API extensions and improvements that we think are best for the project.
If you have questions or comments, we encourage you to [join the mailing
list](https://lists.01.org/mailman/listinfo/hyperscan). Bugs can be filed by
sending email to the list, or by creating an issue on Github.
# Installation
If you wish to contact the Hyperscan team at Intel directly, without posting
publicly to the mailing list, send email to
[hyperscan@intel.com](mailto:hyperscan@intel.com).
## Debian/Ubuntu
On recent Debian/Ubuntu systems, vectorscan should be directly available for installation:
```
$ sudo apt install libvectorscan5
```
Or, to get the development headers, install the `libvectorscan-dev` package:
```
$ sudo apt install libvectorscan-dev
```
For other distributions/OSes please check the [Wiki](https://github.com/VectorCamp/vectorscan/wiki/Installation-from-package)
# Build Instructions
The build system has recently been refactored to be more modular and easier to extend. For that reason,
some small but necessary changes were made that might break compatibility with how Hyperscan was built.
## Install Common Dependencies
### Debian/Ubuntu
In order to build on Debian/Ubuntu make sure you install the following build-dependencies
```
$ sudo apt install build-essential cmake ragel pkg-config libsqlite3-dev libpcap-dev
```
### Other distributions
TBD
### MacOS X (M1/M2/M3 CPUs only)
Assuming an existing HomeBrew installation:
```
% brew install boost cmake gcc libpcap pkg-config ragel sqlite
```
### *BSD
In NetBSD you will almost certainly need to have a newer compiler installed.
You will also need to install cmake, sqlite, boost and ragel, as well as
libpcap, which is necessary for some of the benchmarks.
When using pkgsrc, you would typically do this using something
similar to
```
pkg_add gcc12-12.3.0.tgz
pkg_add boost-headers-1.83.0.tgz boost-jam-1.83.0.tgz boost-libs-1.83.0nb1.tgz
pkg_add ragel-6.10.tgz
pkg_add cmake-3.28.1.tgz
pkg_add sqlite3-3.44.2.tgz
pkg_add libpcap-1.10.4.tgz
```
Version numbers etc. will of course vary. One would either download the
binary packages or build them using pkgsrc. Some NetBSD tools such as
```pkgin``` can help download dependencies as binary packages, but overall
NetBSD leaves a lot of detail exposed to the user.
The main package system used in NetBSD is pkgsrc, and one will probably
want to read up on it beyond what is in the scope of this document.
See https://www.netbsd.org/docs/software/packages.html for more information.
This will not replace the compiler in the standard base distribution, and
cmake will probably find the base dist's compiler when it checks automatically.
Using the example of gcc12 from pkgsrc, one will need to set two
environment variables before starting:
```
export CC="/usr/pkg/gcc12/bin/cc"
export CXX="/usr/pkg/gcc12/bin/g++"
```
Similarly, in FreeBSD you might want to install a different compiler;
if you want to use gcc, gcc12 is recommended.
As in NetBSD, you will also need to install the cmake, sqlite, boost and ragel packages.
Using the example of gcc12 from pkg, install the desired compiler:
```
pkg install gcc12
pkg install boost-all
pkg install ragel
pkg install cmake
pkg install sqlite
pkg install libpcap
pkg install ccache
```
and then before beginning the cmake and build process, set
the environment variables to point to this compiler:
```
export CC="/usr/local/bin/gcc"
export CXX="/usr/local/bin/g++"
```
A further note for FreeBSD: on the PowerPC and ARM platforms,
the gcc12 package installs under a slightly different name; on FreeBSD/ppc,
gcc12 will be found using:
```
export CC="/usr/local/bin/gcc12"
export CXX="/usr/local/bin/g++12"
```
Then continue with the build as below.
## Configure & build
In order to configure with `cmake` first create and cd into a build directory:
```
$ mkdir build
$ cd build
```
Then call `cmake` from inside the `build` directory:
```
$ cmake ../
```
Common options for CMake are:
* `-DBUILD_STATIC_LIBS=[On|Off]` Build static libraries
* `-DBUILD_SHARED_LIBS=[On|Off]` Build shared libraries (if none are set static libraries are built by default)
* `-DCMAKE_BUILD_TYPE=[Release|Debug|RelWithDebInfo|MinSizeRel]` Configure build type and determine optimizations and certain features.
* `-DUSE_CPU_NATIVE=[On|Off]` Native CPU detection is off by default; however, it is possible to build a performance-oriented non-fat library tuned to your CPU.
* `-DFAT_RUNTIME=[On|Off]` Fat Runtime is only available for X86 32-bit/64-bit and AArch64 architectures and only on Linux. It is incompatible with `Debug` type and `USE_CPU_NATIVE`.
### Specific options for X86 32-bit/64-bit (Intel/AMD) CPUs
* `-DBUILD_AVX2=[On|Off]` Enable code for AVX2.
* `-DBUILD_AVX512=[On|Off]` Enable code for AVX512. Implies `BUILD_AVX2`.
* `-DBUILD_AVX512VBMI=[On|Off]` Enable code for AVX512 with VBMI extension. Implies `BUILD_AVX512`.
### Specific options for Arm 64-bit CPUs
* `-DBUILD_SVE=[On|Off]` Enable code for SVE, as on AWS Graviton3 CPUs. Not much code is ported just for SVE, but enabling SVE code production does improve code generation; see [Benchmarks](https://github.com/VectorCamp/vectorscan/wiki/Benchmarks).
* `-DBUILD_SVE2=[On|Off]` Enable code for SVE2, implies `BUILD_SVE`. Most non-Neon code is written for SVE2.
* `-DBUILD_SVE2_BITPERM=[On|Off]` Enable code for the SVE2_BITPERM hardware feature, implies `BUILD_SVE2`.
## Other options
* `SANITIZE=[address|memory|undefined]` (experimental) Use `libasan` sanitizer to detect possible bugs. For now only `address` is tested. This will eventually be integrated into the CI.
## SIMDe options
* `SIMDE_BACKEND=[On|Off]` Enable SIMDe backend. If this is chosen all native (SSE/AVX/AVX512/Neon/SVE/VSX) backends will be disabled and a SIMDe SSE4.2 emulation backend will be enabled. This will enable Vectorscan to build and run on architectures without SIMD.
* `SIMDE_NATIVE=[On|Off]` Enable SIMDe native emulation of x86 SSE4.2 intrinsics on the building platform. That is, SSE4.2 intrinsics will be emulated using Neon on an Arm platform, or VSX on a Power platform, etc.
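For example, a typical release configuration enabling the AVX2 and AVX512 code paths (using only the options listed above) would be:
```
$ cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_AVX2=On -DBUILD_AVX512=On ../
```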
## Build
If `cmake` has completed successfully you can run `make` in the same directory. If you have a multi-core system with `N` cores, running
```
$ make -j <N>
```
will speed up the process. If all goes well, you should have the vectorscan library compiled.
# Contributions
The official homepage for Vectorscan is at [www.github.com/VectorCamp/vectorscan](https://www.github.com/VectorCamp/vectorscan).
# Vectorscan Development
All development of Vectorscan is done in public.
# Original Hyperscan links
For reference, the official homepage for Hyperscan is at [www.hyperscan.io](https://www.hyperscan.io).
# Hyperscan Documentation
Information on building the Hyperscan library and using its API is available in
the [Developer Reference Guide](http://intel.github.io/hyperscan/dev-reference/).
And you can find the source code [on Github](https://github.com/intel/hyperscan).
For Intel Hyperscan related issues and questions, please follow the relevant links there.

View File

@ -0,0 +1,9 @@
include_directories(SYSTEM ${CMAKE_CURRENT_SOURCE_DIR})
include_directories(${PROJECT_SOURCE_DIR})
if (NOT FAT_RUNTIME AND (BUILD_SHARED_LIBS OR BUILD_STATIC_LIBS))
add_executable(benchmarks benchmarks.cpp)
set_source_files_properties(benchmarks.cpp PROPERTIES COMPILE_FLAGS
"-Wall -Wno-unused-variable")
target_link_libraries(benchmarks hs)
endif()

309
benchmarks/benchmarks.cpp Normal file
View File

@ -0,0 +1,309 @@
/*
* Copyright (c) 2020, 2021, VectorCamp PC
* Copyright (c) 2023, 2024, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <functional>
#include <iostream>
#include <memory>
#include "util/arch.h"
#include "benchmarks.hpp"
#define MAX_LOOPS 1000000000
#define MAX_MATCHES 5
#define N 8
struct hlmMatchEntry {
size_t to;
u32 id;
hlmMatchEntry(size_t end, u32 identifier) : to(end), id(identifier) {}
};
std::vector<hlmMatchEntry> ctxt;
static hwlmcb_rv_t hlmSimpleCallback(size_t to, u32 id,
UNUSED struct hs_scratch *scratch) { // cppcheck-suppress constParameterCallback
DEBUG_PRINTF("match @%zu = %u\n", to, id);
ctxt.push_back(hlmMatchEntry(to, id));
return HWLM_CONTINUE_MATCHING;
}
template <typename InitFunc, typename BenchFunc>
static void run_benchmarks(int size, int loops, int max_matches,
bool is_reverse, MicroBenchmark &bench,
InitFunc &&init, BenchFunc &&func) {
init(bench);
double total_sec = 0.0;
double max_bw = 0.0;
double avg_time = 0.0;
if (max_matches) {
double avg_bw = 0.0;
int pos = 0;
for (int j = 0; j < max_matches - 1; j++) {
bench.buf[pos] = 'b';
pos = (j + 1) * size / max_matches;
bench.buf[pos] = 'a';
u64a actual_size = 0;
auto start = std::chrono::steady_clock::now();
for (int i = 0; i < loops; i++) {
const u8 *res = func(bench);
if (is_reverse)
actual_size += bench.buf.data() + size - res;
else
actual_size += res - bench.buf.data();
}
auto end = std::chrono::steady_clock::now();
double dt = std::chrono::duration_cast<std::chrono::microseconds>(
end - start)
.count();
total_sec += dt;
/* dt is in microseconds; compute bandwidth in MB/s */
double bw = (actual_size / dt) * 1000000.0 / 1048576.0;
/*std::cout << "actual_size = " << actual_size << std::endl;
std::cout << "dt = " << dt << std::endl;
std::cout << "bw = " << bw << std::endl;*/
avg_bw += bw;
/* track the maximum bandwidth seen across match positions */
max_bw = std::max(bw, max_bw);
/* accumulate the running average time */
avg_time += total_sec / loops;
}
avg_time /= max_matches;
avg_bw /= max_matches;
total_sec /= 1000000.0; /* convert total time from microseconds to seconds */
printf("%-18s, %-12d, %-10d, %-6d, %-10.3f, %-9.3f, %-8.3f, %-7.3f\n",
bench.label, max_matches, size ,loops, total_sec, avg_time, max_bw, avg_bw);
} else {
u64a total_size = 0;
auto start = std::chrono::steady_clock::now();
for (int i = 0; i < loops; i++) {
func(bench);
}
auto end = std::chrono::steady_clock::now();
total_sec +=
std::chrono::duration_cast<std::chrono::microseconds>(end - start)
.count();
/*calculate transferred size*/
total_size = (u64a)size * (u64a)loops;
/*calculate average time*/
avg_time = total_sec / loops;
/*convert microseconds to seconds*/
total_sec /= 1000000.0;
/*calculate maximum bandwidth*/
max_bw = total_size / total_sec;
/*convert to MB/s*/
max_bw /= 1048576.0;
printf("%-18s, %-12s, %-10d, %-6d, %-10.3f, %-9.3f, %-8.3f, %-7s\n",
bench.label, "0", size, loops, total_sec, avg_time, max_bw, "0");
}
}
int main(){
const int matches[] = {0, MAX_MATCHES};
std::vector<size_t> sizes;
for (size_t i = 0; i < N; i++)
sizes.push_back(16000 << i * 2);
const char charset[] = "aAaAaAaAAAaaaaAAAAaaaaAAAAAAaaaAAaaa";
printf("%-18s, %-12s, %-10s, %-6s, %-10s, %-9s, %-8s, %-7s\n", "Matcher",
"max_matches", "size", "loops", "total_sec", "avg_time", "max_bw",
"avg_bw");
for (int m = 0; m < 2; m++) {
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Shufti", sizes[i]);
run_benchmarks(
sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::shuftiBuildMasks(b.chars,
reinterpret_cast<u8 *>(&b.truffle_mask_lo),
reinterpret_cast<u8 *>(&b.truffle_mask_hi));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return shuftiExec(b.truffle_mask_lo, b.truffle_mask_hi, b.buf.data(),
b.buf.data() + b.size);
});
}
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Reverse Shufti", sizes[i]);
run_benchmarks(
sizes[i], MAX_LOOPS / sizes[i], matches[m], true, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::shuftiBuildMasks(b.chars,
reinterpret_cast<u8 *>(&b.truffle_mask_lo),
reinterpret_cast<u8 *>(&b.truffle_mask_hi));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return rshuftiExec(b.truffle_mask_lo, b.truffle_mask_hi, b.buf.data(),
b.buf.data() + b.size);
});
}
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Truffle", sizes[i]);
run_benchmarks(
sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::truffleBuildMasks(b.chars,
reinterpret_cast<u8 *>(&b.truffle_mask_lo),
reinterpret_cast<u8 *>(&b.truffle_mask_hi));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return truffleExec(b.truffle_mask_lo, b.truffle_mask_hi, b.buf.data(),
b.buf.data() + b.size);
});
}
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Reverse Truffle", sizes[i]);
run_benchmarks(
sizes[i], MAX_LOOPS / sizes[i], matches[m], true, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::truffleBuildMasks(b.chars,
reinterpret_cast<u8 *>(&b.truffle_mask_lo),
reinterpret_cast<u8 *>(&b.truffle_mask_hi));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return rtruffleExec(b.truffle_mask_lo, b.truffle_mask_hi, b.buf.data(),
b.buf.data() + b.size);
});
}
#ifdef CAN_USE_WIDE_TRUFFLE
if(CAN_USE_WIDE_TRUFFLE) {
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Truffle Wide", sizes[i]);
run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::truffleBuildMasksWide(b.chars, reinterpret_cast<u8 *>(&b.truffle_mask));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return truffleExecWide(b.truffle_mask, b.buf.data(), b.buf.data() + b.size);
}
);
}
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Reverse Truffle Wide", sizes[i]);
run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], true, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::truffleBuildMasksWide(b.chars, reinterpret_cast<u8 *>(&b.truffle_mask));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return rtruffleExecWide(b.truffle_mask, b.buf.data(), b.buf.data() + b.size);
}
);
}
}
#endif
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Vermicelli", sizes[i]);
run_benchmarks(
sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::truffleBuildMasks(b.chars,
reinterpret_cast<u8 *>(&b.truffle_mask_lo),
reinterpret_cast<u8 *>(&b.truffle_mask_hi));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return vermicelliExec('a', 'b', b.buf.data(),
b.buf.data() + b.size);
});
}
for (size_t i = 0; i < std::size(sizes); i++) {
MicroBenchmark bench("Reverse Vermicelli", sizes[i]);
run_benchmarks(
sizes[i], MAX_LOOPS / sizes[i], matches[m], true, bench,
[&](MicroBenchmark &b) {
b.chars.set('a');
ue2::truffleBuildMasks(b.chars,
reinterpret_cast<u8 *>(&b.truffle_mask_lo),
reinterpret_cast<u8 *>(&b.truffle_mask_hi));
memset(b.buf.data(), 'b', b.size);
},
[&](MicroBenchmark const &b) {
return rvermicelliExec('a', 'b', b.buf.data(),
b.buf.data() + b.size);
});
}
for (size_t i = 0; i < std::size(sizes); i++) {
// we imitate the noodle unit tests
std::string str;
const size_t char_len = 5;
str.resize(char_len);
srand(time(NULL)); /* seed once, not on every iteration */
for (size_t j = 0; j < char_len; j++) {
int key = rand() % 36;
str[j] = charset[key]; /* fill each position with a random charset char */
}
MicroBenchmark bench("Noodle", sizes[i]);
run_benchmarks(
sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench,
[&](MicroBenchmark &b) {
ctxt.clear();
memset(b.buf.data(), 'a', b.size);
u32 id = 1000;
ue2::hwlmLiteral lit(str, true, id);
b.nt = ue2::noodBuildTable(lit);
assert(b.nt.get() != nullptr);
},
[&](MicroBenchmark &b) { // cppcheck-suppress constParameterReference
noodExec(b.nt.get(), b.buf.data(), b.size, 0,
hlmSimpleCallback, &b.scratch);
return b.buf.data() + b.size;
});
}
}
return 0;
}

67
benchmarks/benchmarks.hpp Normal file
View File

@ -0,0 +1,67 @@
/*
* Copyright (c) 2020, 2021, VectorCamp PC
* Copyright (c) 2024, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
#include "hwlm/hwlm_literal.h"
#include "hwlm/noodle_build.h"
#include "hwlm/noodle_engine.h"
#include "hwlm/noodle_internal.h"
#include "nfa/shufti.h"
#include "nfa/shufticompile.h"
#include "nfa/truffle.h"
#include "nfa/trufflecompile.h"
#include "nfa/vermicelli.hpp"
#include "scratch.h"
#include "util/bytecode_ptr.h"
class MicroBenchmark {
public:
struct hs_scratch scratch{};
char const *label;
size_t size;
std::vector<u8> buf;
ue2::bytecode_ptr<noodTable> nt;
ue2::CharReach chars;
// Shufti/Truffle
union {
m256 truffle_mask;
struct {
#if (SIMDE_ENDIAN_ORDER == SIMDE_ENDIAN_LITTLE)
m128 truffle_mask_lo;
m128 truffle_mask_hi;
#else
m128 truffle_mask_hi;
m128 truffle_mask_lo;
#endif
};
};
MicroBenchmark(char const *label_, size_t size_)
: label(label_), size(size_), buf(size_){};
};

View File

@ -33,17 +33,15 @@ target_link_libraries(chimera hs pcre)
install(TARGETS chimera DESTINATION ${CMAKE_INSTALL_LIBDIR})
if (NOT WIN32)
# expand out library names for pkgconfig static link info
foreach (LIB ${CMAKE_CXX_IMPLICIT_LINK_LIBRARIES})
# this is fragile, but protects us from toolchain specific files
if (NOT EXISTS ${LIB})
set(PRIVATE_LIBS "${PRIVATE_LIBS} -l${LIB}")
endif()
endforeach()
set(PRIVATE_LIBS "${PRIVATE_LIBS} -L${LIBDIR} -lpcre")
# expand out library names for pkgconfig static link info
foreach (LIB ${CMAKE_CXX_IMPLICIT_LINK_LIBRARIES})
# this is fragile, but protects us from toolchain specific files
if (NOT EXISTS ${LIB})
set(PRIVATE_LIBS "${PRIVATE_LIBS} -l${LIB}")
endif()
endforeach()
set(PRIVATE_LIBS "${PRIVATE_LIBS} -L${LIBDIR} -lpcre")
configure_file(libch.pc.in libch.pc @ONLY) # only replace @ quoted vars
install(FILES ${CMAKE_BINARY_DIR}/chimera/libch.pc
DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
endif()
configure_file(libch.pc.in libch.pc @ONLY) # only replace @ quoted vars
install(FILES ${CMAKE_BINARY_DIR}/chimera/libch.pc
DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2018, Intel Corporation
* Copyright (c) 2018-2020, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -345,6 +345,16 @@ ch_error_t HS_CDECL ch_set_scratch_allocator(ch_alloc_t alloc_func,
*/
#define CH_SCRATCH_IN_USE (-10)
/**
* Unexpected internal error from Hyperscan.
*
* This error indicates that there was unexpected matching behavior from
* Hyperscan. This could be related to invalid usage of scratch space or
* invalid memory operations by users.
*
*/
#define CH_UNKNOWN_HS_ERROR (-13)
/**
* Returned when pcre_exec (called for some expressions internally from @ref
* ch_scan) failed due to a fatal error.

View File

@ -39,7 +39,6 @@
#include "hs_internal.h"
#include "ue2common.h"
#include "util/compile_error.h"
#include "util/make_unique.h"
#include "util/multibit_build.h"
#include "util/target_info.h"
@ -495,7 +494,7 @@ void ch_compile_multi_int(const char *const *expressions, const unsigned *flags,
// First, build with libpcre. A build failure from libpcre will throw
// an exception up to the caller.
auto patternData =
ue2::make_unique<PatternData>(myExpr, myFlags, i, myId, mode, match_limit,
std::make_unique<PatternData>(myExpr, myFlags, i, myId, mode, match_limit,
match_limit_recursion, platform);
pcres.push_back(move(patternData));
PatternData &curr = *pcres.back();

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2018, Intel Corporation
* Copyright (c) 2018-2022, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -326,6 +326,10 @@ ch_error_t catchupPcre(struct HybridContext *hyctx, unsigned int id,
} else if (cbrv == CH_CALLBACK_SKIP_PATTERN) {
DEBUG_PRINTF("user callback told us to skip this pattern\n");
pd->scanStart = hyctx->length;
if (top_id == id) {
break;
}
continue;
}
if (top_id == id) {
@ -419,6 +423,7 @@ int HS_CDECL multiCallback(unsigned int id, unsigned long long from,
DEBUG_PRINTF("user callback told us to skip this pattern\n");
pd->scanStart = hyctx->length;
ret = HS_SUCCESS;
hyctx->scratch->ret = ret;
} else if (ret == CH_FAIL_INTERNAL) {
return ret;
}
@ -590,11 +595,24 @@ ch_error_t ch_scan_i(const ch_database_t *hydb,
if (!(db->flags & CHIMERA_FLAG_NO_MULTIMATCH)) {
ret = scanHyperscan(&hyctx, data, length);
if (ret != HS_SUCCESS && scratch->ret != CH_SUCCESS) {
DEBUG_PRINTF("Hyperscan returned error %d\n", scratch->ret);
// Errors from pcre scan.
if (scratch->ret == CH_CALLBACK_TERMINATE) {
DEBUG_PRINTF("Pcre terminates scan\n");
unmarkScratchInUse(scratch);
return CH_SCAN_TERMINATED;
} else if (scratch->ret != CH_SUCCESS) {
DEBUG_PRINTF("Pcre internal error\n");
unmarkScratchInUse(scratch);
return scratch->ret;
}
// Errors from Hyperscan scan. Note Chimera could terminate
// Hyperscan callback on purpose so this is not counted as an error.
if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
assert(scratch->ret == CH_SUCCESS);
DEBUG_PRINTF("Hyperscan returned error %d\n", ret);
unmarkScratchInUse(scratch);
return ret;
}
}
DEBUG_PRINTF("Flush priority queue\n");

View File

@ -1,96 +0,0 @@
# detect architecture features
#
# must be called after determining where compiler intrinsics are defined
if (HAVE_C_X86INTRIN_H)
set (INTRIN_INC_H "x86intrin.h")
elseif (HAVE_C_INTRIN_H)
set (INTRIN_INC_H "intrin.h")
else ()
message (FATAL_ERROR "No intrinsics header found")
endif ()
if (BUILD_AVX512)
CHECK_C_COMPILER_FLAG(${SKYLAKE_FLAG} HAS_ARCH_SKYLAKE)
if (NOT HAS_ARCH_SKYLAKE)
message (FATAL_ERROR "AVX512 not supported by compiler")
endif ()
endif ()
if (FAT_RUNTIME)
# test the highest level microarch to make sure everything works
if (BUILD_AVX512)
set (CMAKE_REQUIRED_FLAGS "${CMAKE_C_FLAGS} ${EXTRA_C_FLAGS} ${SKYLAKE_FLAG}")
else ()
set (CMAKE_REQUIRED_FLAGS "${CMAKE_C_FLAGS} ${EXTRA_C_FLAGS} -march=core-avx2")
endif ()
else (NOT FAT_RUNTIME)
# if not fat runtime, then test given cflags
set (CMAKE_REQUIRED_FLAGS "${CMAKE_C_FLAGS} ${EXTRA_C_FLAGS} ${ARCH_C_FLAGS}")
endif ()
# ensure we have the minimum of SSSE3 - call a SSSE3 intrinsic
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
int main() {
__m128i a = _mm_set1_epi8(1);
(void)_mm_shuffle_epi8(a, a);
}" HAVE_SSSE3)
# now look for AVX2
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
#if !defined(__AVX2__)
#error no avx2
#endif
int main(){
__m256i z = _mm256_setzero_si256();
(void)_mm256_xor_si256(z, z);
}" HAVE_AVX2)
# and now for AVX512
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
#if !defined(__AVX512BW__)
#error no avx512bw
#endif
int main(){
__m512i z = _mm512_setzero_si512();
(void)_mm512_abs_epi8(z);
}" HAVE_AVX512)
# and now for AVX512VBMI
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
#if !defined(__AVX512VBMI__)
#error no avx512vbmi
#endif
int main(){
__m512i a = _mm512_set1_epi8(0xFF);
__m512i idx = _mm512_set_epi64(3ULL, 2ULL, 1ULL, 0ULL, 7ULL, 6ULL, 5ULL, 4ULL);
(void)_mm512_permutexvar_epi8(idx, a);
}" HAVE_AVX512VBMI)
if (FAT_RUNTIME)
if (NOT HAVE_SSSE3)
message(FATAL_ERROR "SSSE3 support required to build fat runtime")
endif ()
if (NOT HAVE_AVX2)
message(FATAL_ERROR "AVX2 support required to build fat runtime")
endif ()
if (BUILD_AVX512 AND NOT HAVE_AVX512)
message(FATAL_ERROR "AVX512 support requested but not supported")
endif ()
else (NOT FAT_RUNTIME)
if (NOT HAVE_AVX2)
message(STATUS "Building without AVX2 support")
endif ()
if (NOT HAVE_AVX512)
message(STATUS "Building without AVX512 support")
endif ()
if (NOT HAVE_SSSE3)
message(FATAL_ERROR "A minimum of SSSE3 compiler support is required")
endif ()
endif ()
unset (CMAKE_REQUIRED_FLAGS)
unset (INTRIN_INC_H)

cmake/archdetect.cmake Normal file

@ -0,0 +1,111 @@
if (USE_CPU_NATIVE)
# Detect best GNUCC_ARCH to tune for
if (CMAKE_COMPILER_IS_GNUCC)
message(STATUS "gcc version ${CMAKE_C_COMPILER_VERSION}")
# If gcc doesn't recognise the host cpu, then mtune=native becomes
# generic, which isn't very good in some cases. march=native looks at
# cpuid info and then chooses the best microarch it can (and replaces
# the flag), so use that for tune.
set(TUNE_FLAG "mtune")
set(GNUCC_TUNE "")
message(STATUS "ARCH_FLAG '${ARCH_FLAG}' '${GNUCC_ARCH}', TUNE_FLAG '${TUNE_FLAG}' '${GNUCC_TUNE}' ")
# arg1 might exist if using ccache
string (STRIP "${CMAKE_C_COMPILER_ARG1}" CC_ARG1)
set (EXEC_ARGS ${CC_ARG1} -c -Q --help=target -${ARCH_FLAG}=native -${TUNE_FLAG}=native)
execute_process(COMMAND ${CMAKE_C_COMPILER} ${EXEC_ARGS}
OUTPUT_VARIABLE _GCC_OUTPUT)
set(_GCC_OUTPUT_TUNE ${_GCC_OUTPUT})
string(FIND "${_GCC_OUTPUT}" "${ARCH_FLAG}=" POS)
string(SUBSTRING "${_GCC_OUTPUT}" ${POS} -1 _GCC_OUTPUT)
string(REGEX REPLACE "${ARCH_FLAG}=[ \t]*([^ \n]*)[ \n].*" "\\1" GNUCC_ARCH "${_GCC_OUTPUT}")
string(FIND "${_GCC_OUTPUT_TUNE}" "${TUNE_FLAG}=" POS_TUNE)
string(SUBSTRING "${_GCC_OUTPUT_TUNE}" ${POS_TUNE} -1 _GCC_OUTPUT_TUNE)
string(REGEX REPLACE "${TUNE_FLAG}=[ \t]*([^ \n]*)[ \n].*" "\\1" GNUCC_TUNE "${_GCC_OUTPUT_TUNE}")
message(STATUS "ARCH_FLAG '${ARCH_FLAG}' '${GNUCC_ARCH}', TUNE_FLAG '${TUNE_FLAG}' '${GNUCC_TUNE}' ")
# test the parsed flag
set (EXEC_ARGS ${CC_ARG1} -E - -${ARCH_FLAG}=${GNUCC_ARCH} -${TUNE_FLAG}=${GNUCC_TUNE})
execute_process(COMMAND ${CMAKE_C_COMPILER} ${EXEC_ARGS}
OUTPUT_QUIET ERROR_QUIET
INPUT_FILE /dev/null
RESULT_VARIABLE GNUCC_TUNE_TEST)
if (NOT GNUCC_TUNE_TEST EQUAL 0)
message(WARNING "Something went wrong determining gcc tune: -mtune=${GNUCC_TUNE} not valid, falling back to -mtune=native")
set(GNUCC_TUNE native)
else()
set(GNUCC_TUNE ${GNUCC_TUNE})
message(STATUS "gcc will tune for ${GNUCC_ARCH}, ${GNUCC_TUNE}")
endif()
elseif (CMAKE_COMPILER_IS_CLANG)
if (ARCH_IA32 OR ARCH_X86_64)
set(GNUCC_ARCH x86-64-v2)
set(TUNE_FLAG generic)
elseif(ARCH_AARCH64)
if (BUILD_SVE2_BITPERM)
set(GNUCC_ARCH ${SVE2_BITPERM_ARCH})
elseif (BUILD_SVE2)
set(GNUCC_ARCH ${SVE2_ARCH})
elseif (BUILD_SVE)
set(GNUCC_ARCH ${SVE_ARCH})
else ()
set(GNUCC_ARCH ${ARMV8_ARCH})
endif()
set(TUNE_FLAG generic)
elseif(ARCH_ARM32)
set(GNUCC_ARCH armv7a)
set(TUNE_FLAG generic)
else()
set(GNUCC_ARCH native)
set(TUNE_FLAG generic)
endif()
message(STATUS "clang will tune for ${GNUCC_ARCH}, ${TUNE_FLAG}")
endif()
else()
if (SIMDE_BACKEND)
if (ARCH_IA32 OR ARCH_X86_64)
set(GNUCC_ARCH x86-64-v2)
set(TUNE_FLAG generic)
elseif(ARCH_AARCH64)
set(GNUCC_ARCH armv8-a)
set(TUNE_FLAG generic)
elseif(ARCH_ARM32)
set(GNUCC_ARCH armv7a)
set(TUNE_FLAG generic)
elseif(ARCH_PPC64EL)
set(GNUCC_ARCH power8)
set(TUNE_FLAG power8)
else()
set(GNUCC_ARCH x86-64-v2)
set(TUNE_FLAG generic)
endif()
elseif (ARCH_IA32 OR ARCH_X86_64)
set(GNUCC_ARCH ${X86_ARCH})
set(TUNE_FLAG generic)
elseif(ARCH_AARCH64)
if (BUILD_SVE2_BITPERM)
set(GNUCC_ARCH ${SVE2_BITPERM_ARCH})
elseif (BUILD_SVE2)
set(GNUCC_ARCH ${SVE2_ARCH})
elseif (BUILD_SVE)
set(GNUCC_ARCH ${SVE_ARCH})
else ()
set(GNUCC_ARCH ${ARMV8_ARCH})
endif()
set(TUNE_FLAG generic)
elseif(ARCH_ARM32)
set(GNUCC_ARCH armv7a)
set(TUNE_FLAG generic)
elseif(ARCH_PPC64EL)
set(GNUCC_ARCH power8)
set(TUNE_FLAG power8)
else()
set(GNUCC_ARCH native)
set(TUNE_FLAG native)
endif()
endif()


@ -15,13 +15,21 @@ SYMSFILE=$(mktemp -p /tmp ${PREFIX}_rename.syms.XXXXX)
KEEPSYMS=$(mktemp -p /tmp keep.syms.XXXXX)
# find the libc used by gcc
LIBC_SO=$("$@" --print-file-name=libc.so.6)
NM_FLAG="-f"
if [ `uname` = "FreeBSD" ]; then
# for FreeBSD we must specify the libc name explicitly;
# on Linux we leave it to work as-is
LIBC_SO=/lib/libc.so.7
# also, in BSD, the nm flag -F corresponds to the -f flag in linux.
NM_FLAG="-F"
fi
cp ${KEEPSYMS_IN} ${KEEPSYMS}
# get all symbols from libc and turn them into patterns
nm -f p -g -D ${LIBC_SO} | sed -s 's/\([^ ]*\).*/^\1$/' >> ${KEEPSYMS}
nm ${NM_FLAG} p -g -D ${LIBC_SO} | sed 's/\([^ @]*\).*/^\1$/' >> ${KEEPSYMS}
# build the object
"$@"
# rename the symbols in the object
nm -f p -g ${OUT} | cut -f1 -d' ' | grep -v -f ${KEEPSYMS} | sed -e "s/\(.*\)/\1\ ${PREFIX}_\1/" >> ${SYMSFILE}
nm ${NM_FLAG} p -g ${OUT} | cut -f1 -d' ' | grep -v -f ${KEEPSYMS} | sed -e "s/\(.*\)/\1\ ${PREFIX}_\1/" >> ${SYMSFILE}
if test -s ${SYMSFILE}
then
objcopy --redefine-syms=${SYMSFILE} ${OUT}

cmake/cflags-arm.cmake Normal file

@ -0,0 +1,93 @@
if (NOT FAT_RUNTIME)
if (BUILD_SVE2_BITPERM)
message (STATUS "SVE2_BITPERM implies SVE2, enabling BUILD_SVE2")
set(BUILD_SVE2 ON)
endif ()
if (BUILD_SVE2)
message (STATUS "SVE2 implies SVE, enabling BUILD_SVE")
set(BUILD_SVE ON)
endif ()
endif ()
if (CMAKE_COMPILER_IS_GNUCXX)
set(ARMV9BASE_MINVER "12")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS ARMV9BASE_MINVER)
set(SVE2_ARCH "armv8-a+sve2")
else()
set(SVE2_ARCH "armv9-a")
endif()
else()
set(SVE2_ARCH "armv9-a")
endif()
set(ARMV8_ARCH "armv8-a")
set(SVE_ARCH "${ARMV8_ARCH}+sve")
set(SVE2_BITPERM_ARCH "${SVE2_ARCH}+sve2-bitperm")
CHECK_INCLUDE_FILE_CXX(arm_neon.h HAVE_C_ARM_NEON_H)
if (BUILD_SVE OR BUILD_SVE2 OR BUILD_SVE2_BITPERM OR FAT_RUNTIME)
set(CMAKE_REQUIRED_FLAGS "-march=${SVE_ARCH}")
CHECK_INCLUDE_FILE_CXX(arm_sve.h HAVE_C_ARM_SVE_H)
if (NOT HAVE_C_ARM_SVE_H)
message(FATAL_ERROR "arm_sve.h is required to build for SVE.")
endif()
endif()
CHECK_C_SOURCE_COMPILES("#include <arm_neon.h>
int main() {
int32x4_t a = vdupq_n_s32(1);
(void)a;
}" HAVE_NEON)
if (BUILD_SVE2_BITPERM)
set(CMAKE_REQUIRED_FLAGS "-march=${SVE2_BITPERM_ARCH}")
CHECK_C_SOURCE_COMPILES("#include <arm_sve.h>
int main() {
svuint8_t a = svbext(svdup_u8(1), svdup_u8(2));
(void)a;
}" HAVE_SVE2_BITPERM)
endif()
if (BUILD_SVE2)
set(CMAKE_REQUIRED_FLAGS "-march=${SVE2_ARCH}")
CHECK_C_SOURCE_COMPILES("#include <arm_sve.h>
int main() {
svuint8_t a = svbsl(svdup_u8(1), svdup_u8(2), svdup_u8(3));
(void)a;
}" HAVE_SVE2)
endif()
if (BUILD_SVE)
set(CMAKE_REQUIRED_FLAGS "-march=${SVE_ARCH}")
CHECK_C_SOURCE_COMPILES("#include <arm_sve.h>
int main() {
svuint8_t a = svdup_u8(1);
(void)a;
}" HAVE_SVE)
endif ()
if (FAT_RUNTIME)
if (NOT HAVE_NEON)
message(FATAL_ERROR "NEON support required to build fat runtime")
endif ()
if (BUILD_SVE AND NOT HAVE_SVE)
message(FATAL_ERROR "SVE support required to build fat runtime")
endif ()
if (BUILD_SVE2 AND NOT HAVE_SVE2)
message(FATAL_ERROR "SVE2 support required to build fat runtime")
endif ()
if (BUILD_SVE2_BITPERM AND NOT HAVE_SVE2_BITPERM)
message(FATAL_ERROR "SVE2 support required to build fat runtime")
endif ()
else (NOT FAT_RUNTIME)
if (NOT BUILD_SVE)
message(STATUS "Building without SVE support")
endif ()
if (NOT BUILD_SVE2)
message(STATUS "Building without SVE2 support")
endif ()
if (NOT HAVE_NEON)
message(FATAL_ERROR "Neon/ASIMD support required for Arm support")
endif ()
endif ()

cmake/cflags-generic.cmake Normal file

@ -0,0 +1,106 @@
# set compiler flags - more are tested and added later
set(EXTRA_C_FLAGS "${OPT_C_FLAG} -std=c17 -Wall -Wextra ")
set(EXTRA_CXX_FLAGS "${OPT_CXX_FLAG} -std=c++17 -Wall -Wextra ")
if (NOT CMAKE_COMPILER_IS_CLANG)
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -fno-new-ttp-matching")
endif()
# Always use -Werror, also during release builds
set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wall -Werror")
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wall -Werror")
if (DISABLE_ASSERTS)
set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -DNDEBUG")
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -DNDEBUG")
endif()
if(CMAKE_COMPILER_IS_GNUCC)
# spurious warnings?
set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-array-bounds ") #-Wno-maybe-uninitialized")
endif()
if(CMAKE_COMPILER_IS_GNUCXX)
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-maybe-uninitialized -Wno-uninitialized")
endif()
CHECK_INCLUDE_FILES(unistd.h HAVE_UNISTD_H)
CHECK_FUNCTION_EXISTS(posix_memalign HAVE_POSIX_MEMALIGN)
CHECK_FUNCTION_EXISTS(_aligned_malloc HAVE__ALIGNED_MALLOC)
if(FREEBSD OR NETBSD)
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -gdwarf-4")
endif()
if(NETBSD)
set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -DHAVE_BUILTIN_POPCOUNT")
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -DHAVE_BUILTIN_POPCOUNT")
endif()
if(MACOSX)
# Boost headers cause such complains on MacOS
set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-deprecated-declarations -Wno-unused-parameter")
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-deprecated-declarations -Wno-unused-parameter")
endif()
# these end up in the config file
CHECK_C_COMPILER_FLAG(-fvisibility=hidden HAS_C_HIDDEN)
CHECK_CXX_COMPILER_FLAG(-fvisibility=hidden HAS_CXX_HIDDEN)
# are we using libc++
CHECK_CXX_SYMBOL_EXISTS(_LIBCPP_VERSION ciso646 HAVE_LIBCPP)
if (RELEASE_BUILD)
if (HAS_C_HIDDEN)
set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -fvisibility=hidden")
endif()
if (HAS_CXX_HIDDEN)
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -fvisibility=hidden")
endif()
endif()
# testing a builtin takes a little more work
CHECK_C_SOURCE_COMPILES("void *aa_test(void *x) { return __builtin_assume_aligned(x, 16);}\nint main(void) { return 0; }" HAVE_CC_BUILTIN_ASSUME_ALIGNED)
CHECK_CXX_SOURCE_COMPILES("void *aa_test(void *x) { return __builtin_assume_aligned(x, 16);}\nint main(void) { return 0; }" HAVE_CXX_BUILTIN_ASSUME_ALIGNED)
# Clang does not use __builtin_constant_p() the same way as gcc
if (NOT CMAKE_COMPILER_IS_CLANG)
CHECK_C_SOURCE_COMPILES("int main(void) { __builtin_constant_p(0); }" HAVE__BUILTIN_CONSTANT_P)
endif()
# clang-14 complains about unused-but-set variable.
CHECK_CXX_COMPILER_FLAG("-Wunused-but-set-variable" CXX_UNUSED_BUT_SET_VAR)
if (CXX_UNUSED_BUT_SET_VAR)
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-unused-but-set-variable")
endif()
CHECK_CXX_COMPILER_FLAG("-Wignored-attributes" CXX_IGNORED_ATTR)
if(CMAKE_COMPILER_IS_GNUCC)
if (CXX_IGNORED_ATTR)
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-ignored-attributes")
endif()
endif()
CHECK_CXX_COMPILER_FLAG("-Wignored-attributes" CXX_NON_NULL)
if(CMAKE_COMPILER_IS_GNUCC)
if (CXX_NON_NULL)
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-nonnull")
endif()
endif()
# note this for later, g++ doesn't have this flag but clang does
CHECK_CXX_COMPILER_FLAG("-Wweak-vtables" CXX_WEAK_VTABLES)
CHECK_CXX_COMPILER_FLAG("-Wmissing-declarations" CXX_MISSING_DECLARATIONS)
CHECK_CXX_COMPILER_FLAG("-Wunused-local-typedefs" CXX_UNUSED_LOCAL_TYPEDEFS)
CHECK_CXX_COMPILER_FLAG("-Wunused-variable" CXX_WUNUSED_VARIABLE)
# gcc complains about this
if(CMAKE_COMPILER_IS_GNUCC)
CHECK_C_COMPILER_FLAG("-Wstringop-overflow" CC_STRINGOP_OVERFLOW)
CHECK_CXX_COMPILER_FLAG("-Wstringop-overflow" CXX_STRINGOP_OVERFLOW)
if(CC_STRINGOP_OVERFLOW OR CXX_STRINGOP_OVERFLOW)
set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-stringop-overflow -Wno-stringop-overread")
set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-stringop-overflow -Wno-stringop-overread")
endif()
endif()


@ -0,0 +1,27 @@
CHECK_INCLUDE_FILE_CXX(altivec.h HAVE_C_PPC64EL_ALTIVEC_H)
if (HAVE_C_PPC64EL_ALTIVEC_H)
set (INTRIN_INC_H "altivec.h")
else()
message (FATAL_ERROR "No intrinsics header found for VSX")
endif ()
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
int main() {
vector int a = vec_splat_s32(1);
(void)a;
}" HAVE_VSX)
if (NOT HAVE_VSX)
message(FATAL_ERROR "VSX support required for Power support")
endif ()
# fix unit-internal seg fault for freebsd and gcc13
if (FREEBSD AND CMAKE_COMPILER_IS_GNUCXX)
if(CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "13")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -static-libgcc -static-libstdc++")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -static-libgcc -static-libstdc++")
set(CMAKE_MODULE_LINKER_FLAGS "${CMAKE_MODULE_LINKER_FLAGS} -static-libgcc -static-libstdc++")
endif ()
endif ()

cmake/cflags-x86.cmake Normal file

@ -0,0 +1,140 @@
option(BUILD_AVX512 "Enabling support for AVX512" OFF)
option(BUILD_AVX512VBMI "Enabling support for AVX512VBMI" OFF)
set(SKYLAKE_ARCH "skylake-avx512")
set(ICELAKE_ARCH "icelake-server")
set(SKYLAKE_FLAG "-march=${SKYLAKE_ARCH}")
set(ICELAKE_FLAG "-march=${ICELAKE_ARCH}")
if (NOT FAT_RUNTIME)
if (BUILD_AVX512VBMI)
message (STATUS "AVX512VBMI implies AVX512, enabling BUILD_AVX512")
set(BUILD_AVX512 ON)
set(BUILD_AVX2 ON)
set(ARCH_C_FLAGS "${ICELAKE_FLAG}")
set(ARCH_CXX_FLAGS "${ICELAKE_FLAG}")
set(X86_ARCH "${ICELAKE_ARCH}")
elseif (BUILD_AVX512)
message (STATUS "AVX512 implies AVX2, enabling BUILD_AVX2")
set(BUILD_AVX2 ON)
set(ARCH_C_FLAGS "${SKYLAKE_FLAG}")
set(ARCH_CXX_FLAGS "${SKYLAKE_FLAG}")
set(X86_ARCH "${SKYLAKE_ARCH}")
elseif (BUILD_AVX2)
message (STATUS "Enabling BUILD_AVX2")
set(ARCH_C_FLAGS "-mavx2")
set(ARCH_CXX_FLAGS "-mavx2")
set(X86_ARCH "core-avx2")
else()
set(ARCH_C_FLAGS "-msse4.2")
set(ARCH_CXX_FLAGS "-msse4.2")
set(X86_ARCH "x86-64-v2")
endif()
else()
set(BUILD_AVX512VBMI ON)
set(BUILD_AVX512 ON)
set(BUILD_AVX2 ON)
set(ARCH_C_FLAGS "-msse4.2")
set(ARCH_CXX_FLAGS "-msse4.2")
set(X86_ARCH "x86-64-v2")
endif()
set(CMAKE_REQUIRED_FLAGS "${ARCH_C_FLAGS}")
CHECK_INCLUDE_FILES(intrin.h HAVE_C_INTRIN_H)
CHECK_INCLUDE_FILE_CXX(intrin.h HAVE_CXX_INTRIN_H)
CHECK_INCLUDE_FILES(x86intrin.h HAVE_C_X86INTRIN_H)
CHECK_INCLUDE_FILE_CXX(x86intrin.h HAVE_CXX_X86INTRIN_H)
if (HAVE_C_X86INTRIN_H)
set (INTRIN_INC_H "x86intrin.h")
elseif (HAVE_C_INTRIN_H)
set (INTRIN_INC_H "intrin.h")
else()
message (FATAL_ERROR "No intrinsics header found for SSE/AVX2/AVX512")
endif ()
if (BUILD_AVX512)
CHECK_C_COMPILER_FLAG(${SKYLAKE_FLAG} HAS_ARCH_SKYLAKE)
if (NOT HAS_ARCH_SKYLAKE)
message (FATAL_ERROR "AVX512 not supported by compiler")
endif ()
endif ()
if (BUILD_AVX512VBMI)
CHECK_C_COMPILER_FLAG(${ICELAKE_FLAG} HAS_ARCH_ICELAKE)
if (NOT HAS_ARCH_ICELAKE)
message (FATAL_ERROR "AVX512VBMI not supported by compiler")
endif ()
endif ()
# ensure we have the minimum of SSE4.2 - call a SSE4.2 intrinsic
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
int main() {
__m128i a = _mm_set1_epi8(1);
(void)_mm_shuffle_epi8(a, a);
}" HAVE_SSE42)
# now look for AVX2
set(CMAKE_REQUIRED_FLAGS "-mavx2")
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
#if !defined(__AVX2__)
#error no avx2
#endif
int main(){
__m256i z = _mm256_setzero_si256();
(void)_mm256_xor_si256(z, z);
}" HAVE_AVX2)
# and now for AVX512
set(CMAKE_REQUIRED_FLAGS "${SKYLAKE_FLAG}")
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
#if !defined(__AVX512BW__)
#error no avx512bw
#endif
int main(){
__m512i z = _mm512_setzero_si512();
(void)_mm512_abs_epi8(z);
}" HAVE_AVX512)
# and now for AVX512VBMI
set(CMAKE_REQUIRED_FLAGS "${ICELAKE_FLAG}")
CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}>
#if !defined(__AVX512VBMI__)
#error no avx512vbmi
#endif
int main(){
__m512i a = _mm512_set1_epi8(0xFF);
__m512i idx = _mm512_set_epi64(3ULL, 2ULL, 1ULL, 0ULL, 7ULL, 6ULL, 5ULL, 4ULL);
(void)_mm512_permutexvar_epi8(idx, a);
}" HAVE_AVX512VBMI)
if (FAT_RUNTIME)
if (NOT HAVE_SSE42)
message(FATAL_ERROR "SSE4.2 support required to build fat runtime")
endif ()
if (BUILD_AVX2 AND NOT HAVE_AVX2)
message(FATAL_ERROR "AVX2 support required to build fat runtime")
endif ()
if (BUILD_AVX512 AND NOT HAVE_AVX512)
message(FATAL_ERROR "AVX512 support requested but not supported")
endif ()
if (BUILD_AVX512VBMI AND NOT HAVE_AVX512VBMI)
message(FATAL_ERROR "AVX512VBMI support requested but not supported")
endif ()
else (NOT FAT_RUNTIME)
if (NOT BUILD_AVX2)
message(STATUS "Building without AVX2 support")
endif ()
if (NOT HAVE_AVX512)
message(STATUS "Building without AVX512 support")
endif ()
if (NOT HAVE_AVX512VBMI)
message(STATUS "Building without AVX512VBMI support")
endif ()
if (NOT HAVE_SSE42)
message(FATAL_ERROR "A minimum of SSE4.2 compiler support is required")
endif ()
endif ()

cmake/compiler.cmake Normal file

@ -0,0 +1,20 @@
# determine compiler
if (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
set(CMAKE_COMPILER_IS_CLANG TRUE)
set(CLANGCXX_MINVER "5")
message(STATUS "clang++ version ${CMAKE_CXX_COMPILER_VERSION}")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS CLANGCXX_MINVER)
message(FATAL_ERROR "A minimum of clang++ ${CLANGCXX_MINVER} is required for C++17 support")
endif()
string (REGEX REPLACE "^([0-9]+)\\.([0-9]+)\\.([0-9]+)$" "\\1" CLANG_MAJOR_VERSION "${CMAKE_CXX_COMPILER_VERSION}")
endif()
# compiler version checks TODO: test more compilers
if (CMAKE_COMPILER_IS_GNUCXX)
set(GNUCXX_MINVER "9")
message(STATUS "g++ version ${CMAKE_CXX_COMPILER_VERSION}")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS GNUCXX_MINVER)
message(FATAL_ERROR "A minimum of g++ ${GNUCXX_MINVER} is required for C++17 support")
endif()
endif()


@ -15,15 +15,42 @@
/* "Define if building for EM64T" */
#cmakedefine ARCH_X86_64
/* "Define if building for ARM32" */
#cmakedefine ARCH_ARM32
/* "Define if building for AARCH64" */
#cmakedefine ARCH_AARCH64
/* "Define if building for PPC64EL" */
#cmakedefine ARCH_PPC64EL
/* "Define if cross compiling for AARCH64" */
#cmakedefine CROSS_COMPILE_AARCH64
/* Define if building SVE for AARCH64. */
#cmakedefine BUILD_SVE
/* Define if building SVE2 for AARCH64. */
#cmakedefine BUILD_SVE2
/* Define if building SVE2+BITPERM for AARCH64. */
#cmakedefine BUILD_SVE2_BITPERM
/* internal build, switch on dump support. */
#cmakedefine DUMP_SUPPORT
/* Define if building "fat" runtime. */
#cmakedefine FAT_RUNTIME
/* Define if building AVX2 in the fat runtime. */
#cmakedefine BUILD_AVX2
/* Define if building AVX-512 in the fat runtime. */
#cmakedefine BUILD_AVX512
/* Define if building AVX512VBMI in the fat runtime. */
#cmakedefine BUILD_AVX512VBMI
/* Define to 1 if `backtrace' works. */
#cmakedefine HAVE_BACKTRACE
@ -45,6 +72,15 @@
/* C compiler has intrin.h */
#cmakedefine HAVE_C_INTRIN_H
/* C compiler has arm_neon.h */
#cmakedefine HAVE_C_ARM_NEON_H
/* C compiler has arm_sve.h */
#cmakedefine HAVE_C_ARM_SVE_H
/* C compiler has altivec.h */
#cmakedefine HAVE_C_PPC64EL_ALTIVEC_H
/* Define to 1 if you have the declaration of `pthread_setaffinity_np', and to
0 if you don't. */
#cmakedefine HAVE_DECL_PTHREAD_SETAFFINITY_NP


@ -1,5 +1,5 @@
#!/usr/bin/env python
from __future__ import print_function
import os
import sys
import datetime

cmake/osdetection.cmake Normal file

@ -0,0 +1,54 @@
if(CMAKE_SYSTEM_NAME MATCHES "Linux")
set(LINUX TRUE)
endif(CMAKE_SYSTEM_NAME MATCHES "Linux")
if(CMAKE_SYSTEM_NAME MATCHES "FreeBSD")
set(FREEBSD true)
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
#FIXME: find a nicer and more general way of doing this
if(CMAKE_C_COMPILER MATCHES "/usr/local/bin/gcc13")
set(CMAKE_BUILD_RPATH "/usr/local/lib/gcc13")
elseif(ARCH_AARCH64 AND (CMAKE_C_COMPILER MATCHES "/usr/local/bin/gcc12"))
set(CMAKE_BUILD_RPATH "/usr/local/lib/gcc12")
endif()
endif(CMAKE_SYSTEM_NAME MATCHES "FreeBSD")
if(CMAKE_SYSTEM_NAME MATCHES "NetBSD")
set(NETBSD true)
endif(CMAKE_SYSTEM_NAME MATCHES "NetBSD")
if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(MACOSX TRUE)
endif()
if (ARCH_IA32 OR ARCH_X86_64)
option(FAT_RUNTIME "Build a library that supports multiple microarchitectures" ON)
else()
option(FAT_RUNTIME "Build a library that supports multiple microarchitectures" OFF)
endif()
if (FAT_RUNTIME)
message("Checking Fat Runtime Requirements...")
if (USE_CPU_NATIVE AND FAT_RUNTIME)
message(FATAL_ERROR "Fat runtime is not compatible with Native CPU detection")
endif()
if (NOT (ARCH_IA32 OR ARCH_X86_64 OR ARCH_AARCH64))
message(FATAL_ERROR "Fat runtime is only supported on Intel and Aarch64 architectures")
else()
message(STATUS "Building Fat runtime for multiple microarchitectures")
message(STATUS "generator is ${CMAKE_GENERATOR}")
if (NOT (CMAKE_GENERATOR MATCHES "Unix Makefiles" OR
(CMAKE_VERSION VERSION_GREATER "3.0" AND CMAKE_GENERATOR MATCHES "Ninja")))
message (FATAL_ERROR "Building the fat runtime requires the Unix Makefiles generator, or Ninja with CMake v3.0 or higher")
else()
include (${CMAKE_MODULE_PATH}/attrib.cmake)
if (NOT HAS_C_ATTR_IFUNC)
message(FATAL_ERROR "Compiler does not support ifunc attribute, cannot build fat runtime")
endif()
endif()
endif()
if (NOT RELEASE_BUILD)
message(FATAL_ERROR "Fat runtime is only built on Release builds")
endif()
endif ()


@ -30,7 +30,7 @@ if (PCRE_BUILD_SOURCE)
#if PCRE_MAJOR != ${PCRE_REQUIRED_MAJOR_VERSION} || PCRE_MINOR < ${PCRE_REQUIRED_MINOR_VERSION}
#error Incorrect pcre version
#endif
main() {}" CORRECT_PCRE_VERSION)
int main(void) {return 0;}" CORRECT_PCRE_VERSION)
set (CMAKE_REQUIRED_INCLUDES "${saved_INCLUDES}")
if (NOT CORRECT_PCRE_VERSION)


@ -1,9 +1,12 @@
# determine the target arch
# really only interested in the preprocessor here
CHECK_C_SOURCE_COMPILES("#if !(defined(__x86_64__) || defined(_M_X64))\n#error not 64bit\n#endif\nint main(void) { return 0; }" ARCH_64_BIT)
CHECK_C_SOURCE_COMPILES("#if !(defined(__i386__) || defined(_M_IX86))\n#error not 64bit\n#endif\nint main(void) { return 0; }" ARCH_32_BIT)
set(ARCH_X86_64 ${ARCH_64_BIT})
set(ARCH_IA32 ${ARCH_32_BIT})
CHECK_C_SOURCE_COMPILES("#if !(defined(__x86_64__) || defined(_M_X64))\n#error not 64bit\n#endif\nint main(void) { return 0; }" ARCH_X86_64)
CHECK_C_SOURCE_COMPILES("#if !(defined(__i386__) || defined(_M_IX86))\n#error not 32bit\n#endif\nint main(void) { return 0; }" ARCH_IA32)
CHECK_C_SOURCE_COMPILES("#if !defined(__ARM_ARCH_ISA_A64)\n#error not 64bit\n#endif\nint main(void) { return 0; }" ARCH_AARCH64)
CHECK_C_SOURCE_COMPILES("#if !defined(__ARM_ARCH_ISA_ARM)\n#error not 32bit\n#endif\nint main(void) { return 0; }" ARCH_ARM32)
CHECK_C_SOURCE_COMPILES("#if !defined(__PPC64__) && !(defined(__LITTLE_ENDIAN__) && defined(__VSX__))\n#error not ppc64el\n#endif\nint main(void) { return 0; }" ARCH_PPC64EL)
if (ARCH_X86_64 OR ARCH_AARCH64 OR ARCH_PPC64EL)
set(ARCH_64_BIT TRUE)
else()
set(ARCH_32_BIT TRUE)
endif()


@ -7,7 +7,7 @@ function(ragelmaker src_rl)
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${src_dir}/${src_file}.cpp
COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/${src_dir}
COMMAND ${RAGEL} ${CMAKE_CURRENT_SOURCE_DIR}/${src_rl} -o ${rl_out}
COMMAND ${RAGEL} ${CMAKE_CURRENT_SOURCE_DIR}/${src_rl} -o ${rl_out} -G0
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${src_rl}
)
add_custom_target(ragel_${src_file} DEPENDS ${rl_out})

cmake/sanitize.cmake Normal file

@ -0,0 +1,40 @@
# Possible values:
# - `address` (ASan)
# - `memory` (MSan)
# - `undefined` (UBSan)
# - "" (no sanitizing)
option (SANITIZE "Enable one of the code sanitizers" "")
set (SAN_FLAGS "${SAN_FLAGS} -g -fno-omit-frame-pointer -DSANITIZER")
if (SANITIZE)
if (SANITIZE STREQUAL "address")
set (ASAN_FLAGS "-fsanitize=address -fsanitize-address-use-after-scope")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${ASAN_FLAGS}")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${ASAN_FLAGS}")
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${ASAN_FLAGS}")
endif()
elseif (SANITIZE STREQUAL "memory")
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
message (FATAL_ERROR "GCC does not have memory sanitizer")
endif()
# MemorySanitizer flags are set according to the official documentation:
# https://clang.llvm.org/docs/MemorySanitizer.html#usage
set (MSAN_FLAGS "-fsanitize=memory -fsanitize-memory-use-after-dtor -fsanitize-memory-track-origins -fno-optimize-sibling-calls")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${MSAN_FLAGS}")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${MSAN_FLAGS}")
elseif (SANITIZE STREQUAL "undefined")
set (UBSAN_FLAGS "-fsanitize=undefined")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}")
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=undefined")
endif()
else ()
message (FATAL_ERROR "Unknown sanitizer type: ${SANITIZE}")
endif ()
endif()

cmake/simde.cmake Normal file

@ -0,0 +1,40 @@
LIST(APPEND CMAKE_REQUIRED_INCLUDES ${PROJECT_SOURCE_DIR}/simde)
CHECK_INCLUDE_FILES(simde/x86/sse4.2.h SIMDE_SSE42_H_FOUND)
if (SIMDE_SSE42_H_FOUND)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DVS_SIMDE_BACKEND")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DVS_SIMDE_BACKEND")
include_directories(${PROJECT_SOURCE_DIR}/simde)
if (CMAKE_COMPILER_IS_CLANG)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DSIMDE_NO_CHECK_IMMEDIATE_CONSTANT")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DSIMDE_NO_CHECK_IMMEDIATE_CONSTANT")
if (ARCH_PPC64EL)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-deprecated-altivec-src-compat")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-deprecated-altivec-src-compat")
if (CLANG_MAJOR_VERSION EQUAL 15)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-deprecate-lax-vec-conv-all")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-deprecate-lax-vec-conv-all")
endif ()
endif()
endif()
if (BUILD_SSE2_SIMDE)
message("using BUILD_SSE2_SIMDE..")
set(SIMDE_NATIVE true)
set(ARCH_C_FLAGS "-msse2")
set(ARCH_CXX_FLAGS "-msse2")
set(X86_ARCH "x86-64")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DVS_SIMDE_NATIVE -DVS_SIMDE_BACKEND")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DVS_SIMDE_NATIVE -DVS_SIMDE_BACKEND")
endif()
if (SIMDE_NATIVE AND NOT BUILD_SSE2_SIMDE)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DVS_SIMDE_NATIVE -DSIMDE_ENABLE_OPENMP -fopenmp-simd")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DVS_SIMDE_NATIVE -DSIMDE_ENABLE_OPENMP -fopenmp-simd")
endif()
else()
message(FATAL_ERROR "SIMDe backend requested but SIMDe is not available on the system")
endif()


@ -1,53 +1,19 @@
#
# a lot of noise to find sqlite
# sqlite is only used in hsbench, no need to special case its build, depend only on OS installations using pkg-config
#
option(SQLITE_PREFER_STATIC "Build sqlite3 statically instead of using an installed lib" OFF)
if(NOT WIN32 AND NOT SQLITE_PREFER_STATIC)
find_package(PkgConfig QUIET)
# first check for sqlite on the system
pkg_check_modules(SQLITE3 sqlite3)
endif()
if (NOT SQLITE3_FOUND)
message(STATUS "looking for sqlite3 in source tree")
# look in the source tree
if (EXISTS "${PROJECT_SOURCE_DIR}/sqlite3/sqlite3.h" AND
EXISTS "${PROJECT_SOURCE_DIR}/sqlite3/sqlite3.c")
message(STATUS " found sqlite3 in source tree")
set(SQLITE3_FOUND TRUE)
set(SQLITE3_BUILD_SOURCE TRUE)
set(SQLITE3_INCLUDE_DIRS "${PROJECT_SOURCE_DIR}/sqlite3")
set(SQLITE3_LDFLAGS sqlite3_static)
else()
message(STATUS " no sqlite3 in source tree")
endif()
endif()
# now do version checks
if (SQLITE3_FOUND)
list(INSERT CMAKE_REQUIRED_INCLUDES 0 "${SQLITE3_INCLUDE_DIRS}")
CHECK_C_SOURCE_COMPILES("#include <sqlite3.h>\n#if SQLITE_VERSION_NUMBER >= 3008007 && SQLITE_VERSION_NUMBER < 3008010\n#error broken sqlite\n#endif\nint main() {return 0;}" SQLITE_VERSION_OK)
if (NOT SQLITE_VERSION_OK)
if (SQLITE_VERSION LESS "3.8.10")
message(FATAL_ERROR "sqlite3 is broken from 3.8.7 to 3.8.10 - please find a working version")
endif()
if (NOT SQLITE3_BUILD_SOURCE)
set(_SAVED_FLAGS ${CMAKE_REQUIRED_FLAGS})
list(INSERT CMAKE_REQUIRED_LIBRARIES 0 ${SQLITE3_LDFLAGS})
CHECK_SYMBOL_EXISTS(sqlite3_open_v2 sqlite3.h HAVE_SQLITE3_OPEN_V2)
list(REMOVE_ITEM CMAKE_REQUIRED_INCLUDES "${SQLITE3_INCLUDE_DIRS}")
list(REMOVE_ITEM CMAKE_REQUIRED_LIBRARIES ${SQLITE3_LDFLAGS})
else()
if (NOT TARGET sqlite3_static)
# build sqlite as a static lib to compile into our test programs
add_library(sqlite3_static STATIC "${PROJECT_SOURCE_DIR}/sqlite3/sqlite3.c")
if (NOT WIN32)
set_target_properties(sqlite3_static PROPERTIES COMPILE_FLAGS "-Wno-error -Wno-extra -Wno-unused -Wno-cast-qual -DSQLITE_OMIT_LOAD_EXTENSION")
endif()
endif()
endif()
endif()
# that's enough about sqlite


@ -0,0 +1,15 @@
unknownMacro:*gtest-all.cc
knownConditionTrueFalse:*Parser.rl
knownConditionTrueFalse:*Parser.cpp
variableScope:*Parser.rl
duplicateBreak:*.rl
unreadVariable:*control_verbs.cpp
unreachableCode:*rose_build_dump.cpp
*:*simde/*
assertWithSideEffect
syntaxError
internalError
checkersReport
missingInclude
missingIncludeSystem
unmatchedSuppression


@ -19,6 +19,7 @@ else()
set(SPHINX_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/_build")
set(SPHINX_CACHE_DIR "${CMAKE_CURRENT_BINARY_DIR}/_doctrees")
set(SPHINX_HTML_DIR "${CMAKE_CURRENT_BINARY_DIR}/html")
set(SPHINX_MAN_DIR "${CMAKE_CURRENT_BINARY_DIR}/man")
configure_file("${CMAKE_CURRENT_SOURCE_DIR}/conf.py.in"
"${CMAKE_CURRENT_BINARY_DIR}/conf.py" @ONLY)
@ -32,4 +33,14 @@ add_custom_target(dev-reference
"${SPHINX_HTML_DIR}"
DEPENDS dev-reference-doxygen
COMMENT "Building HTML dev reference with Sphinx")
add_custom_target(dev-reference-man
${SPHINX_BUILD}
-b man
-c "${CMAKE_CURRENT_BINARY_DIR}"
-d "${SPHINX_CACHE_DIR}"
"${CMAKE_CURRENT_SOURCE_DIR}"
"${SPHINX_MAN_DIR}"
DEPENDS dev-reference-doxygen
COMMENT "Building man page reference with Sphinx")
endif()


@ -11,10 +11,10 @@ Introduction
************
Chimera is a software regular expression matching engine that is a hybrid of
Hyperscan and PCRE. The design goals of Chimera are to fully support PCRE
syntax as well as to take advantage of the high performance nature of Hyperscan.
Vectorscan and PCRE. The design goals of Chimera are to fully support PCRE
syntax as well as to take advantage of the high performance nature of Vectorscan.
Chimera inherits the design guideline of Hyperscan with C APIs for compilation
Chimera inherits the design guideline of Vectorscan with C APIs for compilation
and scanning.
The Chimera API itself is composed of two major components:
@ -65,13 +65,13 @@ For a given database, Chimera provides several guarantees:
.. note:: Chimera is designed to have the same matching behavior as PCRE,
including greedy/ungreedy, capturing, etc. Chimera reports both
**start offset** and **end offset** for each match like PCRE. Different
from the fashion of reporting all matches in Hyperscan, Chimera only reports
from the fashion of reporting all matches in Vectorscan, Chimera only reports
non-overlapping matches. For example, the pattern :regexp:`/foofoo/` will
match ``foofoofoofoo`` at offsets (0, 6) and (6, 12).
.. note:: Since Chimera is a hybrid of Hyperscan and PCRE in order to support
.. note:: Since Chimera is a hybrid of Vectorscan and PCRE in order to support
full PCRE syntax, there will be extra performance overhead compared to
Hyperscan-only solution. Please always use Hyperscan for better performance
a Vectorscan-only solution. Please always use Vectorscan for better performance
unless you need full PCRE syntax support.
See :ref:`chruntime` for more details.
@ -83,12 +83,12 @@ Requirements
The PCRE library (http://pcre.org/) version 8.41 is required for Chimera.
.. note:: Since Chimera needs to reference PCRE internal functions, please place the PCRE source
directory under Hyperscan root directory in order to build Chimera.
directory under Vectorscan root directory in order to build Chimera.
Beside this, both hardware and software requirements of Chimera are the same to Hyperscan.
Besides this, both the hardware and software requirements of Chimera are the same as for Vectorscan.
See :ref:`hardware` and :ref:`software` for more details.
.. note:: Building Hyperscan will automatically generate Chimera library.
.. note:: Building Vectorscan will automatically generate Chimera library.
Currently only static library is supported for Chimera, so please
use the static build type when configuring CMake build options.
@ -119,7 +119,7 @@ databases:
Compilation allows the Chimera library to analyze the given pattern(s) and
pre-determine how to scan for these patterns in an optimized fashion using
Hyperscan and PCRE.
Vectorscan and PCRE.
===============
Pattern Support
@ -134,7 +134,7 @@ Semantics
=========
Chimera supports exactly the same semantics as the PCRE library. Moreover, it supports
multiple simultaneous pattern matching like Hyperscan and the multiple matches
multiple simultaneous pattern matching like Vectorscan and the multiple matches
will be reported in order by end offset.
.. _chruntime:
@ -212,7 +212,7 @@ space is required for that context.
In the absence of recursive scanning, only one such space is required per thread
and can (and indeed should) be allocated before data scanning is to commence.
In a scenario where a set of expressions are compiled by a single "master"
In a scenario where a set of expressions are compiled by a single "main"
thread and data will be scanned by multiple "worker" threads, the convenience
function :c:func:`ch_clone_scratch` allows multiple copies of an existing
scratch space to be made for each thread (rather than forcing the caller to pass

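A minimal sketch of that main/worker arrangement; the helper name is hypothetical and error cleanup is left to the caller::

    #include <ch.h>

    /* Clone one prototype scratch per worker thread. */
    static ch_error_t make_worker_scratches(const ch_database_t *db, unsigned int n,
                                            ch_scratch_t **out /* n entries */) {
        ch_scratch_t *proto = NULL;
        ch_error_t err = ch_alloc_scratch(db, &proto);
        if (err != CH_SUCCESS) {
            return err;
        }
        for (unsigned int i = 0; i < n; i++) {
            err = ch_clone_scratch(proto, &out[i]);
            if (err != CH_SUCCESS) {
                break;
            }
        }
        ch_free_scratch(proto);
        return err;
    }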

@ -9,7 +9,7 @@ Compiling Patterns
Building a Database
*******************
The Hyperscan compiler API accepts regular expressions and converts them into a
The Vectorscan compiler API accepts regular expressions and converts them into a
compiled pattern database that can then be used to scan data.
The API provides three functions that compile regular expressions into
@ -24,7 +24,7 @@ databases:
#. :c:func:`hs_compile_ext_multi`: compiles an array of expressions as above,
but allows :ref:`extparam` to be specified for each expression.
Compilation allows the Hyperscan library to analyze the given pattern(s) and
Compilation allows the Vectorscan library to analyze the given pattern(s) and
pre-determine how to scan for these patterns in an optimized fashion that would
be far too expensive to compute at run-time.
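As a sketch, compiling two expressions into a single block-mode database might look like the following; the patterns and ids are illustrative::

    #include <stdio.h>
    #include <hs.h>

    static int build_db(hs_database_t **db) {
        const char *exprs[] = {"foo.*bar", "[0-9]{4}"};
        unsigned int flags[] = {HS_FLAG_CASELESS, 0};
        unsigned int ids[] = {1, 2};
        hs_compile_error_t *err = NULL;
        if (hs_compile_multi(exprs, flags, ids, 2, HS_MODE_BLOCK,
                             NULL /* target the current host */, db,
                             &err) != HS_SUCCESS) {
            fprintf(stderr, "compile failed: %s\n", err->message);
            hs_free_compile_error(err);
            return -1;
        }
        return 0;
    }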
@ -48,10 +48,10 @@ To compile patterns to be used in streaming mode, the ``mode`` parameter of
block mode requires the use of :c:member:`HS_MODE_BLOCK` and vectored mode
requires the use of :c:member:`HS_MODE_VECTORED`. A pattern database compiled
for one mode (streaming, block or vectored) can only be used in that mode. The
version of Hyperscan used to produce a compiled pattern database must match the
version of Hyperscan used to scan with it.
version of Vectorscan used to produce a compiled pattern database must match the
version of Vectorscan used to scan with it.
Hyperscan provides support for targeting a database at a particular CPU
Vectorscan provides support for targeting a database at a particular CPU
platform; see :ref:`instr_specialization` for details.
=====================
@ -64,25 +64,25 @@ interpreted independently. No syntax association happens between any adjacent
characters.
For example, given an expression written as :regexp:`/bc?/`. We could say it is
a regluar expression, with the meaning that character ``b`` followed by nothing
a regular expression, with the meaning that character ``b`` followed by nothing
or by one character ``c``. On the other view, we could also say it is a pure
literal expression, with the meaning that this is a character sequence of 3-byte
length, containing characters ``b``, ``c`` and ``?``. In the regular case, the
question mark character ``?`` has a particular syntax role called 0-1 quantifier,
which has an syntax association with the character ahead of it. Similar
characters exist in regular grammer like ``[``, ``]``, ``(``, ``)``, ``{``,
which has a syntax association with the character ahead of it. Similar
characters exist in regular grammar like ``[``, ``]``, ``(``, ``)``, ``{``,
``}``, ``-``, ``*``, ``+``, ``\``, ``|``, ``/``, ``:``, ``^``, ``.``, ``$``.
In the pure literal case, by contrast, all these meta characters lose their
special meanings and are treated as common ASCII codes.
Hyperscan is initially designed to process common regular expressions. It is
hence embedded with a complex parser to do comprehensive regular grammer
interpretion. Particularly, the identification of above meta characters is the
basic step for the interpretion of far more complex regular grammers.
Vectorscan was initially designed to process common regular expressions. It is
hence embedded with a complex parser to do comprehensive regular grammar
interpretation. In particular, the identification of the above meta characters is the
basic step for the interpretation of far more complex regular grammars.
However, in real use cases patterns are not always regular expressions; they
may be pure literals, and a problem arises when those literals contain
regular meta characters. Supposing fed directly into traditional Hyperscan
regular meta characters. If fed directly into the traditional Vectorscan
compile API, all these meta characters are interpreted in their predefined
regex roles, which is unnecessary and gives entirely unexpected results. To avoid
such misinterpretation by the traditional API, users have to preprocess these
@ -90,7 +90,7 @@ literal patterns by converting the meta characters into some other formats:
either by adding a backslash ``\`` before certain meta characters, or by
converting all the characters into a hexadecimal representation.
In ``v5.2.0``, Hyperscan introduces 2 new compile APIs for pure literal patterns:
In ``v5.2.0``, Vectorscan introduces 2 new compile APIs for pure literal patterns:
#. :c:func:`hs_compile_lit`: compiles a single pure literal into a pattern
database.
@ -106,7 +106,7 @@ content directly into these APIs without worrying about writing regular meta
characters in their patterns. No preprocessing work is needed any more.
For new APIs, the ``length`` of each literal pattern is a newly added parameter.
Hyperscan needs to locate the end position of the input expression via clearly
Vectorscan needs to locate the end position of the input expression via clearly
knowing each literal's length, not by simply identifying character ``\0`` of a
string.
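For instance, the 3-byte literal ``bc?`` discussed above can be compiled with its length passed explicitly, so the ``?`` carries no quantifier meaning (a sketch)::

    #include <string.h>
    #include <hs.h>

    static hs_database_t *compile_literal(void) {
        const char lit[] = "bc?"; /* matched as 3 literal bytes */
        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        if (hs_compile_lit(lit, 0, strlen(lit), HS_MODE_BLOCK, NULL,
                           &db, &err) != HS_SUCCESS) {
            hs_free_compile_error(err);
            return NULL;
        }
        return db;
    }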
@ -127,19 +127,19 @@ Supported flags: :c:member:`HS_FLAG_CASELESS`, :c:member:`HS_FLAG_SINGLEMATCH`,
Pattern Support
***************
Hyperscan supports the pattern syntax used by the PCRE library ("libpcre"),
Vectorscan supports the pattern syntax used by the PCRE library ("libpcre"),
described at <http://www.pcre.org/>. However, not all constructs available in
libpcre are supported. The use of unsupported constructs will result in
compilation errors.
The version of PCRE used to validate Hyperscan's interpretation of this syntax
The version of PCRE used to validate Vectorscan's interpretation of this syntax
is 8.41 or above.
====================
Supported Constructs
====================
The following regex constructs are supported by Hyperscan:
The following regex constructs are supported by Vectorscan:
* Literal characters and strings, with all libpcre quoting and character
escapes.
@ -165,7 +165,7 @@ The following regex constructs are supported by Hyperscan:
:regexp:`{n,}` are supported with limitations.
* For arbitrary repeated sub-patterns: *n* and *m* should be either small
or infinite, e.g. :regexp:`(a|b}{4}`, :regexp:`(ab?c?d){4,10}` or
or infinite, e.g. :regexp:`(a|b){4}`, :regexp:`(ab?c?d){4,10}` or
:regexp:`(ab(cd)*){6,}`.
* For single-character width sub-patterns such as :regexp:`[^\\a]` or
@ -177,7 +177,7 @@ The following regex constructs are supported by Hyperscan:
:c:member:`HS_FLAG_SINGLEMATCH` flag is on for that pattern.
* Lazy modifiers (:regexp:`?` appended to another quantifier, e.g.
:regexp:`\\w+?`) are supported but ignored (as Hyperscan reports all
:regexp:`\\w+?`) are supported but ignored (as Vectorscan reports all
matches).
* Parenthesization, including the named and unnamed capturing and
@ -219,15 +219,15 @@ The following regex constructs are supported by Hyperscan:
.. note:: At this time, not all patterns can be successfully compiled with the
:c:member:`HS_FLAG_SOM_LEFTMOST` flag, which enables per-pattern support for
:ref:`som`. The patterns that support this flag are a subset of patterns that
can be successfully compiled with Hyperscan; notably, many bounded repeat
forms that can be compiled with Hyperscan without the Start of Match flag
can be successfully compiled with Vectorscan; notably, many bounded repeat
forms that can be compiled with Vectorscan without the Start of Match flag
enabled cannot be compiled with the flag enabled.
======================
Unsupported Constructs
======================
The following regex constructs are not supported by Hyperscan:
The following regex constructs are not supported by Vectorscan:
* Backreferences and capturing sub-expressions.
* Arbitrary zero-width assertions.
@ -246,32 +246,32 @@ The following regex constructs are not supported by Hyperscan:
Semantics
*********
While Hyperscan follows libpcre syntax, it provides different semantics. The
While Vectorscan follows libpcre syntax, it provides different semantics. The
major departures from libpcre semantics are motivated by the requirements of
streaming and multiple simultaneous pattern matching.
The major departures from libpcre semantics are:
#. **Multiple pattern matching**: Hyperscan allows matches to be reported for
#. **Multiple pattern matching**: Vectorscan allows matches to be reported for
several patterns simultaneously. This is not equivalent to separating the
patterns by :regexp:`|` in libpcre, which evaluates alternations
left-to-right.
#. **Lack of ordering**: the multiple matches that Hyperscan produces are not
#. **Lack of ordering**: the multiple matches that Vectorscan produces are not
guaranteed to be ordered, although they will always fall within the bounds of
the current scan.
#. **End offsets only**: Hyperscan's default behaviour is only to report the end
#. **End offsets only**: Vectorscan's default behaviour is only to report the end
offset of a match. Reporting of the start offset can be enabled with
per-expression flags at pattern compile time. See :ref:`som` for details.
#. **"All matches" reported**: scanning :regexp:`/foo.*bar/` against
``fooxyzbarbar`` will return two matches from Hyperscan -- at the points
``fooxyzbarbar`` will return two matches from Vectorscan -- at the points
corresponding to the ends of ``fooxyzbar`` and ``fooxyzbarbar``. In contrast,
libpcre semantics by default would report only one match at ``fooxyzbarbar``
(greedy semantics) or, if non-greedy semantics were switched on, one match at
``fooxyzbar``. This means that switching between greedy and non-greedy
semantics is a no-op in Hyperscan.
semantics is a no-op in Vectorscan.
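The fourth point can be observed directly: a block-mode scan of ``fooxyzbarbar`` against :regexp:`/foo.*bar/` reports end offsets 9 and 12 (a self-contained sketch)::

    #include <stdio.h>
    #include <string.h>
    #include <hs.h>

    static int on_match(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags, void *ctx) {
        (void)id; (void)from; (void)flags; (void)ctx;
        printf("match ends at %llu\n", to); /* 9 and 12 (order not guaranteed) */
        return 0; /* continue scanning */
    }

    int main(void) {
        hs_database_t *db = NULL;
        hs_compile_error_t *cerr = NULL;
        hs_scratch_t *scratch = NULL;
        const char data[] = "fooxyzbarbar";

        if (hs_compile("foo.*bar", 0, HS_MODE_BLOCK, NULL, &db,
                       &cerr) != HS_SUCCESS) {
            hs_free_compile_error(cerr);
            return 1;
        }
        if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS) {
            hs_free_database(db);
            return 1;
        }
        hs_scan(db, data, (unsigned int)strlen(data), 0, scratch, on_match, NULL);
        hs_free_scratch(scratch);
        hs_free_database(db);
        return 0;
    }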
To support libpcre quantifier semantics while accurately reporting streaming
matches at the time they occur is impossible. For example, consider the pattern
@ -299,7 +299,7 @@ as in block 3 -- which would constitute a better match for the pattern.
Start of Match
==============
In standard operation, Hyperscan will only provide the end offset of a match
In standard operation, Vectorscan will only provide the end offset of a match
when the match callback is called. If the :c:member:`HS_FLAG_SOM_LEFTMOST` flag
is specified for a particular pattern, then the same set of matches is
returned, but each match will also provide the leftmost possible start offset
@ -308,7 +308,7 @@ corresponding to its end offset.
Using the SOM flag entails a number of trade-offs and limitations:
* Reduced pattern support: For many patterns, tracking SOM is complex and can
result in Hyperscan failing to compile a pattern with a "Pattern too
result in Vectorscan failing to compile a pattern with a "Pattern too
large" error, even if the pattern is supported in normal operation.
* Increased stream state: At scan time, state space is required to track
potential SOM offsets, and this must be stored in persistent stream state in
@ -316,20 +316,20 @@ Using the SOM flag entails a number of trade-offs and limitations:
required to match a pattern.
* Performance overhead: Similarly, there is generally a performance cost
associated with tracking SOM.
* Incompatible features: Some other Hyperscan pattern flags (such as
* Incompatible features: Some other Vectorscan pattern flags (such as
:c:member:`HS_FLAG_SINGLEMATCH` and :c:member:`HS_FLAG_PREFILTER`) can not be
used in combination with SOM. Specifying them together with
:c:member:`HS_FLAG_SOM_LEFTMOST` will result in a compilation error.
In streaming mode, the amount of precision delivered by SOM can be controlled
with the SOM horizon flags. These instruct Hyperscan to deliver accurate SOM
with the SOM horizon flags. These instruct Vectorscan to deliver accurate SOM
information within a certain distance of the end offset, and return a special
start offset of :c:member:`HS_OFFSET_PAST_HORIZON` otherwise. Specifying a
small or medium SOM horizon will usually reduce the stream state required for a
given database.
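A sketch of a stream-mode compile using the small horizon; the pattern is illustrative::

    #include <hs.h>

    static hs_error_t compile_som_small(hs_database_t **db,
                                        hs_compile_error_t **err) {
        /* SOM is requested, but only tracked within a small horizon of the
         * match end; earlier starts report HS_OFFSET_PAST_HORIZON. */
        return hs_compile("foo[0-9]+bar", HS_FLAG_SOM_LEFTMOST,
                          HS_MODE_STREAM | HS_MODE_SOM_HORIZON_SMALL,
                          NULL, db, err);
    }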
.. note:: In streaming mode, the start offset returned for a match may refer to
a point in the stream *before* the current block being scanned. Hyperscan
a point in the stream *before* the current block being scanned. Vectorscan
provides no facility for accessing earlier blocks; if the calling application
needs to inspect historical data, then it must store it itself.
@ -341,7 +341,7 @@ Extended Parameters
In some circumstances, more control over the matching behaviour of a pattern is
required than can be specified easily using regular expression syntax. For
these scenarios, Hyperscan provides the :c:func:`hs_compile_ext_multi` function
these scenarios, Vectorscan provides the :c:func:`hs_compile_ext_multi` function
that allows a set of "extended parameters" to be set on a per-pattern basis.
Extended parameters are specified using an :c:type:`hs_expr_ext_t` structure,
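A sketch of filling in such a structure and passing it to :c:func:`hs_compile_ext_multi`; the field values are illustrative::

    #include <string.h>
    #include <hs.h>

    static hs_error_t compile_with_ext(hs_database_t **db,
                                       hs_compile_error_t **err) {
        hs_expr_ext_t ext;
        memset(&ext, 0, sizeof(ext));
        ext.flags = HS_EXT_FLAG_MAX_OFFSET | HS_EXT_FLAG_EDIT_DISTANCE;
        ext.max_offset = 1024;  /* only match within the first KiB */
        ext.edit_distance = 1;  /* allow one edit of fuzziness */

        const char *exprs[] = {"foobar"};
        unsigned int flags[] = {0};
        unsigned int ids[] = {1};
        const hs_expr_ext_t *exts[] = {&ext};
        return hs_compile_ext_multi(exprs, flags, ids, exts, 1,
                                    HS_MODE_BLOCK, NULL, db, err);
    }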
@ -383,18 +383,18 @@ section.
Prefiltering Mode
=================
Hyperscan provides a per-pattern flag, :c:member:`HS_FLAG_PREFILTER`, which can
be used to implement a prefilter for a pattern than Hyperscan would not
Vectorscan provides a per-pattern flag, :c:member:`HS_FLAG_PREFILTER`, which can
be used to implement a prefilter for a pattern that Vectorscan would not
ordinarily support.
This flag instructs Hyperscan to compile an "approximate" version of this
pattern for use in a prefiltering application, even if Hyperscan does not
This flag instructs Vectorscan to compile an "approximate" version of this
pattern for use in a prefiltering application, even if Vectorscan does not
support the pattern in normal operation.
The set of matches returned when this flag is used is guaranteed to be a
superset of the matches specified by the non-prefiltering expression.
If the pattern contains pattern constructs not supported by Hyperscan (such as
If the pattern contains pattern constructs not supported by Vectorscan (such as
zero-width assertions, back-references or conditional references) these
constructs will be replaced internally with broader constructs that may match
more often.
@ -404,7 +404,7 @@ back-reference :regexp:`\\1`. In prefiltering mode, this pattern might be
approximated by having its back-reference replaced with its referent, forming
:regexp:`/\\w+ again \\w+/`.
Furthermore, in prefiltering mode Hyperscan may simplify a pattern that would
Furthermore, in prefiltering mode Vectorscan may simplify a pattern that would
otherwise return a "Pattern too large" error at compile time, or for performance
reasons (subject to the matching guarantee above).
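The resulting two-stage flow, as a sketch: compile the unsupported pattern with the prefilter flag, then confirm each candidate match with a full regex engine such as libpcre (the confirmation step is omitted here)::

    #include <hs.h>

    static hs_error_t compile_prefilter(hs_database_t **db,
                                        hs_compile_error_t **err) {
        /* The back-reference is unsupported in normal operation; with
         * HS_FLAG_PREFILTER an over-approximation is compiled instead, so
         * every true match is reported (plus possible false positives). */
        return hs_compile("(\\w+) again \\1", HS_FLAG_PREFILTER,
                          HS_MODE_BLOCK, NULL, db, err);
    }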
@ -422,22 +422,22 @@ matches for the pattern.
Instruction Set Specialization
******************************
Hyperscan is able to make use of several modern instruction set features found
Vectorscan is able to make use of several modern instruction set features found
on x86 processors to provide improvements in scanning performance.
Some of these features are selected when the library is built; for example,
Hyperscan will use the native ``POPCNT`` instruction on processors where it is
Vectorscan will use the native ``POPCNT`` instruction on processors where it is
available and the library has been optimized for the host architecture.
.. note:: By default, the Hyperscan runtime is built with the ``-march=native``
.. note:: By default, the Vectorscan runtime is built with the ``-march=native``
compiler flag and (where possible) will make use of all instructions known by
the host's C compiler.
To use some instruction set features, however, Hyperscan must build a
To use some instruction set features, however, Vectorscan must build a
specialized database to support them. This means that the target platform must
be specified at pattern compile time.
The Hyperscan compiler API functions all accept an optional
The Vectorscan compiler API functions all accept an optional
:c:type:`hs_platform_info_t` argument, which describes the target platform
for the database to be built. If this argument is NULL, the database will be
targeted at the current host platform.
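A sketch of targeting the current host explicitly via :c:func:`hs_populate_platform`, which has the same effect as passing NULL but keeps the platform description available for reuse::

    #include <hs.h>

    static hs_error_t compile_for_host(const char *expr, hs_database_t **db,
                                       hs_compile_error_t **err) {
        hs_platform_info_t plat;
        hs_error_t rv = hs_populate_platform(&plat);
        if (rv != HS_SUCCESS) {
            return rv;
        }
        return hs_compile(expr, 0, HS_MODE_BLOCK, &plat, db, err);
    }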
@ -467,7 +467,7 @@ See :ref:`api_constants` for the full list of CPU tuning and feature flags.
Approximate matching
********************
Hyperscan provides an experimental approximate matching mode, which will match
Vectorscan provides an experimental approximate matching mode, which will match
patterns within a given edit distance. The exact matching behavior is defined as
follows:
@ -492,7 +492,7 @@ follows:
Here are a few examples of approximate matching:
* Pattern :regexp:`/foo/` can match ``foo`` when using regular Hyperscan
* Pattern :regexp:`/foo/` can match ``foo`` when using regular Vectorscan
matching behavior. With approximate matching within edit distance 2, the
pattern will produce matches when scanned against ``foo``, ``foooo``, ``f00``,
``f``, and anything else that lies within edit distance 2 of matching corpora
@ -513,7 +513,7 @@ matching support. Here they are, in a nutshell:
* Reduced pattern support:
* For many patterns, approximate matching is complex and can result in
Hyperscan failing to compile a pattern with a "Pattern too large" error,
Vectorscan failing to compile a pattern with a "Pattern too large" error,
even if the pattern is supported in normal operation.
* Additionally, some patterns cannot be approximately matched because they
reduce to so-called "vacuous" patterns (patterns that match everything). For
@ -548,7 +548,7 @@ Logical Combinations
********************
For situations when a user requires behaviour that depends on the presence or
absence of matches from groups of patterns, Hyperscan provides support for the
absence of matches from groups of patterns, Vectorscan provides support for the
logical combination of patterns in a given pattern set, with three operators:
``NOT``, ``AND`` and ``OR``.
@ -561,7 +561,7 @@ offset is *true* if the expression it refers to is *false* at this offset.
For example, ``NOT 101`` means that expression 101 has not yet matched at this
offset.
A logical combination is passed to Hyperscan at compile time as an expression.
A logical combination is passed to Vectorscan at compile time as an expression.
This combination expression will raise matches at every offset where one of its
sub-expressions matches and the logical value of the whole expression is *true*.
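A sketch of compiling a combination alongside its sub-expressions; the ids and patterns are illustrative, and the sub-expressions are marked quiet so that only the combination reports::

    #include <hs.h>

    static hs_error_t compile_combination(hs_database_t **db,
                                          hs_compile_error_t **err) {
        /* Pattern 201 fires where 101 has matched and 102 has not. */
        const char *exprs[] = {"foo", "bar", "101 & !102"};
        unsigned int ids[] = {101, 102, 201};
        unsigned int flags[] = {HS_FLAG_QUIET, HS_FLAG_QUIET,
                                HS_FLAG_COMBINATION};
        return hs_compile_multi(exprs, flags, ids, 3, HS_MODE_BLOCK,
                                NULL, db, err);
    }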
@ -603,7 +603,7 @@ In a logical combination expression:
* Whitespace is ignored.
To use a logical combination expression, it must be passed to one of the
Hyperscan compile functions (:c:func:`hs_compile_multi`,
Vectorscan compile functions (:c:func:`hs_compile_multi`,
:c:func:`hs_compile_ext_multi`) along with the :c:member:`HS_FLAG_COMBINATION` flag,
which identifies the pattern as a logical combination expression. The patterns
referred to in the logical combination expression must be compiled together in
@ -613,7 +613,7 @@ When an expression has the :c:member:`HS_FLAG_COMBINATION` flag set, it ignores
all other flags except the :c:member:`HS_FLAG_SINGLEMATCH` flag and the
:c:member:`HS_FLAG_QUIET` flag.
Hyperscan will accept logical combination expressions at compile time that
Vectorscan will accept logical combination expressions at compile time that
evaluate to *true* when no patterns have matched, and report the match for
combination at end of data if no patterns have matched; for example: ::


@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Hyperscan documentation build configuration file, created by
# Vectorscan documentation build configuration file, created by
# sphinx-quickstart on Tue Sep 29 15:59:19 2015.
#
# This file is execfile()d with the current directory set to its
@ -43,8 +43,8 @@ source_suffix = '.rst'
master_doc = 'index'
# General information about the project.
project = u'Hyperscan'
copyright = u'2015-2018, Intel Corporation'
project = u'Vectorscan'
copyright = u'2015-2020, Intel Corporation; 2020-2024, VectorCamp; and other contributors'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@ -202,7 +202,7 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
('index', 'Hyperscan.tex', u'Hyperscan Documentation',
('index', 'Hyperscan.tex', u'Vectorscan Documentation',
u'Intel Corporation', 'manual'),
]
@ -232,8 +232,8 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'hyperscan', u'Hyperscan Documentation',
[u'Intel Corporation'], 1)
('index', 'vectorscan', u'Vectorscan Documentation',
[u'Intel Corporation'], 7)
]
# If true, show URL addresses after external links.
@ -246,8 +246,8 @@ man_pages = [
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'Hyperscan', u'Hyperscan Documentation',
u'Intel Corporation', 'Hyperscan', 'High-performance regular expression matcher.',
('index', 'Vectorscan', u'Vectorscan Documentation',
u'Intel Corporation; VectorCamp', 'Vectorscan', 'High-performance regular expression matcher.',
'Miscellaneous'),
]


@ -7,43 +7,41 @@ Getting Started
Very Quick Start
****************
#. Clone Hyperscan ::
#. Clone Vectorscan ::
cd <where-you-want-hyperscan-source>
git clone git://github.com/intel/hyperscan
cd <where-you-want-vectorscan-source>
git clone https://github.com/VectorCamp/vectorscan
#. Configure Hyperscan
#. Configure Vectorscan
Ensure that you have the correct :ref:`dependencies <software>` present,
and then:
::
cd <where-you-want-to-build-hyperscan>
cd <where-you-want-to-build-vectorscan>
mkdir <build-dir>
cd <build-dir>
cmake [-G <generator>] [options] <hyperscan-source-path>
cmake [-G <generator>] [options] <vectorscan-source-path>
Known working generators:
* ``Unix Makefiles`` --- make-compatible makefiles (default on Linux/FreeBSD/Mac OS X)
* ``Ninja`` --- `Ninja <http://martine.github.io/ninja/>`_ build files.
* ``Visual Studio 15 2017`` --- Visual Studio projects
Generators that might work include:
Unsupported generators that might work include:
* ``Xcode`` --- OS X Xcode projects.
#. Build Hyperscan
#. Build Vectorscan
Depending on the generator used:
* ``cmake --build .`` --- will build everything
* ``make -j<jobs>`` --- use makefiles in parallel
* ``ninja`` --- use Ninja build
* ``MsBuild.exe`` --- use Visual Studio MsBuild
* etc.
#. Check Hyperscan
#. Check Vectorscan
Run the Hyperscan unit tests: ::
Run the Vectorscan unit tests: ::
bin/unit-hyperscan
@ -55,20 +53,23 @@ Requirements
Hardware
========
Hyperscan will run on x86 processors in 64-bit (Intel\ |reg| 64 Architecture) and
32-bit (IA-32 Architecture) modes.
Vectorscan will run on x86 processors in 64-bit (Intel\ |reg| 64 Architecture) and
32-bit (IA-32 Architecture) modes as well as Arm v8.0+ aarch64, and POWER 8+ ppc64le
machines.
Hyperscan is a high performance software library that takes advantage of recent
Intel architecture advances. At a minimum, support for Supplemental Streaming
SIMD Extensions 3 (SSSE3) is required, which should be available on any modern
x86 processor.
architecture advances.
Additionally, Hyperscan can make use of:
Additionally, Vectorscan can make use of:
* Intel Streaming SIMD Extensions 4.2 (SSE4.2)
* the POPCNT instruction
* Bit Manipulation Instructions (BMI, BMI2)
* Intel Advanced Vector Extensions 2 (Intel AVX2)
* Arm NEON
* Arm SVE and SVE2
* Arm SVE2 BITPERM
* IBM Power8/Power9 VSX
if present.
@ -79,40 +80,34 @@ These can be determined at library compile time, see :ref:`target_arch`.
Software
========
As a software library, Hyperscan doesn't impose any particular runtime
software requirements, however to build the Hyperscan library we require a
modern C and C++ compiler -- in particular, Hyperscan requires C99 and C++11
As a software library, Vectorscan doesn't impose any particular runtime
software requirements, however to build the Vectorscan library we require a
modern C and C++ compiler -- in particular, Vectorscan requires C99 and C++17
compiler support. The supported compilers are:
* GCC, v4.8.1 or higher
* Clang, v3.4 or higher (with libstdc++ or libc++)
* Intel C++ Compiler v15 or higher
* Visual C++ 2017 Build Tools
* GCC, v9 or higher
* Clang, v5 or higher (with libstdc++ or libc++)
Examples of operating systems that Hyperscan is known to work on include:
Examples of operating systems that Vectorscan is known to work on include:
Linux:
* Ubuntu 14.04 LTS or newer
* Ubuntu 20.04 LTS or newer
* RedHat/CentOS 7 or newer
* Fedora 38 or newer
* Debian 10
FreeBSD:
* 10.0 or newer
Windows:
* 8 or newer
Mac OS X:
* 10.8 or newer, using XCode/Clang
Hyperscan *may* compile and run on other platforms, but there is no guarantee.
We currently have experimental support for Windows using Intel C++ Compiler
or Visual Studio 2017.
Vectorscan *may* compile and run on other platforms, but there is no guarantee.
In addition, the following software is required for compiling the Hyperscan library:
In addition, the following software is required for compiling the Vectorscan library:
======================================================= =========== ======================================
Dependency Version Notes
@ -132,20 +127,20 @@ Ragel, you may use Cygwin to build it from source.
Boost Headers
-------------
Compiling Hyperscan depends on a recent version of the Boost C++ header
Compiling Vectorscan depends on a recent version of the Boost C++ header
library. If the Boost libraries are installed on the build machine in the
usual paths, CMake will find them. If the Boost libraries are not installed,
the location of the Boost source tree can be specified during the CMake
configuration step using the ``BOOST_ROOT`` variable (described below).
Another alternative is to put a copy of (or a symlink to) the boost
subdirectory in ``<hyperscan-source-path>/include/boost``.
subdirectory in ``<vectorscan-source-path>/include/boost``.
For example: for the Boost-1.59.0 release: ::
ln -s boost_1_59_0/boost <hyperscan-source-path>/include/boost
ln -s boost_1_59_0/boost <vectorscan-source-path>/include/boost
As Hyperscan uses the header-only parts of Boost, it is not necessary to
As Vectorscan uses the header-only parts of Boost, it is not necessary to
compile the Boost libraries.
CMake Configuration
@ -168,11 +163,12 @@ Common options for CMake include:
| | Valid options are Debug, Release, RelWithDebInfo, |
| | and MinSizeRel. Default is RelWithDebInfo. |
+------------------------+----------------------------------------------------+
| BUILD_SHARED_LIBS | Build Hyperscan as a shared library instead of |
| BUILD_SHARED_LIBS | Build Vectorscan as a shared library instead of |
| | the default static library. |
| | Default: Off |
+------------------------+----------------------------------------------------+
| BUILD_STATIC_AND_SHARED| Build both static and shared Hyperscan libs. |
| | Default off. |
| BUILD_STATIC_LIBS | Build Vectorscan as a static library. |
| | Default: On |
+------------------------+----------------------------------------------------+
| BOOST_ROOT | Location of Boost source tree. |
+------------------------+----------------------------------------------------+
@ -180,12 +176,64 @@ Common options for CMake include:
+------------------------+----------------------------------------------------+
| FAT_RUNTIME | Build the :ref:`fat runtime<fat_runtime>`. Default |
| | true on Linux, not available elsewhere. |
+------------------------+----------------------------------------------------+
| USE_CPU_NATIVE | Native CPU detection is off by default, however it |
| | is possible to build a performance-oriented non-fat|
| | library tuned to your CPU. |
| | Default: Off |
+------------------------+----------------------------------------------------+
| SANITIZE | Build with a compiler sanitizer to detect possible |
| | bugs. Valid options are address, memory and |
| | undefined. |
+------------------------+----------------------------------------------------+
| SIMDE_BACKEND | Enable SIMDe backend. If this is chosen all native |
| | (SSE/AVX/AVX512/Neon/SVE/VSX) backends will be |
| | disabled and a SIMDe SSE4.2 emulation backend will |
| | be enabled. This will enable Vectorscan to build |
| | and run on architectures without SIMD. |
| | Default: Off |
+------------------------+----------------------------------------------------+
| SIMDE_NATIVE | Enable SIMDe native emulation of x86 SSE4.2 |
| | intrinsics on the building platform. That is, |
| | SSE4.2 intrinsics will be emulated using Neon on |
| | an Arm platform, or VSX on a Power platform, etc. |
| | Default: Off |
+------------------------+----------------------------------------------------+
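For instance, to configure a SIMDe emulation build on a platform without
native SIMD support (the other options above may be combined with this as
usual): ::
cmake -DSIMDE_BACKEND=on <vectorscan-source-path>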
X86 platform specific options include:
+------------------------+----------------------------------------------------+
| Variable | Description |
+========================+====================================================+
| BUILD_AVX2 | Enable code for AVX2. |
+------------------------+----------------------------------------------------+
| BUILD_AVX512 | Enable code for AVX512. Implies BUILD_AVX2. |
+------------------------+----------------------------------------------------+
| BUILD_AVX512VBMI | Enable code for AVX512 with VBMI extension. Implies|
| | BUILD_AVX512. |
+------------------------+----------------------------------------------------+
Arm platform specific options include:
+------------------------+----------------------------------------------------+
| Variable | Description |
+========================+====================================================+
| BUILD_SVE | Enable code for SVE, like on AWS Graviton3 CPUs. |
| | Not much code is ported specifically to SVE, but |
| | enabling SVE code generation does improve the |
| | generated code; see Benchmarks. |
+------------------------+----------------------------------------------------+
| BUILD_SVE2 | Enable code for SVE2, implies BUILD_SVE. Most |
| | non-Neon code is written for SVE2. |
+------------------------+----------------------------------------------------+
| BUILD_SVE2_BITPERM | Enable code for the SVE2_BITPERM hardware feature, |
| | implies BUILD_SVE2. |
+------------------------+----------------------------------------------------+
For example, to generate a ``Debug`` build: ::
cd <build-dir>
cmake -DCMAKE_BUILD_TYPE=Debug <hyperscan-source-path>
cmake -DCMAKE_BUILD_TYPE=Debug <vectorscan-source-path>
@ -193,7 +241,7 @@ Build Type
----------
CMake determines a number of features for a build based on the Build Type.
Hyperscan defaults to ``RelWithDebInfo``, i.e. "release with debugging
Vectorscan defaults to ``RelWithDebInfo``, i.e. "release with debugging
information". This is a performance optimized build without runtime assertions
but with debug symbols enabled.
@ -201,7 +249,7 @@ The other types of builds are:
* ``Release``: as above, but without debug symbols
* ``MinSizeRel``: a stripped release build
* ``Debug``: used when developing Hyperscan. Includes runtime assertions
* ``Debug``: used when developing Vectorscan. Includes runtime assertions
(which have a large impact on runtime performance), and will also enable
some other build features like building internal unit
tests.
@ -211,7 +259,7 @@ The other types of builds are:
Target Architecture
-------------------
Unless using the :ref:`fat runtime<fat_runtime>`, by default Hyperscan will be
Unless using the :ref:`fat runtime<fat_runtime>`, by default Vectorscan will be
compiled to target the instruction set of the processor of the machine that
is being used for compilation. This is done via the use of ``-march=native``.
As a result, a library built on one machine may not work on a
@ -223,7 +271,7 @@ CMake, or ``CMAKE_C_FLAGS`` and ``CMAKE_CXX_FLAGS`` on the CMake command line. F
example, to set the instruction subsets up to ``SSE4.2`` using GCC 4.8: ::
cmake -DCMAKE_C_FLAGS="-march=corei7" \
-DCMAKE_CXX_FLAGS="-march=corei7" <hyperscan-source-path>
-DCMAKE_CXX_FLAGS="-march=corei7" <vectorscan-source-path>
For more information, refer to :ref:`instr_specialization`.
@ -232,17 +280,17 @@ For more information, refer to :ref:`instr_specialization`.
Fat Runtime
-----------
A feature introduced in Hyperscan v4.4 is the ability for the Hyperscan
A feature introduced in Hyperscan v4.4 is the ability for the Vectorscan
library to dispatch the most appropriate runtime code for the host processor.
This feature is called the "fat runtime", as a single Hyperscan library
This feature is called the "fat runtime", as a single Vectorscan library
contains multiple copies of the runtime code for different instruction sets.
.. note::
The fat runtime feature is only available on Linux. Release builds of
Hyperscan will default to having the fat runtime enabled where supported.
Vectorscan will default to having the fat runtime enabled where supported.
When building the library with the fat runtime, the Hyperscan runtime code
When building the library with the fat runtime, the Vectorscan runtime code
will be compiled multiple times for these different instruction sets, and
these compiled objects are combined into one library. There are no changes to
how user applications are built against this library.
@ -254,26 +302,28 @@ resolved so that the right version of each API function is used. There is no
impact on function call performance, as this check and resolution is performed
by the ELF loader once when the binary is loaded.
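To make the mechanism concrete, the following is a generic sketch of the GNU
``ifunc`` attribute, not Vectorscan's actual dispatch code; ``scan_sse42``,
``scan_avx2`` and ``resolve_scan`` are hypothetical names: ::
#include <stddef.h>
static int scan_sse42(const char *buf, size_t len) { (void)buf; (void)len; return 0; }
static int scan_avx2(const char *buf, size_t len) { (void)buf; (void)len; return 0; }
typedef int (*scan_fn)(const char *, size_t);
/* The resolver runs once, when the binary is loaded. */
static scan_fn resolve_scan(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? scan_avx2 : scan_sse42;
}
/* Every call to scan() is bound to the implementation chosen above. */
int scan(const char *buf, size_t len) __attribute__((ifunc("resolve_scan")));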
If the Hyperscan library is used on x86 systems without ``SSSE3``, the runtime
If the Vectorscan library is used on x86 systems without ``SSE4.2``, the runtime
API functions will resolve to functions that return :c:member:`HS_ARCH_ERROR`
instead of potentially executing illegal instructions. The API function
:c:func:`hs_valid_platform` can be used by application writers to determine if
the current platform is supported by Hyperscan.
the current platform is supported by Vectorscan.
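For example, an application can check this before compiling or scanning
anything (a minimal sketch): ::
if (hs_valid_platform() != HS_SUCCESS) {
    fprintf(stderr, "This CPU lacks the features required by this build.\n");
    exit(EXIT_FAILURE);
}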
As of this release, the variants of the runtime that are built, and the CPU
capability that is required, are the following:
+----------+-------------------------------+---------------------------+
| Variant | CPU Feature Flag(s) Required | gcc arch flag |
+==========+===============================+===========================+
| Core 2 | ``SSSE3`` | ``-march=core2`` |
+----------+-------------------------------+---------------------------+
| Core i7 | ``SSE4_2`` and ``POPCNT`` | ``-march=corei7`` |
+----------+-------------------------------+---------------------------+
| AVX 2 | ``AVX2`` | ``-march=core-avx2`` |
+----------+-------------------------------+---------------------------+
| AVX 512 | ``AVX512BW`` (see note below) | ``-march=skylake-avx512`` |
+----------+-------------------------------+---------------------------+
+--------------+---------------------------------+---------------------------+
| Variant | CPU Feature Flag(s) Required | gcc arch flag |
+==============+=================================+===========================+
| Core 2 | ``SSSE3`` | ``-march=core2`` |
+--------------+---------------------------------+---------------------------+
| Core i7 | ``SSE4_2`` and ``POPCNT`` | ``-march=corei7`` |
+--------------+---------------------------------+---------------------------+
| AVX 2 | ``AVX2`` | ``-march=core-avx2`` |
+--------------+---------------------------------+---------------------------+
| AVX 512 | ``AVX512BW`` (see note below) | ``-march=skylake-avx512`` |
+--------------+---------------------------------+---------------------------+
| AVX 512 VBMI | ``AVX512VBMI`` (see note below) | ``-march=icelake-server`` |
+--------------+---------------------------------+---------------------------+
.. note::
@ -287,6 +337,21 @@ capability that is required, are the following:
cmake -DBUILD_AVX512=on <...>
Hyperscan v5.3 adds support for AVX512VBMI instructions - in particular the
``AVX512VBMI`` instruction set that was introduced on Intel "Icelake" Xeon
processors - however the AVX512VBMI runtime variant is **not** enabled by
default in fat runtime builds as not all toolchains support AVX512VBMI
instruction sets. To build an AVX512VBMI runtime, the CMake variable
``BUILD_AVX512VBMI`` must be enabled manually during configuration. For
example: ::
cmake -DBUILD_AVX512VBMI=on <...>
Vectorscan adds support for Arm processors and the SVE, SVE2 and SVE2_BITPERM
extensions. For example: ::
cmake -DBUILD_SVE=ON -DBUILD_SVE2=ON -DBUILD_SVE2_BITPERM=ON <...>
As the fat runtime requires compiler, libc, and binutils support, at this time
it will only be enabled for Linux builds where the compiler supports the
`indirect function "ifunc" function attribute

View File

@ -1,5 +1,5 @@
###############################################
Hyperscan |version| Developer's Reference Guide
Vectorscan |version| Developer's Reference Guide
###############################################
-------

View File

@ -5,11 +5,11 @@
Introduction
############
Hyperscan is a software regular expression matching engine designed with
Vectorscan is a software regular expression matching engine designed with
high performance and flexibility in mind. It is implemented as a library that
exposes a straightforward C API.
The Hyperscan API itself is composed of two major components:
The Vectorscan API itself is composed of two major components:
***********
Compilation
@ -17,7 +17,7 @@ Compilation
These functions take a group of regular expressions, along with identifiers and
option flags, and compile them into an immutable database that can be used by
the Hyperscan scanning API. This compilation process performs considerable
the Vectorscan scanning API. This compilation process performs considerable
analysis and optimization work in order to build a database that will match the
given expressions efficiently.
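As a minimal sketch (error handling shortened; the pattern and flags here are
arbitrary), a single expression can be compiled into a block-mode database
with :c:func:`hs_compile`: ::
hs_database_t *db = NULL;
hs_compile_error_t *compile_err = NULL;
if (hs_compile("foo.*bar", HS_FLAG_DOTALL, HS_MODE_BLOCK, NULL, &db,
               &compile_err) != HS_SUCCESS) {
    fprintf(stderr, "Compile failed: %s\n", compile_err->message);
    hs_free_compile_error(compile_err);
}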
@ -36,8 +36,8 @@ See :ref:`compilation` for more detail.
Scanning
********
Once a Hyperscan database has been created, it can be used to scan data in
memory. Hyperscan provides several scanning modes, depending on whether the
Once a Vectorscan database has been created, it can be used to scan data in
memory. Vectorscan provides several scanning modes, depending on whether the
data to be scanned is available as a single contiguous block, whether it is
distributed amongst several blocks in memory at the same time, or whether it is
to be scanned as a sequence of blocks in a stream.
@ -45,7 +45,7 @@ to be scanned as a sequence of blocks in a stream.
Matches are delivered to the application via a user-supplied callback function
that is called synchronously for each match.
For a given database, Hyperscan provides several guarantees:
For a given database, Vectorscan provides several guarantees:
* No memory allocations occur at runtime with the exception of two
fixed-size allocations, both of which should be done ahead of time for
@ -56,7 +56,7 @@ For a given database, Hyperscan provides several guarantees:
call.
- **Stream state**: in streaming mode only, some state space is required to
store data that persists between scan calls for each stream. This allows
Hyperscan to track matches that span multiple blocks of data.
Vectorscan to track matches that span multiple blocks of data.
* The sizes of the scratch space and stream state (in streaming mode) required
for a given database are fixed and determined at database compile time. This
@ -64,7 +64,7 @@ For a given database, Hyperscan provides several guarantees:
time, and these structures can be pre-allocated if required for performance
reasons.
* Any pattern that has successfully been compiled by the Hyperscan compiler can
* Any pattern that has successfully been compiled by the Vectorscan compiler can
be scanned against any input. There are no internal resource limits or other
limitations at runtime that could cause a scan call to return an error.
@ -74,12 +74,12 @@ See :ref:`runtime` for more detail.
Tools
*****
Some utilities for testing and benchmarking Hyperscan are included with the
Some utilities for testing and benchmarking Vectorscan are included with the
library. See :ref:`tools` for more information.
************
Example Code
************
Some simple example code demonstrating the use of the Hyperscan API is
available in the ``examples/`` subdirectory of the Hyperscan distribution.
Some simple example code demonstrating the use of the Vectorscan API is
available in the ``examples/`` subdirectory of the Vectorscan distribution.

View File

@ -4,7 +4,7 @@
Performance Considerations
##########################
Hyperscan supports a wide range of patterns in all three scanning modes. It is
Vectorscan supports a wide range of patterns in all three scanning modes. It is
capable of extremely high levels of performance, but certain patterns can
reduce performance markedly.
@ -25,7 +25,7 @@ For example, caseless matching of :regexp:`/abc/` can be written as:
* :regexp:`/(?i)abc(?-i)/`
* :regexp:`/abc/i`
Hyperscan is capable of handling all these constructs. Unless there is a
Vectorscan is capable of handling all these constructs. Unless there is a
specific reason otherwise, do not rewrite patterns from one form to another.
As another example, matching of :regexp:`/foo(bar|baz)(frotz)?/` can be
@ -41,24 +41,24 @@ Library usage
.. tip:: Do not hand-optimize library usage.
The Hyperscan library is capable of dealing with small writes, unusually large
The Vectorscan library is capable of dealing with small writes, unusually large
and small pattern sets, etc. Unless there is a specific performance problem
with some usage of the library, it is best to use Hyperscan in a simple and
with some usage of the library, it is best to use Vectorscan in a simple and
direct fashion. For example, it is unlikely for there to be much benefit in
buffering input to the library into larger blocks unless streaming writes are
tiny (say, 1-2 bytes at a time).
Unlike many other pattern matching products, Hyperscan will run faster with
Unlike many other pattern matching products, Vectorscan will run faster with
small numbers of patterns and slower with large numbers of patterns in a smooth
fashion (as opposed to, typically, running at a moderate speed up to some fixed
limit then either breaking or running half as fast).
Hyperscan also provides high-throughput matching with a single thread of
control per core; if a database runs at 3.0 Gbps in Hyperscan it means that a
Vectorscan also provides high-throughput matching with a single thread of
control per core; if a database runs at 3.0 Gbps in Vectorscan it means that a
3000-bit block of data will be scanned in 1 microsecond in a single thread of
control, not that it is required to scan 22 3000-bit blocks of data in 22
microseconds. Thus, it is not usually necessary to buffer data to supply
Hyperscan with available parallelism.
Vectorscan with available parallelism.
********************
Block-based matching
@ -72,7 +72,7 @@ accumulated before processing, it should be scanned in block rather than in
streaming mode.
Unnecessary use of streaming mode reduces the number of optimizations that can
be applied in Hyperscan and may make some patterns run slower.
be applied in Vectorscan and may make some patterns run slower.
If there is a mixture of 'block' and 'streaming' mode patterns, these should be
scanned in separate databases except in the case that the streaming patterns
@ -107,7 +107,7 @@ Allocate scratch ahead of time
Scratch allocation is not necessarily a cheap operation. Since it is the first
time (after compilation or deserialization) that a pattern database is used,
Hyperscan performs some validation checks inside :c:func:`hs_alloc_scratch` and
Vectorscan performs some validation checks inside :c:func:`hs_alloc_scratch` and
must also allocate memory.
Therefore, it is important to ensure that :c:func:`hs_alloc_scratch` is not
@ -329,7 +329,7 @@ Consequently, :regexp:`/foo.*bar/L` with a check on start of match values after
the callback is considerably more expensive and general than
:regexp:`/foo.{300}bar/`.
Similarly, the :c:member:`hs_expr_ext::min_length` extended parameter can be
Similarly, the :cpp:member:`hs_expr_ext::min_length` extended parameter can be
used to specify a lower bound on the length of the matches for a pattern. Using
this facility may be more lightweight in some circumstances than using the SOM
flag and post-confirming match length in the calling application.
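As a hedged sketch, ``min_length`` can be supplied through the
extended-parameter compile interface (the pattern, id and length below are
arbitrary): ::
hs_expr_ext_t ext;
memset(&ext, 0, sizeof(ext));
ext.flags = HS_EXT_FLAG_MIN_LENGTH;
ext.min_length = 8; /* suppress matches shorter than 8 bytes */
const char *expr = "foo.*bar";
unsigned int flag = 0, id = 1;
const hs_expr_ext_t *exts[] = { &ext };
hs_database_t *db = NULL;
hs_compile_error_t *err = NULL;
hs_compile_ext_multi(&expr, &flag, &id, exts, 1, HS_MODE_BLOCK,
                     NULL, &db, &err);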

View File

@ -6,35 +6,35 @@ Preface
Overview
********
Hyperscan is a regular expression engine designed to offer high performance, the
Vectorscan is a regular expression engine designed to offer high performance, the
ability to match multiple expressions simultaneously and flexibility in
scanning operation.
Patterns are provided to a compilation interface which generates an immutable
pattern database. The scan interface then can be used to scan a target data
buffer for the given patterns, returning any matching results from that data
buffer. Hyperscan also provides a streaming mode, in which matches that span
buffer. Vectorscan also provides a streaming mode, in which matches that span
several blocks in a stream are detected.
This document is designed to facilitate code-level integration of the Hyperscan
This document is designed to facilitate code-level integration of the Vectorscan
library with existing or new applications.
:ref:`intro` is a short overview of the Hyperscan library, with more detail on
the Hyperscan API provided in the subsequent sections: :ref:`compilation` and
:ref:`intro` is a short overview of the Vectorscan library, with more detail on
the Vectorscan API provided in the subsequent sections: :ref:`compilation` and
:ref:`runtime`.
:ref:`perf` provides details on various factors which may impact the
performance of a Hyperscan integration.
performance of a Vectorscan integration.
:ref:`api_constants` and :ref:`api_files` provides a detailed summary of the
Hyperscan Application Programming Interface (API).
Vectorscan Application Programming Interface (API).
********
Audience
********
This guide is aimed at developers interested in integrating Hyperscan into an
application. For information on building the Hyperscan library, see the Quick
This guide is aimed at developers interested in integrating Vectorscan into an
application. For information on building the Vectorscan library, see the Quick
Start Guide.
***********

View File

@ -4,7 +4,7 @@
Scanning for Patterns
#####################
Hyperscan provides three different scanning modes, each with its own scan
Vectorscan provides three different scanning modes, each with its own scan
function beginning with ``hs_scan``. In addition, streaming mode has a number
of other API functions for managing stream state.
@ -33,8 +33,8 @@ See :c:type:`match_event_handler` for more information.
Streaming Mode
**************
The core of the Hyperscan streaming runtime API consists of functions to open,
scan, and close Hyperscan data streams:
The core of the Vectorscan streaming runtime API consists of functions to open,
scan, and close Vectorscan data streams:
* :c:func:`hs_open_stream`: allocates and initializes a new stream for scanning.
@ -57,14 +57,14 @@ will return immediately with :c:member:`HS_SCAN_TERMINATED`. The caller must
still call :c:func:`hs_close_stream` to complete the clean-up process for that
stream.
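To make the stream lifecycle concrete, here is a minimal sketch that scans
two blocks of data through a single stream (``db``, ``scratch``, ``onMatch``
and ``matchCtx`` are assumed to exist already): ::
hs_stream_t *stream = NULL;
if (hs_open_stream(db, 0, &stream) != HS_SUCCESS) {
    /* handle the error */
}
hs_scan_stream(stream, block1, len1, 0, scratch, onMatch, &matchCtx);
hs_scan_stream(stream, block2, len2, 0, scratch, onMatch, &matchCtx);
/* closing the stream may deliver final (e.g. end-anchored) matches */
hs_close_stream(stream, scratch, onMatch, &matchCtx);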
Streams exist in the Hyperscan library so that pattern matching state can be
Streams exist in the Vectorscan library so that pattern matching state can be
maintained across multiple blocks of target data -- without maintaining this
state, it would not be possible to detect patterns that span these blocks of
data. This, however, does come at the cost of requiring an amount of storage
per-stream (the size of this storage is fixed at compile time), and a slight
performance penalty in some cases to manage the state.
While Hyperscan does always support a strict ordering of multiple matches,
While Vectorscan does always support a strict ordering of multiple matches,
streaming matches will not be delivered at offsets before the current stream
write, with the exception of zero-width asserts, where constructs such as
:regexp:`\\b` and :regexp:`$` can cause a match on the final character of a
@ -76,7 +76,7 @@ Stream Management
=================
In addition to :c:func:`hs_open_stream`, :c:func:`hs_scan_stream`, and
:c:func:`hs_close_stream`, the Hyperscan API provides a number of other
:c:func:`hs_close_stream`, the Vectorscan API provides a number of other
functions for the management of streams:
* :c:func:`hs_reset_stream`: resets a stream to its initial state; this is
@ -98,10 +98,10 @@ A stream object is allocated as a fixed size region of memory which has been
sized to ensure that no memory allocations are required during scan
operations. When the system is under memory pressure, it may be useful to reduce
the memory consumed by streams that are not expected to be used soon. The
Hyperscan API provides calls for translating a stream to and from a compressed
Vectorscan API provides calls for translating a stream to and from a compressed
representation for this purpose. The compressed representation differs from the
full stream object as it does not reserve space for components which are not
required given the current stream state. The Hyperscan API functions for this
required given the current stream state. The Vectorscan API functions for this
functionality are:
* :c:func:`hs_compress_stream`: fills the provided buffer with a compressed
@ -157,7 +157,7 @@ scanned in block mode.
Scratch Space
*************
While scanning data, Hyperscan needs a small amount of temporary memory to store
While scanning data, Vectorscan needs a small amount of temporary memory to store
on-the-fly internal data. This amount is unfortunately too large to fit on the
stack, particularly for embedded applications, and allocating memory dynamically
is too expensive, so a pre-allocated "scratch" space must be provided to the
@ -170,7 +170,7 @@ databases, only a single scratch region is necessary: in this case, calling
will ensure that the scratch space is large enough to support scanning against
any of the given databases.
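For example (a sketch; ``db1`` and ``db2`` are assumed to be compiled
databases): ::
hs_scratch_t *scratch = NULL;
hs_alloc_scratch(db1, &scratch); /* sizes the scratch space for db1 */
hs_alloc_scratch(db2, &scratch); /* grows it if db2 needs more room */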
While the Hyperscan library is re-entrant, the use of scratch spaces is not.
While the Vectorscan library is re-entrant, the use of scratch spaces is not.
For example, if by design it is deemed necessary to run recursive or nested
scanning (say, from the match callback function), then an additional scratch
space is required for that context.
@ -178,7 +178,7 @@ space is required for that context.
In the absence of recursive scanning, only one such space is required per thread
and can (and indeed should) be allocated before data scanning is to commence.
In a scenario where a set of expressions are compiled by a single "master"
In a scenario where a set of expressions are compiled by a single "main"
thread and data will be scanned by multiple "worker" threads, the convenience
function :c:func:`hs_clone_scratch` allows multiple copies of an existing
scratch space to be made for each thread (rather than forcing the caller to pass
@ -219,11 +219,11 @@ For example:
Custom Allocators
*****************
By default, structures used by Hyperscan at runtime (scratch space, stream
By default, structures used by Vectorscan at runtime (scratch space, stream
state, etc) are allocated with the default system allocators, usually
``malloc()`` and ``free()``.
The Hyperscan API provides a facility for changing this behaviour to support
The Vectorscan API provides a facility for changing this behaviour to support
applications that use custom memory allocators.
These functions are:

View File

@ -4,7 +4,7 @@
Serialization
#############
For some applications, compiling Hyperscan pattern databases immediately prior
For some applications, compiling Vectorscan pattern databases immediately prior
to use is not an appropriate design. Some users may wish to:
* Compile pattern databases on a different host;
@ -14,9 +14,9 @@ to use is not an appropriate design. Some users may wish to:
* Control the region of memory in which the compiled database is located.
Hyperscan pattern databases are not completely flat in memory: they contain
Vectorscan pattern databases are not completely flat in memory: they contain
pointers and have specific alignment requirements. Therefore, they cannot be
copied (or otherwise relocated) directly. To enable these use cases, Hyperscan
copied (or otherwise relocated) directly. To enable these use cases, Vectorscan
provides functionality for serializing and deserializing compiled pattern
databases.
@ -40,10 +40,10 @@ The API provides the following functions:
returns a string containing information about the database. This call is
analogous to :c:func:`hs_database_info`.
.. note:: Hyperscan performs both version and platform compatibility checks
.. note:: Vectorscan performs both version and platform compatibility checks
upon deserialization. The :c:func:`hs_deserialize_database` and
:c:func:`hs_deserialize_database_at` functions will only permit the
deserialization of databases compiled with (a) the same version of Hyperscan
deserialization of databases compiled with (a) the same version of Vectorscan
and (b) platform features supported by the current host platform. See
:ref:`instr_specialization` for more information on platform specialization.
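A minimal round trip using these functions might look like this (error
handling elided; the buffer could equally be written to disk or shipped to
another host): ::
char *bytes = NULL;
size_t length = 0;
hs_serialize_database(db, &bytes, &length);
hs_database_t *db2 = NULL;
hs_deserialize_database(bytes, length, &db2);
free(bytes); /* allocated via the misc allocator; malloc by default */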
@ -51,17 +51,17 @@ The API provides the following functions:
The Runtime Library
===================
The main Hyperscan library (``libhs``) contains both the compiler and runtime
portions of the library. This means that in order to support the Hyperscan
The main Vectorscan library (``libhs``) contains both the compiler and runtime
portions of the library. This means that in order to support the Vectorscan
compiler, which is written in C++, it requires C++ linkage and has a
dependency on the C++ standard library.
Many embedded applications require only the scanning ("runtime") portion of the
Hyperscan library. In these cases, pattern compilation generally takes place on
Vectorscan library. In these cases, pattern compilation generally takes place on
another host, and serialized pattern databases are delivered to the application
for use.
To support these applications without requiring the C++ dependency, a
runtime-only version of the Hyperscan library, called ``libhs_runtime``, is also
runtime-only version of the Vectorscan library, called ``libhs_runtime``, is also
distributed. This library does not depend on the C++ standard library and
provides all Hyperscan functions other that those used to compile databases.
provides all Vectorscan functions other than those used to compile databases.

View File

@ -4,14 +4,14 @@
Tools
#####
This section describes the set of utilities included with the Hyperscan library.
This section describes the set of utilities included with the Vectorscan library.
********************
Quick Check: hscheck
********************
The ``hscheck`` tool allows the user to quickly check whether Hyperscan supports
a group of patterns. If a pattern is rejected by Hyperscan's compiler, the
The ``hscheck`` tool allows the user to quickly check whether Vectorscan supports
a group of patterns. If a pattern is rejected by Vectorscan's compiler, the
compile error is provided on standard output.
For example, given the following three patterns (the last of which contains a
@ -34,7 +34,7 @@ syntax error) in a file called ``/tmp/test``::
Benchmarker: hsbench
********************
The ``hsbench`` tool provides an easy way to measure Hyperscan's performance
The ``hsbench`` tool provides an easy way to measure Vectorscan's performance
for a particular set of patterns and corpus of data to be scanned.
Patterns are supplied in the format described below in
@ -44,7 +44,7 @@ easy control of how a corpus is broken into blocks and streams.
.. note:: A group of Python scripts for constructing corpora databases from
various input types, such as PCAP network traffic captures or text files, can
be found in the Hyperscan source tree in ``tools/hsbench/scripts``.
be found in the Vectorscan source tree in ``tools/hsbench/scripts``.
Running hsbench
===============
@ -56,7 +56,7 @@ produce output like this::
$ hsbench -e /tmp/patterns -c /tmp/corpus.db
Signatures: /tmp/patterns
Hyperscan info: Version: 4.3.1 Features: AVX2 Mode: STREAM
Vectorscan info: Version: 5.4.11 Features: AVX2 Mode: STREAM
Expression count: 200
Bytecode size: 342,540 bytes
Database CRC: 0x6cd6b67c
@ -77,7 +77,7 @@ takes to perform all twenty scans. The number of repeats can be changed with the
``-n`` argument, and the results of each scan will be displayed if the
``--per-scan`` argument is specified.
To benchmark Hyperscan on more than one core, you can supply a list of cores
To benchmark Vectorscan on more than one core, you can supply a list of cores
with the ``-T`` argument, which will instruct ``hsbench`` to start one
benchmark thread per core given and compute the throughput from the time taken
to complete all of them.
@ -91,17 +91,17 @@ Correctness Testing: hscollider
*******************************
The ``hscollider`` tool, or Pattern Collider, provides a way to verify
Hyperscan's matching behaviour. It does this by compiling and scanning patterns
Vectorscan's matching behaviour. It does this by compiling and scanning patterns
(either singly or in groups) against known corpora and comparing the results
against another engine (the "ground truth"). Two sources of ground truth for
comparison are available:
* The PCRE library (http://pcre.org/).
* An NFA simulation run on Hyperscan's compile-time graph representation. This
* An NFA simulation run on Vectorscan's compile-time graph representation. This
is used if PCRE cannot support the pattern or if PCRE execution fails due to
a resource limit.
Much of Hyperscan's testing infrastructure is built on ``hscollider``, and the
Much of Vectorscan's testing infrastructure is built on ``hscollider``, and the
tool is designed to take advantage of multiple cores and provide considerable
flexibility in controlling the test. These options are described in the help
(``hscollider -h``) and include:
@ -116,11 +116,11 @@ flexibility in controlling the test. These options are described in the help
Using hscollider to debug a pattern
===================================
One common use-case for ``hscollider`` is to determine whether Hyperscan will
One common use-case for ``hscollider`` is to determine whether Vectorscan will
match a pattern in the expected location, and whether this accords with PCRE's
behaviour for the same case.
Here is an example. We put our pattern in a file in Hyperscan's pattern
Here is an example. We put our pattern in a file in Vectorscan's pattern
format::
$ cat /tmp/pat
@ -172,7 +172,7 @@ individual matches are displayed in the output::
Total elapsed time: 0.00522815 secs.
We can see from this output that both PCRE and Hyperscan find matches ending at
We can see from this output that both PCRE and Vectorscan find matches ending at
offset 33 and 45, and so ``hscollider`` considers this test case to have
passed.
@ -180,13 +180,13 @@ passed.
corpus alignment 0, and ``-T 1`` instructs us to only use one thread.)
.. note:: In default operation, PCRE produces only one match for a scan, unlike
Hyperscan's automata semantics. The ``hscollider`` tool uses libpcre's
"callout" functionality to match Hyperscan's semantics.
Vectorscan's automata semantics. The ``hscollider`` tool uses libpcre's
"callout" functionality to match Vectorscan's semantics.
Running a larger scan test
==========================
A set of patterns for testing purposes are distributed with Hyperscan, and these
A set of patterns for testing purposes are distributed with Vectorscan, and these
can be tested via ``hscollider`` on an in-tree build. Two CMake targets are
provided to do this easily:
@ -202,10 +202,10 @@ Debugging: hsdump
*****************
When built in debug mode (using the CMake directive ``CMAKE_BUILD_TYPE`` set to
``Debug``), Hyperscan includes support for dumping information about its
``Debug``), Vectorscan includes support for dumping information about its
internals during pattern compilation with the ``hsdump`` tool.
This information is mostly of use to Hyperscan developers familiar with the
This information is mostly of use to Vectorscan developers familiar with the
library's internal structure, but can be used to diagnose issues with patterns
and provide more information in bug reports.
@ -215,7 +215,7 @@ and provide more information in bug reports.
Pattern Format
**************
All of the Hyperscan tools accept patterns in the same format, read from plain
All of the Vectorscan tools accept patterns in the same format, read from plain
text files with one pattern per line. Each line looks like this:
* ``<integer id>:/<regex>/<flags>``
@ -227,12 +227,12 @@ For example::
3:/^.{10,20}hatstand/m
The integer ID is the value that will be reported when a match is found by
Hyperscan and must be unique.
Vectorscan and must be unique.
The pattern itself is a regular expression in PCRE syntax; see
:ref:`compilation` for more information on supported features.
The flags are single characters that map to Hyperscan flags as follows:
The flags are single characters that map to Vectorscan flags as follows:
========= ================================= ===========
Character API Flag Description
@ -256,7 +256,7 @@ between braces, separated by commas. For example::
1:/hatstand.*teakettle/s{min_offset=50,max_offset=100}
All Hyperscan tools will accept a pattern file (or a directory containing
All Vectorscan tools will accept a pattern file (or a directory containing
pattern files) with the ``-e`` argument. If no further arguments constraining
the pattern set are given, all patterns in those files are used.

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -112,6 +113,7 @@
*
*/
#include <random>
#include <algorithm>
#include <cstring>
#include <chrono>
@ -133,7 +135,12 @@
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include <netinet/ip_icmp.h>
#ifdef __NetBSD__
#include <net/ethertypes.h>
#include <net/if_ether.h>
#else
#include <net/ethernet.h>
#endif /* __NetBSD__ */
#include <arpa/inet.h>
#include <pcap.h>
@ -151,6 +158,8 @@ using std::set;
using std::min;
using std::max;
using std::copy;
using std::random_device;
using std::mt19937;
enum Criterion {
CRITERION_THROUGHPUT,
@ -193,15 +202,15 @@ struct FiveTuple {
unsigned int dstPort;
// Construct a FiveTuple from a TCP or UDP packet.
FiveTuple(const struct ip *iphdr) {
explicit FiveTuple(const struct ip *iphdr) {
// IP fields
protocol = iphdr->ip_p;
srcAddr = iphdr->ip_src.s_addr;
dstAddr = iphdr->ip_dst.s_addr;
// UDP/TCP ports
const struct udphdr *uh = (const struct udphdr *)
(((const char *)iphdr) + (iphdr->ip_hl * 4));
const struct udphdr *uh = reinterpret_cast<const struct udphdr *>
((reinterpret_cast<const char *>(iphdr)) + (iphdr->ip_hl * 4));
srcPort = uh->uh_sport;
dstPort = uh->uh_dport;
}
@ -230,7 +239,7 @@ static
int onMatch(unsigned int id, unsigned long long from, unsigned long long to,
unsigned int flags, void *ctx) {
// Our context points to a size_t storing the match count
size_t *matches = (size_t *)ctx;
size_t *matches = static_cast<size_t *>(ctx);
(*matches)++;
return 0; // continue matching
}
@ -292,7 +301,7 @@ public:
// database.
hs_error_t err = hs_alloc_scratch(db, &scratch);
if (err != HS_SUCCESS) {
cerr << "ERROR: could not allocate scratch space. Exiting." << endl;
cerr << "ERROR: could not allocate scratch space. Exiting.\n";
exit(-1);
}
}
@ -304,8 +313,7 @@ public:
size_t scratch_size;
hs_error_t err = hs_scratch_size(scratch, &scratch_size);
if (err != HS_SUCCESS) {
cerr << "ERROR: could not query scratch space size. Exiting."
<< endl;
cerr << "ERROR: could not query scratch space size. Exiting.\n";
exit(-1);
}
return scratch_size;
@ -331,9 +339,9 @@ public:
}
// Valid TCP or UDP packet
const struct ip *iphdr = (const struct ip *)(pktData
const struct ip *iphdr = reinterpret_cast<const struct ip *>(pktData
+ sizeof(struct ether_header));
const char *payload = (const char *)pktData + offset;
const char *payload = reinterpret_cast<const char *>(pktData) + offset;
size_t id = stream_map.insert(std::make_pair(FiveTuple(iphdr),
stream_map.size())).first->second;
@ -349,9 +357,8 @@ public:
// Return the number of bytes scanned
size_t bytes() const {
size_t sum = 0;
for (const auto &packet : packets) {
sum += packet.size();
}
auto packs = [](size_t z, const string &packet) { return z + packet.size(); };
sum = std::accumulate(packets.begin(), packets.end(), sum, packs);
return sum;
}
@ -371,7 +378,7 @@ public:
for (auto &stream : streams) {
hs_error_t err = hs_open_stream(db, 0, &stream);
if (err != HS_SUCCESS) {
cerr << "ERROR: Unable to open stream. Exiting." << endl;
cerr << "ERROR: Unable to open stream. Exiting.\n";
exit(-1);
}
}
@ -380,11 +387,11 @@ public:
// Close all open Hyperscan streams (potentially generating any
// end-anchored matches)
void closeStreams() {
for (auto &stream : streams) {
for (const auto &stream : streams) {
hs_error_t err =
hs_close_stream(stream, scratch, onMatch, &matchCount);
if (err != HS_SUCCESS) {
cerr << "ERROR: Unable to close stream. Exiting." << endl;
cerr << "ERROR: Unable to close stream. Exiting.\n";
exit(-1);
}
}
@ -399,7 +406,7 @@ public:
pkt.c_str(), pkt.length(), 0,
scratch, onMatch, &matchCount);
if (err != HS_SUCCESS) {
cerr << "ERROR: Unable to scan packet. Exiting." << endl;
cerr << "ERROR: Unable to scan packet. Exiting.\n";
exit(-1);
}
}
@ -413,7 +420,7 @@ public:
hs_error_t err = hs_scan(db, pkt.c_str(), pkt.length(), 0,
scratch, onMatch, &matchCount);
if (err != HS_SUCCESS) {
cerr << "ERROR: Unable to scan packet. Exiting." << endl;
cerr << "ERROR: Unable to scan packet. Exiting.\n";
exit(-1);
}
}
@ -433,7 +440,7 @@ class Sigdata {
public:
Sigdata() {}
Sigdata(const char *filename) {
explicit Sigdata(const char *filename) {
parseFile(filename, patterns, flags, ids, originals);
}
@ -451,9 +458,8 @@ public:
// dynamic storage.)
vector<const char *> cstrPatterns;
cstrPatterns.reserve(patterns.size());
for (const auto &pattern : patterns) {
cstrPatterns.push_back(pattern.c_str());
}
auto pstr = [](const string &pattern) { return pattern.c_str(); };
std::transform(patterns.begin(), patterns.end(), std::back_inserter(cstrPatterns), pstr);
Clock clock;
clock.start();
@ -502,29 +508,29 @@ public:
static
void usage(const char *) {
cerr << "Usage:" << endl << endl;
cerr << " patbench [-n repeats] [ -G generations] [ -C criterion ]" << endl
cerr << "Usage:\n\n";
cerr << " patbench [-n repeats] [ -G generations] [ -C criterion ]\n"
<< " [ -F factor_group_size ] [ -N | -S ] "
<< "<pattern file> <pcap file>" << endl << endl
<< "<pattern file> <pcap file>\n\n"
<< " -n repeats sets the number of times the PCAP is repeatedly "
"scanned" << endl << " with the pattern." << endl
"scanned\n" << " with the pattern.\n"
<< " -G generations sets the number of generations that the "
"algorithm is" << endl << " run for." << endl
"algorithm is\n" << " run for.\n"
<< " -N sets non-streaming mode, -S sets streaming mode (default)."
<< endl << " -F sets the factor group size (must be >0); this "
"allows the detection" << endl
<< " of multiple interacting factors." << endl << "" << endl
<< " -C sets the 'criterion', which can be either:" << endl
"allows the detection\n"
<< " of multiple interacting factors.\n" << "\n"
<< " -C sets the 'criterion', which can be either:\n"
<< " t throughput (the default) - this requires a pcap file"
<< endl << " r scratch size" << endl
<< " s stream state size" << endl
<< " c compile time" << endl << " b bytecode size"
<< endl << " r scratch size\n"
<< " s stream state size\n"
<< " c compile time\n" << " b bytecode size"
<< endl << endl
<< "We recommend the use of a utility like 'taskset' on "
"multiprocessor hosts to" << endl
"multiprocessor hosts to\n"
<< "lock execution to a single processor: this will remove processor "
"migration" << endl
<< "by the scheduler as a source of noise in the results." << endl;
"migration\n"
<< "by the scheduler as a source of noise in the results.\n";
}
static
@ -556,7 +562,7 @@ double measure_block_time(Benchmark &bench, unsigned int repeatCount) {
}
static
double eval_set(Benchmark &bench, Sigdata &sigs, unsigned int mode,
double eval_set(Benchmark &bench, const Sigdata &sigs, unsigned int mode,
unsigned repeatCount, Criterion criterion,
bool diagnose = true) {
double compileTime = 0;
@ -567,7 +573,7 @@ double eval_set(Benchmark &bench, Sigdata &sigs, unsigned int mode,
size_t dbSize;
hs_error_t err = hs_database_size(bench.getDatabase(), &dbSize);
if (err != HS_SUCCESS) {
cerr << "ERROR: could not retrieve bytecode size" << endl;
cerr << "ERROR: could not retrieve bytecode size\n";
exit(1);
}
return dbSize;
@ -578,7 +584,7 @@ double eval_set(Benchmark &bench, Sigdata &sigs, unsigned int mode,
size_t streamStateSize;
hs_error_t err = hs_stream_size(bench.getDatabase(), &streamStateSize);
if (err != HS_SUCCESS) {
cerr << "ERROR: could not retrieve stream state size" << endl;
cerr << "ERROR: could not retrieve stream state size\n";
exit(1);
}
return streamStateSize;
@ -596,8 +602,9 @@ double eval_set(Benchmark &bench, Sigdata &sigs, unsigned int mode,
scan_time = measure_stream_time(bench, repeatCount);
}
size_t bytes = bench.bytes();
size_t matches = bench.matches();
if (diagnose) {
size_t matches = bench.matches();
std::ios::fmtflags f(cout.flags());
cout << "Scan time " << std::fixed << std::setprecision(3) << scan_time
<< " sec, Scanned " << bytes * repeatCount << " bytes, Throughput "
@ -676,14 +683,13 @@ int main(int argc, char **argv) {
Benchmark bench;
if (criterion == CRITERION_THROUGHPUT) {
if (!bench.readStreams(pcapFile)) {
cerr << "Unable to read packets from PCAP file. Exiting." << endl;
cerr << "Unable to read packets from PCAP file. Exiting.\n";
exit(-1);
}
}
if ((criterion == CRITERION_STREAM_STATE) && (mode != HS_MODE_STREAM)) {
cerr << "Cannot evaluate stream state for block mode compile. Exiting."
<< endl;
cerr << "Cannot evaluate stream state for block mode compile. Exiting.\n";
exit(-1);
}
@ -721,7 +727,7 @@ int main(int argc, char **argv) {
unsigned generations = min(gen_max, (sigs.size() - 1) / factor_max);
cout << "Cutting signatures cumulatively for " << generations
<< " generations" << endl;
<< " generations\n";
for (unsigned gen = 0; gen < generations; ++gen) {
cout << "Generation " << gen << " ";
set<unsigned> s(work_sigs.begin(), work_sigs.end());
@ -731,7 +737,9 @@ int main(int argc, char **argv) {
count++;
cout << "." << std::flush;
vector<unsigned> sv(s.begin(), s.end());
random_shuffle(sv.begin(), sv.end());
random_device rng;
mt19937 urng(rng());
shuffle(sv.begin(), sv.end(), urng);
unsigned groups = factor_max + 1;
for (unsigned current_group = 0; current_group < groups;
current_group++) {
@ -763,7 +771,7 @@ int main(int argc, char **argv) {
cout << "Performance: ";
print_criterion(criterion, best);
cout << " (" << std::fixed << std::setprecision(3) << (best / score_base)
<< "x) after cutting:" << endl;
<< "x) after cutting:\n";
cout.flags(out_f);
// s now has factor_max signatures
@ -786,7 +794,7 @@ int main(int argc, char **argv) {
static
bool payloadOffset(const unsigned char *pkt_data, unsigned int *offset,
unsigned int *length) {
const ip *iph = (const ip *)(pkt_data + sizeof(ether_header));
const ip *iph = reinterpret_cast<const ip *>(pkt_data + sizeof(ether_header));
const tcphdr *th = nullptr;
// Ignore packets that aren't IPv4
@ -805,7 +813,7 @@ bool payloadOffset(const unsigned char *pkt_data, unsigned int *offset,
switch (iph->ip_p) {
case IPPROTO_TCP:
th = (const tcphdr *)((const char *)iph + ihlen);
th = reinterpret_cast<const tcphdr *>(reinterpret_cast<const char *>(iph) + ihlen);
thlen = th->th_off * 4;
break;
case IPPROTO_UDP:
@ -842,7 +850,7 @@ static unsigned parseFlags(const string &flagsStr) {
case '\r': // stray carriage-return
break;
default:
cerr << "Unsupported flag \'" << c << "\'" << endl;
cerr << "Unsupported flag \'" << c << "\'\n";
exit(-1);
}
}
@ -854,7 +862,7 @@ static void parseFile(const char *filename, vector<string> &patterns,
vector<string> &originals) {
ifstream inFile(filename);
if (!inFile.good()) {
cerr << "ERROR: Can't open pattern file \"" << filename << "\"" << endl;
cerr << "ERROR: Can't open pattern file \"" << filename << "\"\n";
exit(-1);
}
@ -884,7 +892,7 @@ static void parseFile(const char *filename, vector<string> &patterns,
size_t flagsStart = expr.find_last_of('/');
if (flagsStart == string::npos) {
cerr << "ERROR: no trailing '/' char" << endl;
cerr << "ERROR: no trailing '/' char\n";
exit(-1);
}

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2016, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -54,6 +55,7 @@
#include <fstream>
#include <iomanip>
#include <iostream>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>
@ -68,7 +70,12 @@
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include <netinet/ip_icmp.h>
#ifdef __NetBSD__
#include <net/ethertypes.h>
#include <net/if_ether.h>
#else
#include <net/ethernet.h>
#endif /* __NetBSD__ */
#include <arpa/inet.h>
#include <pcap.h>
@ -93,15 +100,15 @@ struct FiveTuple {
unsigned int dstPort;
// Construct a FiveTuple from a TCP or UDP packet.
FiveTuple(const struct ip *iphdr) {
explicit FiveTuple(const struct ip *iphdr) {
// IP fields
protocol = iphdr->ip_p;
srcAddr = iphdr->ip_src.s_addr;
dstAddr = iphdr->ip_dst.s_addr;
// UDP/TCP ports
const struct udphdr *uh =
(const struct udphdr *)(((const char *)iphdr) + (iphdr->ip_hl * 4));
const char * iphdr_base = reinterpret_cast<const char *>(iphdr);
const struct udphdr *uh = reinterpret_cast<const struct udphdr *>(iphdr_base + (iphdr->ip_hl * 4));
srcPort = uh->uh_sport;
dstPort = uh->uh_dport;
}
@ -130,7 +137,7 @@ static
int onMatch(unsigned int id, unsigned long long from, unsigned long long to,
unsigned int flags, void *ctx) {
// Our context points to a size_t storing the match count
size_t *matches = (size_t *)ctx;
size_t *matches = static_cast<size_t *>(ctx);
(*matches)++;
return 0; // continue matching
}
@ -226,9 +233,8 @@ public:
}
// Valid TCP or UDP packet
const struct ip *iphdr = (const struct ip *)(pktData
+ sizeof(struct ether_header));
const char *payload = (const char *)pktData + offset;
const struct ip *iphdr = reinterpret_cast<const struct ip *>(pktData + sizeof(struct ether_header));
const char *payload = reinterpret_cast<const char *>(pktData) + offset;
size_t id = stream_map.insert(std::make_pair(FiveTuple(iphdr),
stream_map.size())).first->second;
@ -244,9 +250,8 @@ public:
// Return the number of bytes scanned
size_t bytes() const {
size_t sum = 0;
for (const auto &packet : packets) {
sum += packet.size();
}
auto packs = [](size_t z, const string &packet) { return z + packet.size(); };
sum = std::accumulate(packets.begin(), packets.end(), sum, packs);
return sum;
}
@ -275,7 +280,7 @@ public:
// Close all open Hyperscan streams (potentially generating any
// end-anchored matches)
void closeStreams() {
for (auto &stream : streams) {
for (const auto &stream : streams) {
hs_error_t err = hs_close_stream(stream, scratch, onMatch,
&matchCount);
if (err != HS_SUCCESS) {
@ -427,7 +432,8 @@ static void databasesFromFile(const char *filename,
// storage.)
vector<const char*> cstrPatterns;
for (const auto &pattern : patterns) {
cstrPatterns.push_back(pattern.c_str());
// cppcheck-suppress useStlAlgorithm
cstrPatterns.push_back(pattern.c_str()); //NOLINT (performance-inefficient-vector-operation)
}
cout << "Compiling Hyperscan databases with " << patterns.size()
@ -568,7 +574,8 @@ int main(int argc, char **argv) {
*/
static bool payloadOffset(const unsigned char *pkt_data, unsigned int *offset,
unsigned int *length) {
const ip *iph = (const ip *)(pkt_data + sizeof(ether_header));
const ip *iph = reinterpret_cast<const ip *>(pkt_data + sizeof(ether_header));
const char *iph_base = reinterpret_cast<const char *>(iph);
const tcphdr *th = nullptr;
// Ignore packets that aren't IPv4
@ -587,7 +594,7 @@ static bool payloadOffset(const unsigned char *pkt_data, unsigned int *offset,
switch (iph->ip_p) {
case IPPROTO_TCP:
th = (const tcphdr *)((const char *)iph + ihlen);
th = reinterpret_cast<const tcphdr *>(iph_base + ihlen);
thlen = th->th_off * 4;
break;
case IPPROTO_UDP:

View File

@ -67,7 +67,7 @@
* to pass in the pattern that was being searched for so we can print it out.
*/
static int eventHandler(unsigned int id, unsigned long long from,
unsigned long long to, unsigned int flags, void *ctx) {
unsigned long long to, unsigned int flags, void *ctx) { // cppcheck-suppress constParameterCallback
printf("Match for pattern \"%s\" at offset %llu\n", (char *)ctx, to);
return 0;
}
@ -150,7 +150,7 @@ int main(int argc, char *argv[]) {
}
char *pattern = argv[1];
char *inputFN = argv[2];
const char *inputFN = argv[2];
/* First, we attempt to compile the pattern provided on the command line.
* We assume 'DOTALL' semantics, meaning that the '.' meta-character will

View File

@ -4,8 +4,7 @@ libdir=@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@
includedir=@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@
Name: libhs
Description: Intel(R) Hyperscan Library
Description: A portable fork of the high-performance regular expression matching library
Version: @HS_VERSION@
Libs: -L${libdir} -lhs
Libs.private: @PRIVATE_LIBS@
Cflags: -I${includedir}/hs

53
scripts/change_command.py Normal file
View File

@ -0,0 +1,53 @@
#
# Copyright (c) 2020-2023, VectorCamp PC
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
import json
import sys
# Reads the first comment from the clang-tidy config file to get the list of
# files to ignore.
# Get the paths from the command-line arguments
# python3 ../source/scripts/change_command.py ../source/.clang-tidy ./compile_commands.json
clang_tidy_config_path = sys.argv[1]
compile_commands_path = sys.argv[2]
# Load the data from the file
with open(compile_commands_path, 'r') as f:
    data = json.load(f)
# Open the clang-tidy config file and read the first comment;
# default to no ignores if the config has no comment line.
ignore_files = []
with open(clang_tidy_config_path, 'r') as f:
    for line in f:
        if line.startswith('#'):
            ignore_files = line[1:].strip().split(',')
            break
# Filter out the entries for the ignored files
data = [entry for entry in data
        if not any(ignore_file in entry['file'] for ignore_file in ignore_files)]
# Write the result to the same file
with open(compile_commands_path, 'w') as f:
    json.dump(data, f, indent=2)

1
simde Submodule

@ -0,0 +1 @@
Subproject commit 416091ebdb9e901b29d026633e73167d6353a0b0

View File

@ -176,7 +176,8 @@ void replaceAssertVertex(NGHolder &g, NFAVertex t, const ExpressionInfo &expr,
auto ecit = edge_cache.find(cache_key);
if (ecit == edge_cache.end()) {
DEBUG_PRINTF("adding edge %zu %zu\n", g[u].index, g[v].index);
NFAEdge e = add_edge(u, v, g);
NFAEdge e;
std::tie(e, std::ignore) = add_edge(u, v, g);
edge_cache.emplace(cache_key, e);
g[e].assert_flags = flags;
if (++assert_edge_count > MAX_ASSERT_EDGES) {
@ -229,11 +230,12 @@ void checkForMultilineStart(ReportManager &rm, NGHolder &g,
/* we need to interpose a dummy dot vertex between v and accept if
* required so that ^ doesn't match trailing \n */
for (const auto &e : out_edges_range(v, g)) {
if (target(e, g) == g.accept) {
dead.push_back(e);
}
}
auto deads = [&g=g](const NFAEdge &e) {
return (target(e, g) == g.accept);
};
const auto &er = out_edges_range(v, g);
std::copy_if(begin(er), end(er), std::back_inserter(dead), deads);
/* assert has been resolved; clear flag */
g[v].assert_flags &= ~POS_FLAG_MULTILINE_START;
}
@ -251,6 +253,7 @@ void checkForMultilineStart(ReportManager &rm, NGHolder &g,
static
bool hasAssertVertices(const NGHolder &g) {
// cppcheck-suppress useStlAlgorithm
for (auto v : vertices_range(g)) {
int flags = g[v].assert_flags;
if (flags & WORDBOUNDARY_FLAGS) {

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2015-2020, Intel Corporation
* Copyright (c) 2015-2021, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -323,7 +323,8 @@ void addExpression(NG &ng, unsigned index, const char *expression,
}
// Ensure that our pattern isn't too long (in characters).
if (strlen(expression) > cc.grey.limitPatternLength) {
size_t maxlen = cc.grey.limitPatternLength + 1;
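// strnlen scans at most maxlen bytes, so an over-long pattern is rejected
// without walking the entire string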
if (strnlen(expression, maxlen) >= maxlen) {
throw CompileError("Pattern length exceeds limit.");
}
@ -416,6 +417,10 @@ void addLitExpression(NG &ng, unsigned index, const char *expression,
"HS_FLAG_SOM_LEFTMOST are supported in literal API.");
}
if (expLength == 0) {
throw CompileError("Pure literal API doesn't support empty string.");
}
// This expression must be a pure literal, so we can build ue2_literal
// directly based on expression text.
ParsedLitExpression ple(index, expression, expLength, flags, id);
@ -438,7 +443,7 @@ bytecode_ptr<RoseEngine> generateRoseEngine(NG &ng) {
if (!rose) {
DEBUG_PRINTF("error building rose\n");
assert(0);
return nullptr;
return bytecode_ptr<RoseEngine>(nullptr);
}
dumpReportManager(ng.rm, ng.cc.grey);
@ -458,6 +463,9 @@ platform_t target_to_platform(const target_t &target_info) {
if (!target_info.has_avx512()) {
p |= HS_PLATFORM_NOAVX512;
}
if (!target_info.has_avx512vbmi()) {
p |= HS_PLATFORM_NOAVX512VBMI;
}
return p;
}
@ -470,7 +478,7 @@ hs_database_t *dbCreate(const char *in_bytecode, size_t len, u64a platform) {
DEBUG_PRINTF("db size %zu\n", db_len);
DEBUG_PRINTF("db platform %llx\n", platform);
struct hs_database *db = (struct hs_database *)hs_database_alloc(db_len);
struct hs_database *db = static_cast<struct hs_database *>(hs_database_alloc(db_len));
if (hs_check_alloc(db) != HS_SUCCESS) {
hs_database_free(db);
return nullptr;
@ -484,7 +492,7 @@ hs_database_t *dbCreate(const char *in_bytecode, size_t len, u64a platform) {
DEBUG_PRINTF("shift is %zu\n", shift);
db->bytecode = offsetof(struct hs_database, bytes) - shift;
char *bytecode = (char *)db + db->bytecode;
char *bytecode = reinterpret_cast<char *>(db) + db->bytecode;
assert(ISALIGNED_CL(bytecode));
db->magic = HS_DB_MAGIC;
@ -517,7 +525,7 @@ struct hs_database *build(NG &ng, unsigned int *length, u8 pureFlag) {
throw CompileError("Internal error.");
}
const char *bytecode = (const char *)(rose.get());
const char *bytecode = reinterpret_cast<const char *>(rose.get());
const platform_t p = target_to_platform(ng.cc.target_info);
struct hs_database *db = dbCreate(bytecode, *length, p);
if (!db) {

View File

@ -57,15 +57,14 @@ extern const hs_compile_error_t hs_badalloc = {
namespace ue2 {
hs_compile_error_t *generateCompileError(const string &err, int expression) {
hs_compile_error_t *ret =
(struct hs_compile_error *)hs_misc_alloc(sizeof(hs_compile_error_t));
hs_compile_error_t *ret = static_cast<struct hs_compile_error *>(hs_misc_alloc(sizeof(hs_compile_error_t)));
if (ret) {
hs_error_t e = hs_check_alloc(ret);
if (e != HS_SUCCESS) {
hs_misc_free(ret);
return const_cast<hs_compile_error_t *>(&hs_badalloc);
}
char *msg = (char *)hs_misc_alloc(err.size() + 1);
char *msg = static_cast<char *>(hs_misc_alloc(err.size() + 1));
if (msg) {
e = hs_check_alloc(msg);
if (e != HS_SUCCESS) {

View File

@ -30,7 +30,6 @@
#include "config.h"
#include "ue2common.h"
#include "util/arch.h"
#include "util/intrinsics.h"
#if !defined(HAVE_SSE42)
@ -543,14 +542,13 @@ u32 crc32c_sb8_64_bit(u32 running_crc, const unsigned char* p_buf,
// Main aligned loop, processes eight bytes at a time.
u32 term1, term2;
for (size_t li = 0; li < running_length/8; li++) {
u32 block = *(const u32 *)p_buf;
crc ^= block;
p_buf += 4;
term1 = crc_tableil8_o88[crc & 0x000000FF] ^
u32 term1 = crc_tableil8_o88[crc & 0x000000FF] ^
crc_tableil8_o80[(crc >> 8) & 0x000000FF];
term2 = crc >> 16;
u32 term2 = crc >> 16;
crc = term1 ^
crc_tableil8_o72[term2 & 0x000000FF] ^
crc_tableil8_o64[(term2 >> 8) & 0x000000FF];
@ -579,53 +577,7 @@ u32 crc32c_sb8_64_bit(u32 running_crc, const unsigned char* p_buf,
}
#else // HAVE_SSE42
#ifdef ARCH_64_BIT
#define CRC_WORD 8
#define CRC_TYPE u64a
#define CRC_FUNC _mm_crc32_u64
#else
#define CRC_WORD 4
#define CRC_TYPE u32
#define CRC_FUNC _mm_crc32_u32
#endif
/*
* Use the crc32 instruction from SSE4.2 to compute our checksum - same
* polynomial as the above function.
*/
static really_inline
u32 crc32c_sse42(u32 running_crc, const unsigned char* p_buf,
const size_t length) {
u32 crc = running_crc;
// Process byte-by-byte until p_buf is aligned
const unsigned char *aligned_buf = ROUNDUP_PTR(p_buf, CRC_WORD);
size_t init_bytes = aligned_buf - p_buf;
size_t running_length = ((length - init_bytes)/CRC_WORD)*CRC_WORD;
size_t end_bytes = length - init_bytes - running_length;
while (p_buf < aligned_buf) {
crc = _mm_crc32_u8(crc, *p_buf++);
}
// Main aligned loop, processes a word at a time.
for (size_t li = 0; li < running_length/CRC_WORD; li++) {
CRC_TYPE block = *(const CRC_TYPE *)p_buf;
crc = CRC_FUNC(crc, block);
p_buf += CRC_WORD;
}
// Remaining bytes
for(size_t li = 0; li < end_bytes; li++) {
crc = _mm_crc32_u8(crc, *p_buf++);
}
return crc;
}
#include "util/arch/x86/crc32.h"
#endif
#ifdef VERIFY_ASSERTION

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2015-2020, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -115,7 +115,8 @@ static
hs_error_t db_check_platform(const u64a p) {
if (p != hs_current_platform
&& p != (hs_current_platform | hs_current_platform_no_avx2)
&& p != (hs_current_platform | hs_current_platform_no_avx512)) {
&& p != (hs_current_platform | hs_current_platform_no_avx512)
&& p != (hs_current_platform | hs_current_platform_no_avx512vbmi)) {
return HS_DB_PLATFORM_ERROR;
}
// passed all checks
@ -352,12 +353,6 @@ hs_error_t dbIsValid(const hs_database_t *db) {
return HS_SUCCESS;
}
#if defined(_WIN32)
#define SNPRINTF_COMPAT _snprintf
#else
#define SNPRINTF_COMPAT snprintf
#endif
/** Allocates a buffer and prints the database info into it. Returns an
* appropriate error code on failure, or HS_SUCCESS on success. */
static
@ -370,9 +365,11 @@ hs_error_t print_database_string(char **s, u32 version, const platform_t plat,
u8 minor = (version >> 16) & 0xff;
u8 major = (version >> 24) & 0xff;
const char *features = (plat & HS_PLATFORM_NOAVX512)
? (plat & HS_PLATFORM_NOAVX2) ? "" : "AVX2"
: "AVX512";
const char *features = (plat & HS_PLATFORM_NOAVX512VBMI)
? (plat & HS_PLATFORM_NOAVX512)
? (plat & HS_PLATFORM_NOAVX2) ? "" : "AVX2"
: "AVX512"
: "AVX512VBMI";
const char *mode = NULL;
@ -397,9 +394,7 @@ hs_error_t print_database_string(char **s, u32 version, const platform_t plat,
return ret;
}
// Note: SNPRINTF_COMPAT is a macro defined above, to cope with systems
// that don't have snprintf but have a workalike.
int p_len = SNPRINTF_COMPAT(
int p_len = snprintf(
buf, len, "Version: %u.%u.%u Features: %s Mode: %s",
major, minor, release, features, mode);
if (p_len < 0) {

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2015-2020, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -51,10 +51,12 @@ extern "C"
// CPU type is the low 6 bits (we can't need more than 64, surely!)
#define HS_PLATFORM_INTEL 1
#define HS_PLATFORM_ARM 2
#define HS_PLATFORM_CPU_MASK 0x3F
#define HS_PLATFORM_NOAVX2 (4<<13)
#define HS_PLATFORM_NOAVX512 (8<<13)
#define HS_PLATFORM_NOAVX512VBMI (0x10<<13)
/** \brief Platform features bitmask. */
typedef u64a platform_t;
@ -66,6 +68,9 @@ const platform_t hs_current_platform = {
#endif
#if !defined(HAVE_AVX512)
HS_PLATFORM_NOAVX512 |
#endif
#if !defined(HAVE_AVX512VBMI)
HS_PLATFORM_NOAVX512VBMI |
#endif
0,
};
@ -74,13 +79,18 @@ static UNUSED
const platform_t hs_current_platform_no_avx2 = {
HS_PLATFORM_NOAVX2 |
HS_PLATFORM_NOAVX512 |
0,
HS_PLATFORM_NOAVX512VBMI
};
static UNUSED
const platform_t hs_current_platform_no_avx512 = {
HS_PLATFORM_NOAVX512 |
0,
HS_PLATFORM_NOAVX512VBMI
};
static UNUSED
const platform_t hs_current_platform_no_avx512vbmi = {
HS_PLATFORM_NOAVX512VBMI
};
/*
@ -102,6 +112,7 @@ struct hs_database {
static really_inline
const void *hs_get_bytecode(const struct hs_database *db) {
// cppcheck-suppress cstyleCast
return ((const char *)db + db->bytecode);
}

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2016-2017, Intel Corporation
* Copyright (c) 2016-2020, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -30,7 +31,41 @@
#include "hs_common.h"
#include "hs_runtime.h"
#include "ue2common.h"
#include "util/cpuid_inline.h"
/* Streamlining the dispatch to eliminate runtime checking/branching:
* the first call to a function runs the resolve code and sets the
* static resolved/dispatch pointer to point to the correct
* implementation; subsequent calls go directly through the resolved
* pointer. The simplest way to accomplish this is to initially point
* the pointer at the resolve function.
* To accomplish this in a manner invisible to the user,
* we do involve some rather ugly/confusing macros in here.
* There are four macros that assemble the code for each function
* we want to dispatch in this manner:
* CREATE_DISPATCH
* this generates the declarations for the candidate target functions,
* for the fat_dispatch function pointer, and for the resolve_ function;
* it points the function pointer at the resolve function and contains
* most of the definition of the resolve function. The very end of the
* resolve function is completed by the next macro, because in the
* CREATE_DISPATCH macro we have the argument list with the arg
* declarations, which is needed to generate correct function
* signatures, but we can't generate from this, in a macro, a _call_
* to one of those functions.
* CONNECT_ARGS_1
* this macro fills in the actual call at the end of the resolve
* function, with the correct arg list. Hence the name "connect args".
* CONNECT_DISPATCH_2
* this macro likewise gives us the beginning of the definition of the
* actual entry point function (the 'real name' that's called by the
* user), but again the pass-through call cannot invoke the target
* without the arg list, which is supplied by the final macro,
* CONNECT_ARGS_3.
*/
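A minimal sketch of what these macros assemble, for a hypothetical function hs_error_t hs_foo(int x) (the name and the single-branch resolver are invented for illustration; the real macros and their uses follow below):

/* CREATE_DISPATCH(hs_error_t, hs_foo, int x) roughly yields: */
hs_error_t avx2_hs_foo(int x);                 /* candidate target decl */
static hs_error_t error_hs_foo(int x) { return (hs_error_t)HS_ARCH_ERROR; }
static hs_error_t resolve_hs_foo(int x);
static hs_error_t (*fat_dispatch_hs_foo)(int x) = &resolve_hs_foo;
static hs_error_t resolve_hs_foo(int x) {
    if (check_avx2()) {                        /* one branch per target */
        fat_dispatch_hs_foo = &avx2_hs_foo;
    } else {
        fat_dispatch_hs_foo = &error_hs_foo;
    }
    /* CONNECT_ARGS_1(hs_error_t, hs_foo, x) finishes the resolver: */
    return (*fat_dispatch_hs_foo)(x);
}
/* CONNECT_DISPATCH_2 opens the public entry point; CONNECT_ARGS_3 closes it: */
HS_PUBLIC_API
hs_error_t hs_foo(int x) {
    return (*fat_dispatch_hs_foo)(x);
}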
#if defined(ARCH_IA32) || defined(ARCH_X86_64)
#include "util/arch/x86/cpuid_inline.h"
#include "util/join.h"
#if defined(DISABLE_AVX512_DISPATCH)
@ -38,8 +73,14 @@
#define check_avx512() (0)
#endif
#if defined(DISABLE_AVX512VBMI_DISPATCH)
#define avx512vbmi_ disabled_
#define check_avx512vbmi() (0)
#endif
#define CREATE_DISPATCH(RTYPE, NAME, ...) \
/* create defns */ \
RTYPE JOIN(avx512vbmi_, NAME)(__VA_ARGS__); \
RTYPE JOIN(avx512_, NAME)(__VA_ARGS__); \
RTYPE JOIN(avx2_, NAME)(__VA_ARGS__); \
RTYPE JOIN(corei7_, NAME)(__VA_ARGS__); \
@ -50,93 +91,274 @@
return (RTYPE)HS_ARCH_ERROR; \
} \
\
/* dispatch routing pointer for this function */ \
/* initially point it at the resolve function */ \
static RTYPE JOIN(resolve_, NAME)(__VA_ARGS__); \
static RTYPE (* JOIN(fat_dispatch_, NAME))(__VA_ARGS__) = \
&JOIN(resolve_, NAME); \
\
/* resolver */ \
static RTYPE (*JOIN(resolve_, NAME)(void))(__VA_ARGS__) { \
if (check_avx512()) { \
return JOIN(avx512_, NAME); \
static RTYPE JOIN(resolve_, NAME)(__VA_ARGS__) { \
if (check_avx512vbmi()) { \
fat_dispatch_ ## NAME = &JOIN(avx512vbmi_, NAME); \
} \
if (check_avx2()) { \
return JOIN(avx2_, NAME); \
else if (check_avx512()) { \
fat_dispatch_ ## NAME = &JOIN(avx512_, NAME); \
} \
if (check_sse42() && check_popcnt()) { \
return JOIN(corei7_, NAME); \
else if (check_avx2()) { \
fat_dispatch_ ## NAME = &JOIN(avx2_, NAME); \
} \
if (check_ssse3()) { \
return JOIN(core2_, NAME); \
else if (check_sse42() && check_popcnt()) { \
fat_dispatch_ ## NAME = &JOIN(corei7_, NAME); \
} \
/* anything else is fail */ \
return JOIN(error_, NAME); \
else if (check_ssse3()) { \
fat_dispatch_ ## NAME = &JOIN(core2_, NAME); \
} else { \
/* anything else is fail */ \
fat_dispatch_ ## NAME = &JOIN(error_, NAME); \
} \
/* the rest of the function is completed in the CONNECT_ARGS_1 macro. */
#elif defined(ARCH_AARCH64)
#include "util/arch/arm/cpuid_inline.h"
#include "util/join.h"
#define CREATE_DISPATCH(RTYPE, NAME, ...) \
/* create defns */ \
RTYPE JOIN(sve2_, NAME)(__VA_ARGS__); \
RTYPE JOIN(sve_, NAME)(__VA_ARGS__); \
RTYPE JOIN(neon_, NAME)(__VA_ARGS__); \
\
/* error func */ \
static inline RTYPE JOIN(error_, NAME)(__VA_ARGS__) { \
return (RTYPE)HS_ARCH_ERROR; \
} \
\
/* function */ \
/* dispatch routing pointer for this function */ \
/* initially point it at the resolve function */ \
static RTYPE JOIN(resolve_, NAME)(__VA_ARGS__); \
static RTYPE (* JOIN(fat_dispatch_, NAME))(__VA_ARGS__) = \
&JOIN(resolve_, NAME); \
\
/* resolver */ \
static RTYPE JOIN(resolve_, NAME)(__VA_ARGS__) { \
if (check_sve2()) { \
fat_dispatch_ ## NAME = &JOIN(sve2_, NAME); \
} \
else if (check_sve()) { \
fat_dispatch_ ## NAME = &JOIN(sve_, NAME); \
} \
else if (check_neon()) { \
fat_dispatch_ ## NAME = &JOIN(neon_, NAME); \
} else { \
/* anything else is fail */ \
fat_dispatch_ ## NAME = &JOIN(error_, NAME); \
} \
/* the rest of the function is completed in the CONNECT_ARGS_1 macro. */
#endif
#define CONNECT_ARGS_1(RTYPE, NAME, ...) \
return (*fat_dispatch_ ## NAME)(__VA_ARGS__); \
} \
#define CONNECT_DISPATCH_2(RTYPE, NAME, ...) \
/* new function */ \
HS_PUBLIC_API \
RTYPE NAME(__VA_ARGS__) __attribute__((ifunc("resolve_" #NAME)))
RTYPE NAME(__VA_ARGS__) { \
#define CONNECT_ARGS_3(RTYPE, NAME, ...) \
return (*fat_dispatch_ ## NAME)(__VA_ARGS__); \
} \
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-function"
/* composing the static redirect functions gets a bit ugly: we need
* first the typed arg list and then just the arg names, twice in a
* row, to define the redirect function and the dispatch function
* call */
CREATE_DISPATCH(hs_error_t, hs_scan, const hs_database_t *db, const char *data,
unsigned length, unsigned flags, hs_scratch_t *scratch,
match_event_handler onEvent, void *userCtx);
CONNECT_ARGS_1(hs_error_t, hs_scan, db, data, length, flags, scratch, onEvent, userCtx);
CONNECT_DISPATCH_2(hs_error_t, hs_scan, const hs_database_t *db, const char *data,
unsigned length, unsigned flags, hs_scratch_t *scratch,
match_event_handler onEvent, void *userCtx);
CONNECT_ARGS_3(hs_error_t, hs_scan, db, data, length, flags, scratch, onEvent, userCtx);
CREATE_DISPATCH(hs_error_t, hs_stream_size, const hs_database_t *database,
size_t *stream_size);
CONNECT_ARGS_1(hs_error_t, hs_stream_size, database, stream_size);
CONNECT_DISPATCH_2(hs_error_t, hs_stream_size, const hs_database_t *database,
size_t *stream_size);
CONNECT_ARGS_3(hs_error_t, hs_stream_size, database, stream_size);
CREATE_DISPATCH(hs_error_t, hs_database_size, const hs_database_t *db,
size_t *size);
CONNECT_ARGS_1(hs_error_t, hs_database_size, db, size);
CONNECT_DISPATCH_2(hs_error_t, hs_database_size, const hs_database_t *db,
size_t *size);
CONNECT_ARGS_3(hs_error_t, hs_database_size, db, size);
CREATE_DISPATCH(hs_error_t, dbIsValid, const hs_database_t *db);
CONNECT_ARGS_1(hs_error_t, dbIsValid, db);
CONNECT_DISPATCH_2(hs_error_t, dbIsValid, const hs_database_t *db);
CONNECT_ARGS_3(hs_error_t, dbIsValid, db);
CREATE_DISPATCH(hs_error_t, hs_free_database, hs_database_t *db);
CONNECT_ARGS_1(hs_error_t, hs_free_database, db);
CONNECT_DISPATCH_2(hs_error_t, hs_free_database, hs_database_t *db);
CONNECT_ARGS_3(hs_error_t, hs_free_database, db);
CREATE_DISPATCH(hs_error_t, hs_open_stream, const hs_database_t *db,
unsigned int flags, hs_stream_t **stream);
CONNECT_ARGS_1(hs_error_t, hs_open_stream, db, flags, stream);
CONNECT_DISPATCH_2(hs_error_t, hs_open_stream, const hs_database_t *db,
unsigned int flags, hs_stream_t **stream);
CONNECT_ARGS_3(hs_error_t, hs_open_stream, db, flags, stream);
CREATE_DISPATCH(hs_error_t, hs_scan_stream, hs_stream_t *id, const char *data,
unsigned int length, unsigned int flags, hs_scratch_t *scratch,
match_event_handler onEvent, void *ctxt);
CONNECT_ARGS_1(hs_error_t, hs_scan_stream, id, data, length, flags, scratch, onEvent, ctxt);
CONNECT_DISPATCH_2(hs_error_t, hs_scan_stream, hs_stream_t *id, const char *data,
unsigned int length, unsigned int flags, hs_scratch_t *scratch,
match_event_handler onEvent, void *ctxt);
CONNECT_ARGS_3(hs_error_t, hs_scan_stream, id, data, length, flags, scratch, onEvent, ctxt);
CREATE_DISPATCH(hs_error_t, hs_close_stream, hs_stream_t *id,
hs_scratch_t *scratch, match_event_handler onEvent, void *ctxt);
CONNECT_ARGS_1(hs_error_t, hs_close_stream, id, scratch, onEvent, ctxt);
CONNECT_DISPATCH_2(hs_error_t, hs_close_stream, hs_stream_t *id,
hs_scratch_t *scratch, match_event_handler onEvent, void *ctxt);
CONNECT_ARGS_3(hs_error_t, hs_close_stream, id, scratch, onEvent, ctxt);
CREATE_DISPATCH(hs_error_t, hs_scan_vector, const hs_database_t *db,
const char *const *data, const unsigned int *length,
unsigned int count, unsigned int flags, hs_scratch_t *scratch,
match_event_handler onevent, void *context);
CONNECT_ARGS_1(hs_error_t, hs_scan_vector, db, data, length, count, flags, scratch, onevent, context);
CONNECT_DISPATCH_2(hs_error_t, hs_scan_vector, const hs_database_t *db,
const char *const *data, const unsigned int *length,
unsigned int count, unsigned int flags, hs_scratch_t *scratch,
match_event_handler onevent, void *context);
CONNECT_ARGS_3(hs_error_t, hs_scan_vector, db, data, length, count, flags, scratch, onevent, context);
CREATE_DISPATCH(hs_error_t, hs_database_info, const hs_database_t *db, char **info);
CONNECT_ARGS_1(hs_error_t, hs_database_info, db, info);
CONNECT_DISPATCH_2(hs_error_t, hs_database_info, const hs_database_t *db, char **info);
CONNECT_ARGS_3(hs_error_t, hs_database_info, db, info);
CREATE_DISPATCH(hs_error_t, hs_copy_stream, hs_stream_t **to_id,
const hs_stream_t *from_id);
CONNECT_ARGS_1(hs_error_t, hs_copy_stream, to_id, from_id);
CONNECT_DISPATCH_2(hs_error_t, hs_copy_stream, hs_stream_t **to_id,
const hs_stream_t *from_id);
CONNECT_ARGS_3(hs_error_t, hs_copy_stream, to_id, from_id);
CREATE_DISPATCH(hs_error_t, hs_reset_stream, hs_stream_t *id,
unsigned int flags, hs_scratch_t *scratch,
match_event_handler onEvent, void *context);
CONNECT_ARGS_1(hs_error_t, hs_reset_stream, id, flags, scratch, onEvent, context);
CONNECT_DISPATCH_2(hs_error_t, hs_reset_stream, hs_stream_t *id,
unsigned int flags, hs_scratch_t *scratch,
match_event_handler onEvent, void *context);
CONNECT_ARGS_3(hs_error_t, hs_reset_stream, id, flags, scratch, onEvent, context);
CREATE_DISPATCH(hs_error_t, hs_reset_and_copy_stream, hs_stream_t *to_id,
const hs_stream_t *from_id, hs_scratch_t *scratch,
match_event_handler onEvent, void *context);
CONNECT_ARGS_1(hs_error_t, hs_reset_and_copy_stream, to_id, from_id, scratch, onEvent, context);
CONNECT_DISPATCH_2(hs_error_t, hs_reset_and_copy_stream, hs_stream_t *to_id,
const hs_stream_t *from_id, hs_scratch_t *scratch,
match_event_handler onEvent, void *context);
CONNECT_ARGS_3(hs_error_t, hs_reset_and_copy_stream, to_id, from_id, scratch, onEvent, context);
CREATE_DISPATCH(hs_error_t, hs_serialize_database, const hs_database_t *db,
char **bytes, size_t *length);
CONNECT_ARGS_1(hs_error_t, hs_serialize_database, db, bytes, length);
CONNECT_DISPATCH_2(hs_error_t, hs_serialize_database, const hs_database_t *db,
char **bytes, size_t *length);
CONNECT_ARGS_3(hs_error_t, hs_serialize_database, db, bytes, length);
CREATE_DISPATCH(hs_error_t, hs_deserialize_database, const char *bytes,
const size_t length, hs_database_t **db);
CONNECT_ARGS_1(hs_error_t, hs_deserialize_database, bytes, length, db);
CONNECT_DISPATCH_2(hs_error_t, hs_deserialize_database, const char *bytes,
const size_t length, hs_database_t **db);
CONNECT_ARGS_3(hs_error_t, hs_deserialize_database, bytes, length, db);
CREATE_DISPATCH(hs_error_t, hs_deserialize_database_at, const char *bytes,
const size_t length, hs_database_t *db);
CONNECT_ARGS_1(hs_error_t, hs_deserialize_database_at, bytes, length, db);
CONNECT_DISPATCH_2(hs_error_t, hs_deserialize_database_at, const char *bytes,
const size_t length, hs_database_t *db);
CONNECT_ARGS_3(hs_error_t, hs_deserialize_database_at, bytes, length, db);
CREATE_DISPATCH(hs_error_t, hs_serialized_database_info, const char *bytes,
size_t length, char **info);
CONNECT_ARGS_1(hs_error_t, hs_serialized_database_info, bytes, length, info);
CONNECT_DISPATCH_2(hs_error_t, hs_serialized_database_info, const char *bytes,
size_t length, char **info);
CONNECT_ARGS_3(hs_error_t, hs_serialized_database_info, bytes, length, info);
CREATE_DISPATCH(hs_error_t, hs_serialized_database_size, const char *bytes,
const size_t length, size_t *deserialized_size);
CONNECT_ARGS_1(hs_error_t, hs_serialized_database_size, bytes, length, deserialized_size);
CONNECT_DISPATCH_2(hs_error_t, hs_serialized_database_size, const char *bytes,
const size_t length, size_t *deserialized_size);
CONNECT_ARGS_3(hs_error_t, hs_serialized_database_size, bytes, length, deserialized_size);
CREATE_DISPATCH(hs_error_t, hs_compress_stream, const hs_stream_t *stream,
char *buf, size_t buf_space, size_t *used_space);
CONNECT_ARGS_1(hs_error_t, hs_compress_stream, stream,
buf, buf_space, used_space);
CONNECT_DISPATCH_2(hs_error_t, hs_compress_stream, const hs_stream_t *stream,
char *buf, size_t buf_space, size_t *used_space);
CONNECT_ARGS_3(hs_error_t, hs_compress_stream, stream,
buf, buf_space, used_space);
CREATE_DISPATCH(hs_error_t, hs_expand_stream, const hs_database_t *db,
hs_stream_t **stream, const char *buf,size_t buf_size);
CONNECT_ARGS_1(hs_error_t, hs_expand_stream, db, stream, buf,buf_size);
CONNECT_DISPATCH_2(hs_error_t, hs_expand_stream, const hs_database_t *db,
hs_stream_t **stream, const char *buf,size_t buf_size);
CONNECT_ARGS_3(hs_error_t, hs_expand_stream, db, stream, buf,buf_size);
CREATE_DISPATCH(hs_error_t, hs_reset_and_expand_stream, hs_stream_t *to_stream,
const char *buf, size_t buf_size, hs_scratch_t *scratch,
match_event_handler onEvent, void *context);
CONNECT_ARGS_1(hs_error_t, hs_reset_and_expand_stream, to_stream,
buf, buf_size, scratch, onEvent, context);
CONNECT_DISPATCH_2(hs_error_t, hs_reset_and_expand_stream, hs_stream_t *to_stream,
const char *buf, size_t buf_size, hs_scratch_t *scratch,
match_event_handler onEvent, void *context);
CONNECT_ARGS_3(hs_error_t, hs_reset_and_expand_stream, to_stream,
buf, buf_size, scratch, onEvent, context);
/** INTERNALS **/
CREATE_DISPATCH(u32, Crc32c_ComputeBuf, u32 inCrc32, const void *buf, size_t bufLen);
CONNECT_ARGS_1(u32, Crc32c_ComputeBuf, inCrc32, buf, bufLen);
CONNECT_DISPATCH_2(u32, Crc32c_ComputeBuf, u32 inCrc32, const void *buf, size_t bufLen);
CONNECT_ARGS_3(u32, Crc32c_ComputeBuf, inCrc32, buf, bufLen);
#pragma GCC diagnostic pop
#pragma GCC diagnostic pop

View File

@ -36,6 +36,7 @@
#include "teddy.h"
#include "teddy_internal.h"
#include "util/arch.h"
#include "util/bitutils.h"
#include "util/simd_utils.h"
#include "util/uniform_ops.h"
@ -119,20 +120,6 @@ const ALIGN_CL_DIRECTIVE u8 zone_or_mask[ITER_BYTES+1][ITER_BYTES] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }
};
/* compilers don't reliably synthesize the 32-bit ANDN instruction here,
* so we force its generation.
*/
static really_inline
u64a andn(const u32 a, const u8 *b) {
u64a r;
#if defined(HAVE_BMI) && !defined(NO_ASM)
__asm__ ("andn\t%2,%1,%k0" : "=r"(r) : "r"(a), "m"(*(const u32 *)b));
#else
r = unaligned_load_u32(b) & ~a;
#endif
return r;
}
/* generates an initial state mask based on the last byte-ish of history rather
* than being all accepting. If there is no history to consider, the state is
* generated based on the minimum length of each bucket in order to prevent
@ -160,33 +147,43 @@ void get_conf_stride_1(const u8 *itPtr, UNUSED const u8 *start_ptr,
const u64a *ft, u64a *conf0, u64a *conf8, m128 *s) {
/* +1: the zones ensure that we can read the byte at z->end */
assert(itPtr >= start_ptr && itPtr + ITER_BYTES <= end_ptr);
u64a reach0 = andn(domain_mask_flipped, itPtr);
u64a reach1 = andn(domain_mask_flipped, itPtr + 1);
u64a reach2 = andn(domain_mask_flipped, itPtr + 2);
u64a reach3 = andn(domain_mask_flipped, itPtr + 3);
u64a domain_mask = ~domain_mask_flipped;
m128 st0 = load_m128_from_u64a(ft + reach0);
m128 st1 = load_m128_from_u64a(ft + reach1);
m128 st2 = load_m128_from_u64a(ft + reach2);
m128 st3 = load_m128_from_u64a(ft + reach3);
u64a it_hi = *(const u64a *)itPtr;
u64a it_lo = *(const u64a *)(itPtr + 8);
u64a reach0 = domain_mask & it_hi;
u64a reach1 = domain_mask & (it_hi >> 8);
u64a reach2 = domain_mask & (it_hi >> 16);
u64a reach3 = domain_mask & (it_hi >> 24);
u64a reach4 = domain_mask & (it_hi >> 32);
u64a reach5 = domain_mask & (it_hi >> 40);
u64a reach6 = domain_mask & (it_hi >> 48);
u64a reach7 = domain_mask & ((it_hi >> 56) | (it_lo << 8));
u64a reach8 = domain_mask & it_lo;
u64a reach9 = domain_mask & (it_lo >> 8);
u64a reach10 = domain_mask & (it_lo >> 16);
u64a reach11 = domain_mask & (it_lo >> 24);
u64a reach12 = domain_mask & (it_lo >> 32);
u64a reach13 = domain_mask & (it_lo >> 40);
u64a reach14 = domain_mask & (it_lo >> 48);
u64a reach15 = domain_mask & unaligned_load_u32(itPtr + 15);
u64a reach4 = andn(domain_mask_flipped, itPtr + 4);
u64a reach5 = andn(domain_mask_flipped, itPtr + 5);
u64a reach6 = andn(domain_mask_flipped, itPtr + 6);
u64a reach7 = andn(domain_mask_flipped, itPtr + 7);
m128 st4 = load_m128_from_u64a(ft + reach4);
m128 st5 = load_m128_from_u64a(ft + reach5);
m128 st6 = load_m128_from_u64a(ft + reach6);
m128 st7 = load_m128_from_u64a(ft + reach7);
st1 = lshiftbyte_m128(st1, 1);
st2 = lshiftbyte_m128(st2, 2);
st3 = lshiftbyte_m128(st3, 3);
st4 = lshiftbyte_m128(st4, 4);
st5 = lshiftbyte_m128(st5, 5);
st6 = lshiftbyte_m128(st6, 6);
st7 = lshiftbyte_m128(st7, 7);
m128 st0 = load_m128_from_u64a(ft + reach0);
m128 st1 = lshiftbyte_m128(load_m128_from_u64a(ft + reach1), 1);
m128 st2 = lshiftbyte_m128(load_m128_from_u64a(ft + reach2), 2);
m128 st3 = lshiftbyte_m128(load_m128_from_u64a(ft + reach3), 3);
m128 st4 = lshiftbyte_m128(load_m128_from_u64a(ft + reach4), 4);
m128 st5 = lshiftbyte_m128(load_m128_from_u64a(ft + reach5), 5);
m128 st6 = lshiftbyte_m128(load_m128_from_u64a(ft + reach6), 6);
m128 st7 = lshiftbyte_m128(load_m128_from_u64a(ft + reach7), 7);
m128 st8 = load_m128_from_u64a(ft + reach8);
m128 st9 = lshiftbyte_m128(load_m128_from_u64a(ft + reach9), 1);
m128 st10 = lshiftbyte_m128(load_m128_from_u64a(ft + reach10), 2);
m128 st11 = lshiftbyte_m128(load_m128_from_u64a(ft + reach11), 3);
m128 st12 = lshiftbyte_m128(load_m128_from_u64a(ft + reach12), 4);
m128 st13 = lshiftbyte_m128(load_m128_from_u64a(ft + reach13), 5);
m128 st14 = lshiftbyte_m128(load_m128_from_u64a(ft + reach14), 6);
m128 st15 = lshiftbyte_m128(load_m128_from_u64a(ft + reach15), 7);
st0 = or128(st0, st1);
st2 = or128(st2, st3);
@ -195,39 +192,6 @@ void get_conf_stride_1(const u8 *itPtr, UNUSED const u8 *start_ptr,
st0 = or128(st0, st2);
st4 = or128(st4, st6);
st0 = or128(st0, st4);
*s = or128(*s, st0);
*conf0 = movq(*s);
*s = rshiftbyte_m128(*s, 8);
*conf0 ^= ~0ULL;
u64a reach8 = andn(domain_mask_flipped, itPtr + 8);
u64a reach9 = andn(domain_mask_flipped, itPtr + 9);
u64a reach10 = andn(domain_mask_flipped, itPtr + 10);
u64a reach11 = andn(domain_mask_flipped, itPtr + 11);
m128 st8 = load_m128_from_u64a(ft + reach8);
m128 st9 = load_m128_from_u64a(ft + reach9);
m128 st10 = load_m128_from_u64a(ft + reach10);
m128 st11 = load_m128_from_u64a(ft + reach11);
u64a reach12 = andn(domain_mask_flipped, itPtr + 12);
u64a reach13 = andn(domain_mask_flipped, itPtr + 13);
u64a reach14 = andn(domain_mask_flipped, itPtr + 14);
u64a reach15 = andn(domain_mask_flipped, itPtr + 15);
m128 st12 = load_m128_from_u64a(ft + reach12);
m128 st13 = load_m128_from_u64a(ft + reach13);
m128 st14 = load_m128_from_u64a(ft + reach14);
m128 st15 = load_m128_from_u64a(ft + reach15);
st9 = lshiftbyte_m128(st9, 1);
st10 = lshiftbyte_m128(st10, 2);
st11 = lshiftbyte_m128(st11, 3);
st12 = lshiftbyte_m128(st12, 4);
st13 = lshiftbyte_m128(st13, 5);
st14 = lshiftbyte_m128(st14, 6);
st15 = lshiftbyte_m128(st15, 7);
st8 = or128(st8, st9);
st10 = or128(st10, st11);
@ -236,11 +200,14 @@ void get_conf_stride_1(const u8 *itPtr, UNUSED const u8 *start_ptr,
st8 = or128(st8, st10);
st12 = or128(st12, st14);
st8 = or128(st8, st12);
*s = or128(*s, st8);
*conf8 = movq(*s);
*s = rshiftbyte_m128(*s, 8);
*conf8 ^= ~0ULL;
m128 st = or128(*s, st0);
*conf0 = movq(st) ^ ~0ULL;
st = rshiftbyte_m128(st, 8);
st = or128(st, st8);
*conf8 = movq(st) ^ ~0ULL;
*s = rshiftbyte_m128(st, 8);
}
static really_inline
@ -248,6 +215,7 @@ void get_conf_stride_2(const u8 *itPtr, UNUSED const u8 *start_ptr,
UNUSED const u8 *end_ptr, u32 domain_mask_flipped,
const u64a *ft, u64a *conf0, u64a *conf8, m128 *s) {
assert(itPtr >= start_ptr && itPtr + ITER_BYTES <= end_ptr);
u64a reach0 = andn(domain_mask_flipped, itPtr);
u64a reach2 = andn(domain_mask_flipped, itPtr + 2);
u64a reach4 = andn(domain_mask_flipped, itPtr + 4);
@ -300,6 +268,7 @@ void get_conf_stride_4(const u8 *itPtr, UNUSED const u8 *start_ptr,
UNUSED const u8 *end_ptr, u32 domain_mask_flipped,
const u64a *ft, u64a *conf0, u64a *conf8, m128 *s) {
assert(itPtr >= start_ptr && itPtr + ITER_BYTES <= end_ptr);
u64a reach0 = andn(domain_mask_flipped, itPtr);
u64a reach4 = andn(domain_mask_flipped, itPtr + 4);
u64a reach8 = andn(domain_mask_flipped, itPtr + 8);
@ -329,7 +298,7 @@ void get_conf_stride_4(const u8 *itPtr, UNUSED const u8 *start_ptr,
static really_inline
void do_confirm_fdr(u64a *conf, u8 offset, hwlmcb_rv_t *control,
const u32 *confBase, const struct FDR_Runtime_Args *a,
const u8 *ptr, u32 *last_match_id, struct zone *z) {
const u8 *ptr, u32 *last_match_id, const struct zone *z) {
const u8 bucket = 8;
if (likely(!*conf)) {
@ -339,7 +308,7 @@ void do_confirm_fdr(u64a *conf, u8 offset, hwlmcb_rv_t *control,
/* ptr is currently referring to a location in the zone's buffer, we also
* need a pointer in the original, main buffer for the final string compare.
*/
const u8 *ptr_main = (const u8 *)((uintptr_t)ptr + z->zone_pointer_adjust);
const u8 *ptr_main = (const u8 *)((uintptr_t)ptr + z->zone_pointer_adjust); // NOLINT(performance-no-int-to-ptr)
const u8 *confLoc = ptr;
@ -364,7 +333,7 @@ void do_confirm_fdr(u64a *conf, u8 offset, hwlmcb_rv_t *control,
}
static really_inline
void dumpZoneInfo(UNUSED struct zone *z, UNUSED size_t zone_id) {
void dumpZoneInfo(UNUSED const struct zone *z, UNUSED size_t zone_id) {
#ifdef DEBUG
DEBUG_PRINTF("zone: zone=%zu, bufPtr=%p\n", zone_id, z->buf);
DEBUG_PRINTF("zone: startPtr=%p, endPtr=%p, shift=%u\n",
@ -696,6 +665,10 @@ size_t prepareZones(const u8 *buf, size_t len, const u8 *hend,
const u8 *tryFloodDetect = zz->floodPtr; \
const u8 *start_ptr = zz->start; \
const u8 *end_ptr = zz->end; \
for (const u8 *itPtr = ROUNDDOWN_PTR(start_ptr, 64); itPtr + 4*ITER_BYTES <= end_ptr; \
itPtr += 4*ITER_BYTES) { \
__builtin_prefetch(itPtr); \
} \
\
for (const u8 *itPtr = start_ptr; itPtr + ITER_BYTES <= end_ptr; \
itPtr += ITER_BYTES) { \
@ -739,6 +712,7 @@ hwlm_error_t fdr_engine_exec(const struct FDR *fdr,
assert(ISALIGNED_CL(confBase));
struct zone zones[ZONE_MAX];
assert(fdr->domain > 8 && fdr->domain < 16);
memset(zones, 0, sizeof(zones));
size_t numZone = prepareZones(a->buf, a->len,
a->buf_history + a->len_history,

View File

@ -44,7 +44,6 @@
#include "util/compare.h"
#include "util/container.h"
#include "util/dump_mask.h"
#include "util/make_unique.h"
#include "util/math.h"
#include "util/noncopyable.h"
#include "util/target_info.h"
@ -99,7 +98,7 @@ public:
const FDREngineDescription &eng_in,
bool make_small_in, const Grey &grey_in)
: eng(eng_in), grey(grey_in), tab(eng_in.getTabSizeBytes()),
lits(move(lits_in)), bucketToLits(move(bucketToLits_in)),
lits(std::move(lits_in)), bucketToLits(std::move(bucketToLits_in)),
make_small(make_small_in) {}
bytecode_ptr<FDR> build();
@ -128,7 +127,7 @@ void andMask(u8 *dest, const u8 *a, const u8 *b, u32 num_bytes) {
}
void FDRCompiler::createInitialState(FDR *fdr) {
u8 *start = (u8 *)&fdr->start;
u8 *start = reinterpret_cast<u8 *>(&fdr->start);
/* initial state should be 1 in each slot in the bucket up to bucket
* minlen - 1, and 0 thereafter */
@ -136,8 +135,9 @@ void FDRCompiler::createInitialState(FDR *fdr) {
// Find the minimum length for the literals in this bucket.
const vector<LiteralIndex> &bucket_lits = bucketToLits[b];
u32 min_len = ~0U;
for (const LiteralIndex &lit_idx : bucket_lits) {
min_len = min(min_len, verify_u32(lits[lit_idx].s.length()));
for (const LiteralIndex &lit_idx : bucket_lits) {
// cppcheck-suppress useStlAlgorithm
min_len = min(min_len, verify_u32(lits[lit_idx].s.length()));
}
DEBUG_PRINTF("bucket %u has min_len=%u\n", b, min_len);
@ -176,7 +176,7 @@ bytecode_ptr<FDR> FDRCompiler::setupFDR() {
auto fdr = make_zeroed_bytecode_ptr<FDR>(size, 64);
assert(fdr); // otherwise would have thrown std::bad_alloc
u8 *fdr_base = (u8 *)fdr.get();
u8 *fdr_base = reinterpret_cast<u8 *>(fdr.get());
// Write header.
fdr->size = size;
@ -206,8 +206,7 @@ bytecode_ptr<FDR> FDRCompiler::setupFDR() {
assert(ISALIGNED_CL(ptr));
fdr->floodOffset = verify_u32(ptr - fdr_base);
memcpy(ptr, floodTable.get(), floodTable.size());
ptr += floodTable.size(); // last write, no need to round up
return fdr;
}
@ -494,18 +493,18 @@ map<BucketIndex, vector<LiteralIndex>> assignStringsToBuckets(
u32 cnt = last_id - first_id;
// long literals first for included literals checking
for (u32 k = 0; k < cnt; k++) {
litIds.push_back(last_id - k - 1);
litIds.emplace_back(last_id - k - 1);
}
i = j;
buckets.push_back(litIds);
buckets.emplace_back(litIds);
}
// reverse bucket id, longer literals come first
map<BucketIndex, vector<LiteralIndex>> bucketToLits;
size_t bucketCnt = buckets.size();
for (size_t i = 0; i < bucketCnt; i++) {
bucketToLits.emplace(bucketCnt - i - 1, move(buckets[i]));
bucketToLits.emplace(bucketCnt - i - 1, std::move(buckets[i]));
}
return bucketToLits;
@ -868,7 +867,7 @@ unique_ptr<HWLMProto> fdrBuildProtoInternal(u8 engType,
auto bucketToLits = assignStringsToBuckets(lits, *des);
addIncludedInfo(lits, des->getNumBuckets(), bucketToLits);
auto proto =
ue2::make_unique<HWLMProto>(engType, move(des), lits, bucketToLits,
std::make_unique<HWLMProto>(engType, std::move(des), lits, bucketToLits,
make_small);
return proto;
}

View File

@ -39,6 +39,7 @@ namespace ue2 {
size_t maxLen(const vector<hwlmLiteral> &lits) {
size_t rv = 0;
for (const auto &lit : lits) {
// cppcheck-suppress useStlAlgorithm
rv = max(rv, lit.s.size());
}
return rv;

View File

@ -84,9 +84,10 @@ struct FDRConfirm {
static really_inline
const u32 *getConfirmLitIndex(const struct FDRConfirm *fdrc) {
// cppcheck-suppress cstyleCast
const u8 *base = (const u8 *)fdrc;
const u32 *litIndex =
(const u32 *)(base + ROUNDUP_N(sizeof(*fdrc), alignof(u32)));
// cppcheck-suppress cstyleCast
const u32 *litIndex = (const u32 *)(base + ROUNDUP_N(sizeof(*fdrc), alignof(u32)));
assert(ISALIGNED(litIndex));
return litIndex;
}

View File

@ -58,7 +58,7 @@ u64a make_u64a_mask(const vector<u8> &v) {
u64a mask = 0;
size_t vlen = v.size();
size_t len = std::min(vlen, sizeof(mask));
unsigned char *m = (unsigned char *)&mask;
u8 *m = reinterpret_cast<u8 *>(&mask);
memcpy(m + sizeof(mask) - len, &v[vlen - len], len);
return mask;
}
@ -159,10 +159,10 @@ bytecode_ptr<FDRConfirm> getFDRConfirm(const vector<hwlmLiteral> &lits,
map<u32, vector<LiteralIndex> > res2lits;
hwlm_group_t gm = 0;
for (LiteralIndex i = 0; i < lits.size(); i++) {
LitInfo & li = tmpLitInfo[i];
const LitInfo & li = tmpLitInfo[i];
u32 hash = CONF_HASH_CALL(li.v, andmsk, mult, nBits);
DEBUG_PRINTF("%016llx --> %u\n", li.v, hash);
res2lits[hash].push_back(i);
res2lits[hash].emplace_back(i);
gm |= li.groups;
}
@ -245,10 +245,10 @@ bytecode_ptr<FDRConfirm> getFDRConfirm(const vector<hwlmLiteral> &lits,
fdrc->groups = gm;
// After the FDRConfirm, we have the lit index array.
u8 *fdrc_base = (u8 *)fdrc.get();
u8 *fdrc_base = reinterpret_cast<u8 *>(fdrc.get());
u8 *ptr = fdrc_base + sizeof(*fdrc);
ptr = ROUNDUP_PTR(ptr, alignof(u32));
u32 *bitsToLitIndex = (u32 *)ptr;
u32 *bitsToLitIndex = reinterpret_cast<u32 *>(ptr);
ptr += bitsToLitIndexSize;
// After the lit index array, we have the LitInfo structures themselves,
@ -265,7 +265,7 @@ bytecode_ptr<FDRConfirm> getFDRConfirm(const vector<hwlmLiteral> &lits,
LiteralIndex litIdx = *i;
// Write LitInfo header.
LitInfo &finalLI = *(LitInfo *)ptr;
LitInfo &finalLI = *(reinterpret_cast<LitInfo *>(ptr));
finalLI = tmpLitInfo[litIdx];
ptr += sizeof(LitInfo); // String starts directly after LitInfo.
@ -294,22 +294,20 @@ setupFullConfs(const vector<hwlmLiteral> &lits,
const EngineDescription &eng,
const map<BucketIndex, vector<LiteralIndex>> &bucketToLits,
bool make_small) {
unique_ptr<TeddyEngineDescription> teddyDescr =
getTeddyDescription(eng.getID());
BC2CONF bc2Conf;
u32 totalConfirmSize = 0;
for (BucketIndex b = 0; b < eng.getNumBuckets(); b++) {
if (contains(bucketToLits, b)) {
vector<hwlmLiteral> vl;
for (const LiteralIndex &lit_idx : bucketToLits.at(b)) {
vl.push_back(lits[lit_idx]);
// cppcheck-suppress useStlAlgorithm
vl.emplace_back(lits[lit_idx]);
}
DEBUG_PRINTF("b %d sz %zu\n", b, vl.size());
auto fc = getFDRConfirm(vl, make_small);
totalConfirmSize += fc.size();
bc2Conf.emplace(b, move(fc));
bc2Conf.emplace(b, std::move(fc));
}
}
@ -320,7 +318,7 @@ setupFullConfs(const vector<hwlmLiteral> &lits,
auto buf = make_zeroed_bytecode_ptr<u8>(totalSize, 64);
assert(buf); // otherwise would have thrown std::bad_alloc
u32 *confBase = (u32 *)buf.get();
u32 *confBase = reinterpret_cast<u32 *>(buf.get());
u8 *ptr = buf.get() + totalConfSwitchSize;
assert(ISALIGNED_CL(ptr));

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2019, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -54,9 +55,14 @@ void confWithBit(const struct FDRConfirm *fdrc, const struct FDR_Runtime_Args *a
if (likely(!start)) {
return;
}
// these __cplusplus checks are needed because this file is included in both fdr.c and teddy.cpp
#ifdef __cplusplus
const struct LitInfo *li
= reinterpret_cast<const struct LitInfo *>(reinterpret_cast<const u8 *>(fdrc) + start);
#else
const struct LitInfo *li
= (const struct LitInfo *)((const u8 *)fdrc + start);
#endif
struct hs_scratch *scratch = a->scratch;
assert(!scratch->fdr_conf);
@ -74,18 +80,20 @@ void confWithBit(const struct FDRConfirm *fdrc, const struct FDR_Runtime_Args *a
goto out;
}
const u8 *loc = buf + i - li->size + 1;
do { // this do/while scopes the declaration below so the goto cannot jump over its initialization
const u8 *loc = buf + i - li->size + 1;
if (loc < buf) {
u32 full_overhang = buf - loc;
size_t len_history = a->len_history;
if (loc < buf) {
u32 full_overhang = buf - loc;
size_t len_history = a->len_history;
// can't do a vectored confirm either if we don't have
// the bytes
if (full_overhang > len_history) {
goto out;
// can't do a vectored confirm either if we don't have
// the bytes
if (full_overhang > len_history) {
goto out;
}
}
}
} while (0);
assert(li->size <= sizeof(CONF_TYPE));
if (unlikely(!(li->groups & *control))) {

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2015-2020, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -74,9 +74,9 @@ void dumpLitIndex(const FDRConfirm *fdrc, FILE *f) {
static
void dumpConfirms(const void *fdr_base, u32 conf_offset, u32 num_confirms,
FILE *f) {
const u32 *conf = (const u32 *)((const char *)fdr_base + conf_offset);
const u32 *conf = reinterpret_cast<const u32 *>(reinterpret_cast<const char *>(fdr_base) + conf_offset);
for (u32 i = 0; i < num_confirms; i++) {
const auto *fdrc = (const FDRConfirm *)((const char *)conf + conf[i]);
const auto *fdrc = reinterpret_cast<const FDRConfirm *>(reinterpret_cast<const char *>(conf) + conf[i]);
fprintf(f, " confirm %u\n", i);
fprintf(f, " andmsk 0x%016llx\n", fdrc->andmsk);
fprintf(f, " mult 0x%016llx\n", fdrc->mult);
@ -107,12 +107,31 @@ void dumpTeddyReinforced(const u8 *rmsk, const u32 num_tables, FILE *f) {
}
}
static
void dumpTeddyDupMasks(const u8 *dmsk, u32 numMasks, FILE *f) {
// dump nibble masks
u32 maskWidth = 2;
fprintf(f, " dup nibble masks:\n");
for (u32 i = 0; i < numMasks * 2; i++) {
fprintf(f, " -%u%s: ", 1 + i / 2, (i % 2) ? "hi" : "lo");
for (u32 j = 0; j < 16 * maskWidth * 2; j++) {
u8 val = dmsk[i * 16 * maskWidth * 2 + j];
for (u32 k = 0; k < 8; k++) {
fprintf(f, "%s", ((val >> k) & 0x1) ? "1" : "0");
}
fprintf(f, " ");
}
fprintf(f, "\n");
}
fprintf(f, "\n");
}
static
void dumpTeddyMasks(const u8 *baseMsk, u32 numMasks, u32 maskWidth, FILE *f) {
// dump nibble masks
fprintf(f, " nibble masks:\n");
for (u32 i = 0; i < numMasks * 2; i++) {
fprintf(f, " -%d%s: ", 1 + i / 2, (i % 2) ? "hi" : "lo");
fprintf(f, " -%u%s: ", 1 + i / 2, (i % 2) ? "hi" : "lo");
for (u32 j = 0; j < 16 * maskWidth; j++) {
u8 val = baseMsk[i * 16 * maskWidth + j];
for (u32 k = 0; k < 8; k++) {
@ -138,7 +157,7 @@ void dumpTeddy(const Teddy *teddy, FILE *f) {
fprintf(f, " buckets %u\n", des->getNumBuckets());
fprintf(f, " packed %s\n", des->packed ? "true" : "false");
fprintf(f, " strings %u\n", teddy->numStrings);
fprintf(f, " size %zu bytes\n", fdrSize((const FDR *)teddy));
fprintf(f, " size %zu bytes\n", fdrSize(reinterpret_cast<const FDR *>(teddy)));
fprintf(f, " max length %u\n", teddy->maxStringLen);
fprintf(f, " floodoff %u (%x)\n", teddy->floodOffset,
teddy->floodOffset);
@ -146,12 +165,17 @@ void dumpTeddy(const Teddy *teddy, FILE *f) {
u32 maskWidth = des->getNumBuckets() / 8;
size_t headerSize = sizeof(Teddy);
size_t maskLen = des->numMasks * 16 * 2 * maskWidth;
const u8 *teddy_base = (const u8 *)teddy;
const u8 *teddy_base = reinterpret_cast<const u8 *>(teddy);
const u8 *baseMsk = teddy_base + ROUNDUP_CL(headerSize);
const u8 *rmsk = baseMsk + ROUNDUP_CL(maskLen);
dumpTeddyMasks(baseMsk, des->numMasks, maskWidth, f);
dumpTeddyReinforced(rmsk, maskWidth, f);
size_t maskLen = des->numMasks * 16 * 2 * maskWidth;
const u8 *rdmsk = baseMsk + ROUNDUP_CL(maskLen);
if (maskWidth == 1) { // reinforcement table in Teddy
dumpTeddyReinforced(rdmsk, maskWidth, f);
} else { // dup nibble mask table in Fat Teddy
assert(maskWidth == 2);
dumpTeddyDupMasks(rdmsk, des->numMasks, f);
}
dumpConfirms(teddy, teddy->confOffset, des->getNumBuckets(), f);
}
@ -177,7 +201,7 @@ void dumpFDR(const FDR *fdr, FILE *f) {
void fdrPrintStats(const FDR *fdr, FILE *f) {
if (fdrIsTeddy(fdr)) {
dumpTeddy((const Teddy *)fdr, f);
dumpTeddy(reinterpret_cast<const Teddy *>(fdr), f);
} else {
dumpFDR(fdr, f);
}

View File

@ -31,7 +31,6 @@
#include "hs_compile.h"
#include "util/target_info.h"
#include "util/compare.h" // for ourisalpha()
#include "util/make_unique.h"
#include <cassert>
#include <cstdlib>
@ -72,7 +71,7 @@ u32 findDesiredStride(size_t num_lits, size_t min_len, size_t min_len_count) {
} else if (num_lits < 5000) {
// for larger but not huge sizes, go to stride 2 only if we have at
// least minlen 3
desiredStride = MIN(min_len - 1, 2);
desiredStride = std::min(min_len - 1, 2UL);
}
}
@ -196,7 +195,7 @@ unique_ptr<FDREngineDescription> chooseEngine(const target_t &target,
}
DEBUG_PRINTF("using engine %u\n", best->getID());
return ue2::make_unique<FDREngineDescription>(*best);
return std::make_unique<FDREngineDescription>(*best);
}
SchemeBitIndex FDREngineDescription::getSchemeBit(BucketIndex b,
@ -222,7 +221,7 @@ unique_ptr<FDREngineDescription> getFdrDescription(u32 engineID) {
return nullptr;
}
return ue2::make_unique<FDREngineDescription>(allDescs[engineID]);
return std::make_unique<FDREngineDescription>(allDescs[engineID]);
}
} // namespace ue2

View File

@ -208,8 +208,8 @@ bytecode_ptr<u8> setupFDRFloodControl(const vector<hwlmLiteral> &lits,
auto buf = make_zeroed_bytecode_ptr<u8>(totalSize, 16);
assert(buf); // otherwise would have thrown std::bad_alloc
u32 *floodHeader = (u32 *)buf.get();
FDRFlood *layoutFlood = (FDRFlood *)(buf.get() + floodHeaderSize);
u32 *floodHeader = reinterpret_cast<u32 *>(buf.get());
FDRFlood *layoutFlood = reinterpret_cast<FDRFlood *>(buf.get() + floodHeaderSize);
u32 currentFloodIndex = 0;
for (const auto &m : flood2chars) {

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -37,6 +38,13 @@
#define FLOOD_MINIMUM_SIZE 256
#define FLOOD_BACKOFF_START 32
// this is because this file is included in both fdr.c and teddy.cpp
#if defined __cplusplus
#define CU64A_P_CAST(X) reinterpret_cast<const u64a*>(X)
#else
#define CU64A_P_CAST(X) (const u64a *)(X)
#endif
static really_inline
const u8 * nextFloodDetect(const u8 * buf, size_t len, u32 floodBackoff) {
// if we don't have a flood at either the start or end,
@ -47,18 +55,18 @@ const u8 * nextFloodDetect(const u8 * buf, size_t len, u32 floodBackoff) {
/* entry points in runtime.c prefetch relevant data */
#ifndef FLOOD_32
u64a x11 = *(const u64a *)ROUNDUP_PTR(buf, 8);
u64a x12 = *(const u64a *)ROUNDUP_PTR(buf+8, 8);
u64a x11 = *CU64A_P_CAST(ROUNDUP_PTR(buf, 8));
u64a x12 = *CU64A_P_CAST(ROUNDUP_PTR(buf+8, 8));
if (x11 == x12) {
return buf + floodBackoff;
}
u64a x21 = *(const u64a *)ROUNDUP_PTR(buf + len/2, 8);
u64a x22 = *(const u64a *)ROUNDUP_PTR(buf + len/2 + 8, 8);
u64a x21 = *CU64A_P_CAST(ROUNDUP_PTR(buf + len/2, 8));
u64a x22 = *CU64A_P_CAST(ROUNDUP_PTR(buf + len/2 + 8, 8));
if (x21 == x22) {
return buf + floodBackoff;
}
u64a x31 = *(const u64a *)ROUNDUP_PTR(buf + len - 24, 8);
u64a x32 = *(const u64a *)ROUNDUP_PTR(buf + len - 16, 8);
u64a x31 = *CU64A_P_CAST(ROUNDUP_PTR(buf + len - 24, 8));
u64a x32 = *CU64A_P_CAST(ROUNDUP_PTR(buf + len - 16, 8));
if (x31 == x32) {
return buf + floodBackoff;
}
@ -106,9 +114,15 @@ const u8 * floodDetect(const struct FDR * fdr,
// go from c to our FDRFlood structure
u8 c = buf[i];
#ifdef __cplusplus
const u8 * fBase = (reinterpret_cast<const u8 *>(fdr)) + fdr->floodOffset;
u32 fIdx = (reinterpret_cast<const u32 *>(fBase))[c];
const struct FDRFlood * fsb = reinterpret_cast<const struct FDRFlood *>(fBase + sizeof(u32) * 256);
#else
const u8 * fBase = ((const u8 *)fdr) + fdr->floodOffset;
u32 fIdx = ((const u32 *)fBase)[c];
const struct FDRFlood * fsb = (const struct FDRFlood *)(fBase + sizeof(u32) * 256);
#endif
const struct FDRFlood * fl = &fsb[fIdx];
#ifndef FLOOD_32
@ -116,7 +130,7 @@ const u8 * floodDetect(const struct FDR * fdr,
cmpVal |= cmpVal << 8;
cmpVal |= cmpVal << 16;
cmpVal |= cmpVal << 32;
u64a probe = *(const u64a *)ROUNDUP_PTR(buf+i, 8);
u64a probe = *CU64A_P_CAST(ROUNDUP_PTR(buf+i, 8));
#else
u32 cmpVal = c;
cmpVal |= cmpVal << 8;
@ -139,16 +153,16 @@ const u8 * floodDetect(const struct FDR * fdr,
#ifndef FLOOD_32
j -= (u32)((uintptr_t)buf + j) & 0x7; // push j back to yield 8-aligned addrs
for (; j + 32 < mainLoopLen; j += 32) {
u64a v = *(const u64a *)(buf + j);
u64a v2 = *(const u64a *)(buf + j + 8);
u64a v3 = *(const u64a *)(buf + j + 16);
u64a v4 = *(const u64a *)(buf + j + 24);
u64a v = *CU64A_P_CAST(buf + j);
u64a v2 = *CU64A_P_CAST(buf + j + 8);
u64a v3 = *CU64A_P_CAST(buf + j + 16);
u64a v4 = *CU64A_P_CAST(buf + j + 24);
if ((v4 != cmpVal) || (v3 != cmpVal) || (v2 != cmpVal) || (v != cmpVal)) {
break;
}
}
for (; j + 8 < mainLoopLen; j += 8) {
u64a v = *(const u64a *)(buf + j);
u64a v = *CU64A_P_CAST(buf + j);
if (v != cmpVal) {
break;
}
@ -172,7 +186,11 @@ const u8 * floodDetect(const struct FDR * fdr,
}
#endif
for (; j < mainLoopLen; j++) {
#ifdef __cplusplus
u8 v = *(reinterpret_cast<const u8 *>(buf + j));
#else
u8 v = *(const u8 *)(buf + j);
#endif
if (v != c) {
break;
}

File diff suppressed because it is too large

862
src/fdr/teddy.cpp Normal file
View File

@ -0,0 +1,862 @@
/*
* Copyright (c) 2015-2020, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/** \file
* \brief Teddy literal matcher: SSSE3 engine runtime.
*/
#include "fdr_internal.h"
#include "flood_runtime.h"
#include "teddy.h"
#include "teddy_internal.h"
#include "teddy_runtime_common.h"
#include "util/arch.h"
#include "util/simd_utils.h"
#ifdef ARCH_64_BIT
static really_inline
hwlm_error_t conf_chunk_64(u64a chunk, u8 bucket, u8 offset,
CautionReason reason, const u8 *pt,
const u32* confBase,
const struct FDR_Runtime_Args *a,
hwlm_group_t *control,
u32 *last_match) {
if (unlikely(chunk != ones_u64a)) {
chunk = ~chunk;
do_confWithBit_teddy(&chunk, bucket, offset, confBase, reason, a, pt,
control, last_match);
// adapted from CHECK_HWLM_TERMINATE_MATCHING
if (unlikely(*control == HWLM_TERMINATE_MATCHING)) {
return HWLM_TERMINATED;
}
}
return HWLM_SUCCESS;
}
#define CONF_CHUNK_64(chunk, bucket, off, reason, pt, confBase, a, control, last_match) \
if (conf_chunk_64(chunk, bucket, off, reason, pt, confBase, a, control, last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
#else // 32/64
static really_inline
hwlm_error_t conf_chunk_32(u32 chunk, u8 bucket, u8 offset,
CautionReason reason, const u8 *pt,
const u32* confBase,
const struct FDR_Runtime_Args *a,
hwlm_group_t *control,
u32 *last_match) {
if (unlikely(chunk != ones_u32)) {
chunk = ~chunk;
do_confWithBit_teddy(&chunk, bucket, offset, confBase, reason, a, pt,
control, last_match);
// adapted from CHECK_HWLM_TERMINATE_MATCHING
if (unlikely(*control == HWLM_TERMINATE_MATCHING)) {
return HWLM_TERMINATED;
}
}
return HWLM_SUCCESS;
}
#define CONF_CHUNK_32(chunk, bucket, off, reason, pt, confBase, a, control, last_match) \
if (conf_chunk_32(chunk, bucket, off, reason, pt, confBase, a, control, last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
#endif
#if defined(HAVE_AVX512VBMI) || defined(HAVE_AVX512) // common to both 512b's
static really_inline
const m512 *getDupMaskBase(const struct Teddy *teddy, u8 numMask) {
return (const m512 *)((const u8 *)teddy + ROUNDUP_CL(sizeof(struct Teddy))
+ ROUNDUP_CL(2 * numMask * sizeof(m256)));
}
#ifdef ARCH_64_BIT
static really_inline
hwlm_error_t confirm_teddy_64_512(m512 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff512(var, ones512()))) {
m128 p128_0 = extract128from512(var, 0);
m128 p128_1 = extract128from512(var, 1);
m128 p128_2 = extract128from512(var, 2);
m128 p128_3 = extract128from512(var, 3);
u64a part1 = movq(p128_0);
u64a part2 = movq(rshiftbyte_m128(p128_0, 8));
u64a part3 = movq(p128_1);
u64a part4 = movq(rshiftbyte_m128(p128_1, 8));
u64a part5 = movq(p128_2);
u64a part6 = movq(rshiftbyte_m128(p128_2, 8));
u64a part7 = movq(p128_3);
u64a part8 = movq(rshiftbyte_m128(p128_3, 8));
CONF_CHUNK_64(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part2, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part3, bucket, offset + 16, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part4, bucket, offset + 24, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part5, bucket, offset + 32, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part6, bucket, offset + 40, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part7, bucket, offset + 48, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part8, bucket, offset + 56, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_teddy_512_f confirm_teddy_64_512
#else // 32/64
static really_inline
hwlm_error_t confirm_teddy_32_512(m512 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff512(var, ones512()))) {
m128 p128_0 = extract128from512(var, 0);
m128 p128_1 = extract128from512(var, 1);
m128 p128_2 = extract128from512(var, 2);
m128 p128_3 = extract128from512(var, 3);
u32 part1 = movd(p128_0);
u32 part2 = movd(rshiftbyte_m128(p128_0, 4));
u32 part3 = movd(rshiftbyte_m128(p128_0, 8));
u32 part4 = movd(rshiftbyte_m128(p128_0, 12));
u32 part5 = movd(p128_1);
u32 part6 = movd(rshiftbyte_m128(p128_1, 4));
u32 part7 = movd(rshiftbyte_m128(p128_1, 8));
u32 part8 = movd(rshiftbyte_m128(p128_1, 12));
u32 part9 = movd(p128_2);
u32 part10 = movd(rshiftbyte_m128(p128_2, 4));
u32 part11 = movd(rshiftbyte_m128(p128_2, 8));
u32 part12 = movd(rshiftbyte_m128(p128_2, 12));
u32 part13 = movd(p128_3);
u32 part14 = movd(rshiftbyte_m128(p128_3, 4));
u32 part15 = movd(rshiftbyte_m128(p128_3, 8));
u32 part16 = movd(rshiftbyte_m128(p128_3, 12));
CONF_CHUNK_32(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part2, bucket, offset + 4, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part3, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part4, bucket, offset + 12, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part5, bucket, offset + 16, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part6, bucket, offset + 20, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part7, bucket, offset + 24, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part8, bucket, offset + 28, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part9, bucket, offset + 32, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part10, bucket, offset + 36, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part11, bucket, offset + 40, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part12, bucket, offset + 44, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part13, bucket, offset + 48, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part14, bucket, offset + 52, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part15, bucket, offset + 56, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part16, bucket, offset + 60, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_teddy_512_f confirm_teddy_32_512
#endif // 32/64
#define CONFIRM_TEDDY_512(...) if (confirm_teddy_512_f(__VA_ARGS__, a, confBase, &control, &last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
#endif // AVX512VBMI or AVX512
#if defined(HAVE_AVX512VBMI) // VBMI strong teddy
#define TEDDY_VBMI_SL1_MASK 0xfffffffffffffffeULL
#define TEDDY_VBMI_SL2_MASK 0xfffffffffffffffcULL
#define TEDDY_VBMI_SL3_MASK 0xfffffffffffffff8ULL
template<int NMSK>
static really_inline
m512 prep_conf_teddy_512vbmi_templ(const m512 *lo_mask, const m512 *dup_mask,
const m512 *sl_msk, const m512 val) {
m512 lo = and512(val, *lo_mask);
m512 hi = and512(rshift64_m512(val, 4), *lo_mask);
m512 shuf_or_b0 = or512(pshufb_m512(dup_mask[0], lo),
pshufb_m512(dup_mask[1], hi));
if constexpr (NMSK == 1) return shuf_or_b0;
m512 shuf_or_b1 = or512(pshufb_m512(dup_mask[2], lo),
pshufb_m512(dup_mask[3], hi));
m512 sl1 = maskz_vpermb512(TEDDY_VBMI_SL1_MASK, sl_msk[0], shuf_or_b1);
if constexpr (NMSK == 2) return (or512(sl1, shuf_or_b0));
m512 shuf_or_b2 = or512(pshufb_m512(dup_mask[4], lo),
pshufb_m512(dup_mask[5], hi));
m512 sl2 = maskz_vpermb512(TEDDY_VBMI_SL2_MASK, sl_msk[1], shuf_or_b2);
if constexpr (NMSK == 3) return (or512(sl2, or512(sl1, shuf_or_b0)));
m512 shuf_or_b3 = or512(pshufb_m512(dup_mask[6], lo),
pshufb_m512(dup_mask[7], hi));
m512 sl3 = maskz_vpermb512(TEDDY_VBMI_SL3_MASK, sl_msk[2], shuf_or_b3);
return (or512(sl3, or512(sl2, or512(sl1, shuf_or_b0))));
}
#define TEDDY_VBMI_SL1_POS 15
#define TEDDY_VBMI_SL2_POS 14
#define TEDDY_VBMI_SL3_POS 13
#define TEDDY_VBMI_CONF_MASK_HEAD (0xffffffffffffffffULL >> n_sh)
#define TEDDY_VBMI_CONF_MASK_FULL (0xffffffffffffffffULL << n_sh)
#define TEDDY_VBMI_CONF_MASK_VAR(n) (0xffffffffffffffffULL >> (64 - n) << overlap)
#define TEDDY_VBMI_LOAD_MASK_PATCH (0xffffffffffffffffULL >> (64 - n_sh))
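/*
 * Worked example (annotation, derived from the code below): for NMSK == 3
 * the runtime uses n_sh == 2, so:
 *   TEDDY_VBMI_CONF_MASK_HEAD  == 0x3fffffffffffffff  (top n_sh lanes off)
 *   TEDDY_VBMI_CONF_MASK_FULL  == 0xfffffffffffffffc  (bottom n_sh lanes off)
 *   TEDDY_VBMI_LOAD_MASK_PATCH == 0x0000000000000003  (reload the n_sh
 *                                                      overlap bytes)
 * Masked-off byte positions are forced to all-ones (no candidate) before
 * confirmation, and each loop iteration reloads n_sh bytes of overlap, so
 * matches straddling a 64-byte block are still confirmed exactly once.
 */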
template<int NMSK>
hwlm_error_t fdr_exec_teddy_512vbmi_templ(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
const u8 *buf_end = a->buf + a->len;
const u8 *ptr = a->buf + a->start_offset;
u32 floodBackoff = FLOOD_BACKOFF_START;
const u8 *tryFloodDetect = a->firstFloodDetect;
u32 last_match = ones_u32;
const struct Teddy *teddy = (const struct Teddy *)fdr;
const size_t iterBytes = 64;
u32 n_sh = NMSK - 1;
const size_t loopBytes = 64 - n_sh;
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n",
a->buf, a->len, a->start_offset);
const m128 *maskBase = getMaskBase(teddy);
m512 lo_mask = set1_64x8(0xf);
m512 dup_mask[NMSK * 2];
m512 sl_msk[NMSK - 1];
dup_mask[0] = set1_4x128(maskBase[0]);
dup_mask[1] = set1_4x128(maskBase[1]);
if constexpr (NMSK > 1){
dup_mask[2] = set1_4x128(maskBase[2]);
dup_mask[3] = set1_4x128(maskBase[3]);
sl_msk[0] = loadu512(p_sh_mask_arr + TEDDY_VBMI_SL1_POS);
}
if constexpr (NMSK > 2){
dup_mask[4] = set1_4x128(maskBase[4]);
dup_mask[5] = set1_4x128(maskBase[5]);
sl_msk[1] = loadu512(p_sh_mask_arr + TEDDY_VBMI_SL2_POS);
}
if constexpr (NMSK > 3){
dup_mask[6] = set1_4x128(maskBase[6]);
dup_mask[7] = set1_4x128(maskBase[7]);
sl_msk[2] = loadu512(p_sh_mask_arr + TEDDY_VBMI_SL3_POS);
}
const u32 *confBase = getConfBase(teddy);
u64a k = TEDDY_VBMI_CONF_MASK_FULL;
m512 p_mask = set_mask_m512(~k);
u32 overlap = 0;
u64a patch = 0;
if (likely(ptr + loopBytes <= buf_end)) {
m512 p_mask0 = set_mask_m512(~TEDDY_VBMI_CONF_MASK_HEAD);
m512 r_0 = prep_conf_teddy_512vbmi_templ<NMSK>(&lo_mask, dup_mask, sl_msk, loadu512(ptr));
r_0 = or512(r_0, p_mask0);
CONFIRM_TEDDY_512(r_0, 8, 0, VECTORING, ptr);
ptr += loopBytes;
overlap = n_sh;
patch = TEDDY_VBMI_LOAD_MASK_PATCH;
}
for (; ptr + loopBytes <= buf_end; ptr += loopBytes) {
__builtin_prefetch(ptr - n_sh + (64 * 2));
CHECK_FLOOD;
m512 r_0 = prep_conf_teddy_512vbmi_templ<NMSK>(&lo_mask, dup_mask, sl_msk, loadu512(ptr - n_sh));
r_0 = or512(r_0, p_mask);
CONFIRM_TEDDY_512(r_0, 8, 0, NOT_CAUTIOUS, ptr - n_sh);
}
assert(ptr + loopBytes > buf_end);
if (ptr < buf_end) {
u32 left = (u32)(buf_end - ptr);
u64a k1 = TEDDY_VBMI_CONF_MASK_VAR(left);
m512 p_mask1 = set_mask_m512(~k1);
m512 val_0 = loadu_maskz_m512(k1 | patch, ptr - overlap);
m512 r_0 = prep_conf_teddy_512vbmi_templ<NMSK>(&lo_mask, dup_mask, sl_msk, val_0);
r_0 = or512(r_0, p_mask1);
CONFIRM_TEDDY_512(r_0, 8, 0, VECTORING, ptr - overlap);
}
return HWLM_SUCCESS;
}
#define FDR_EXEC_TEDDY_FN fdr_exec_teddy_512vbmi_templ
#elif defined(HAVE_AVX512) // AVX512 reinforced teddy
/* both 512b versions use the same confirm teddy */
template <int NMSK>
static inline
m512 shift_or_512_templ(const m512 *dup_mask, m512 lo, m512 hi) {
return or512(lshift128_m512(or512(pshufb_m512(dup_mask[(NMSK - 1) * 2], lo),
pshufb_m512(dup_mask[(NMSK * 2) - 1], hi)),
NMSK - 1), shift_or_512_templ<NMSK - 1>(dup_mask, lo, hi));
}
template <>
m512 shift_or_512_templ<1>(const m512 *dup_mask, m512 lo, m512 hi){
return or512(pshufb_m512(dup_mask[0], lo), pshufb_m512(dup_mask[1], hi));
}
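/*
 * Note: the recursive template above unrolls at compile time into a chain
 * of ORs; for NMSK == 3, shift_or_512_templ<3> evaluates to
 *   (mask-3 shuffle result shifted left 2 bytes within each 128-bit lane)
 * | (mask-2 shuffle result shifted left 1 byte within each 128-bit lane)
 * | (mask-1 shuffle result, unshifted)
 * so each mask's per-byte result is aligned to the position of the
 * literal's last byte before the results are combined.
 */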
template <int NMSK>
static really_inline
m512 prep_conf_teddy_no_reinforcement_512_templ(const m512 *lo_mask,
const m512 *dup_mask,
const m512 val) {
m512 lo = and512(val, *lo_mask);
m512 hi = and512(rshift64_m512(val, 4), *lo_mask);
return shift_or_512_templ<NMSK>(dup_mask, lo, hi);
}
template <int NMSK>
static really_inline
m512 prep_conf_teddy_512_templ(const m512 *lo_mask, const m512 *dup_mask,
const u8 *ptr, const u64a *r_msk_base,
u32 *c_0, u32 *c_16, u32 *c_32, u32 *c_48) {
m512 lo = and512(load512(ptr), *lo_mask);
m512 hi = and512(rshift64_m512(load512(ptr), 4), *lo_mask);
*c_16 = *(ptr + 15);
*c_32 = *(ptr + 31);
*c_48 = *(ptr + 47);
m512 r_msk = set8x64(0ULL, r_msk_base[*c_48], 0ULL, r_msk_base[*c_32],
0ULL, r_msk_base[*c_16], 0ULL, r_msk_base[*c_0]);
*c_0 = *(ptr + 63);
return or512(shift_or_512_templ<NMSK>(dup_mask, lo, hi), r_msk);
}
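/*
 * Note: c_0/c_16/c_32/c_48 carry the byte preceding each 128-bit lane
 * (ptr[15], ptr[31], ptr[47], and, via the caller's loop state, the
 * previous block's ptr[63]). The in-lane left shifts in shift_or_512_templ
 * shift in zero bytes, which in Teddy's encoding would mean "candidate in
 * every bucket"; OR-ing in r_msk_base[c] for the preceding byte c restores
 * the lost constraint. The initial value 0x100 indexes a sentinel row of
 * the reinforced table (it has N_CHARS + 1 entries) for use before any
 * real preceding byte exists.
 */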
#define PREP_CONF_FN_512(ptr, n) \
prep_conf_teddy_512_templ<n>(&lo_mask, dup_mask, ptr, r_msk_base, \
&c_0, &c_16, &c_32, &c_48)
template <int NMSK>
hwlm_error_t fdr_exec_teddy_512_templ(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
const u8 *buf_end = a->buf + a->len;
const u8 *ptr = a->buf + a->start_offset;
u32 floodBackoff = FLOOD_BACKOFF_START;
const u8 *tryFloodDetect = a->firstFloodDetect;
u32 last_match = ones_u32;
const struct Teddy *teddy = (const struct Teddy *)fdr;
const size_t iterBytes = 128;
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n",
a->buf, a->len, a->start_offset);
const m128 *maskBase = getMaskBase(teddy);
m512 lo_mask = set1_64x8(0xf);
m512 dup_mask[NMSK * 2];
dup_mask[0] = set1_4x128(maskBase[0]);
dup_mask[1] = set1_4x128(maskBase[1]);
if constexpr (NMSK > 1){
dup_mask[2] = set1_4x128(maskBase[2]);
dup_mask[3] = set1_4x128(maskBase[3]);
}
if constexpr (NMSK > 2){
dup_mask[4] = set1_4x128(maskBase[4]);
dup_mask[5] = set1_4x128(maskBase[5]);
}
if constexpr (NMSK > 3){
dup_mask[6] = set1_4x128(maskBase[6]);
dup_mask[7] = set1_4x128(maskBase[7]);
}
const u32 *confBase = getConfBase(teddy);
const u64a *r_msk_base = getReinforcedMaskBase(teddy, NMSK);
u32 c_0 = 0x100;
u32 c_16 = 0x100;
u32 c_32 = 0x100;
u32 c_48 = 0x100;
const u8 *mainStart = ROUNDUP_PTR(ptr, 64);
DEBUG_PRINTF("derive: ptr: %p mainstart %p\n", ptr, mainStart);
if (ptr < mainStart) {
ptr = mainStart - 64;
m512 p_mask;
m512 val_0 = vectoredLoad512(&p_mask, ptr, a->start_offset,
a->buf, buf_end,
a->buf_history, a->len_history, NMSK);
m512 r_0 = prep_conf_teddy_no_reinforcement_512_templ<NMSK>(&lo_mask, dup_mask, val_0);
r_0 = or512(r_0, p_mask);
CONFIRM_TEDDY_512(r_0, 8, 0, VECTORING, ptr);
ptr += 64;
}
if (ptr + 64 <= buf_end) {
m512 r_0 = PREP_CONF_FN_512(ptr, NMSK);
CONFIRM_TEDDY_512(r_0, 8, 0, VECTORING, ptr);
ptr += 64;
}
for (; ptr + iterBytes <= buf_end; ptr += iterBytes) {
__builtin_prefetch(ptr + (iterBytes * 4));
CHECK_FLOOD;
m512 r_0 = PREP_CONF_FN_512(ptr, NMSK);
CONFIRM_TEDDY_512(r_0, 8, 0, NOT_CAUTIOUS, ptr);
m512 r_1 = PREP_CONF_FN_512(ptr + 64, NMSK);
CONFIRM_TEDDY_512(r_1, 8, 64, NOT_CAUTIOUS, ptr);
}
if (ptr + 64 <= buf_end) {
m512 r_0 = PREP_CONF_FN_512(ptr, NMSK);
CONFIRM_TEDDY_512(r_0, 8, 0, NOT_CAUTIOUS, ptr);
ptr += 64;
}
assert(ptr + 64 > buf_end);
if (ptr < buf_end) {
m512 p_mask;
m512 val_0 = vectoredLoad512(&p_mask, ptr, 0, ptr, buf_end,
a->buf_history, a->len_history, NMSK);
m512 r_0 = prep_conf_teddy_no_reinforcement_512_templ<NMSK>(&lo_mask, dup_mask, val_0);
r_0 = or512(r_0, p_mask);
CONFIRM_TEDDY_512(r_0, 8, 0, VECTORING, ptr);
}
return HWLM_SUCCESS;
}
#define FDR_EXEC_TEDDY_FN fdr_exec_teddy_512_templ
/* #endif // AVX512 vs AVX512VBMI; back to the original fully exclusive logic */
#elif defined(HAVE_AVX2) // not HAVE_AVX512 but HAVE_AVX2 reinforced teddy
#ifdef ARCH_64_BIT
static really_inline
hwlm_error_t confirm_teddy_64_256(m256 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff256(var, ones256()))) {
m128 lo = movdq_lo(var);
m128 hi = movdq_hi(var);
u64a part1 = movq(lo);
u64a part2 = movq(rshiftbyte_m128(lo, 8));
u64a part3 = movq(hi);
u64a part4 = movq(rshiftbyte_m128(hi, 8));
CONF_CHUNK_64(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part2, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part3, bucket, offset + 16, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(part4, bucket, offset + 24, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_teddy_256_f confirm_teddy_64_256
#else
static really_inline
hwlm_error_t confirm_teddy_32_256(m256 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff256(var, ones256()))) {
m128 lo = movdq_lo(var);
m128 hi = movdq_hi(var);
u32 part1 = movd(lo);
u32 part2 = movd(rshiftbyte_m128(lo, 4));
u32 part3 = movd(rshiftbyte_m128(lo, 8));
u32 part4 = movd(rshiftbyte_m128(lo, 12));
u32 part5 = movd(hi);
u32 part6 = movd(rshiftbyte_m128(hi, 4));
u32 part7 = movd(rshiftbyte_m128(hi, 8));
u32 part8 = movd(rshiftbyte_m128(hi, 12));
CONF_CHUNK_32(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part2, bucket, offset + 4, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part3, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part4, bucket, offset + 12, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part5, bucket, offset + 16, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part6, bucket, offset + 20, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part7, bucket, offset + 24, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part8, bucket, offset + 28, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_teddy_256_f confirm_teddy_32_256
#endif
#define CONFIRM_TEDDY_256(...) if (confirm_teddy_256_f(__VA_ARGS__, a, confBase, &control, &last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
/*
static really_inline
m256 vectoredLoad2x128(m256 *p_mask, const u8 *ptr, const size_t start_offset,
const u8 *lo, const u8 *hi,
const u8 *buf_history, size_t len_history,
const u32 nMasks) {
m128 p_mask128;
m256 ret = set1_2x128(vectoredLoad128(&p_mask128, ptr, start_offset, lo, hi,
buf_history, len_history, nMasks));
*p_mask = set1_2x128(p_mask128);
return ret;
}
*/
template <int NMSK>
static inline
m256 shift_or_256_templ(const m256 *dup_mask, m256 lo, m256 hi){
return or256(lshift128_m256(or256(pshufb_m256(dup_mask[(NMSK-1)*2], lo),
pshufb_m256(dup_mask[(NMSK*2)-1], hi)),
(NMSK-1)), shift_or_256_templ<NMSK-1>(dup_mask, lo, hi));
}
template<>
m256 shift_or_256_templ<1>(const m256 *dup_mask, m256 lo, m256 hi){
return or256(pshufb_m256(dup_mask[0], lo), pshufb_m256(dup_mask[1], hi));
}
template <int NMSK>
static really_inline
m256 prep_conf_teddy_no_reinforcement_256_templ(const m256 *lo_mask,
const m256 *dup_mask,
const m256 val) {
m256 lo = and256(val, *lo_mask);
m256 hi = and256(rshift64_m256(val, 4), *lo_mask);
return shift_or_256_templ<NMSK>(dup_mask, lo, hi);
}
template <int NMSK>
static really_inline
m256 prep_conf_teddy_256_templ(const m256 *lo_mask, const m256 *dup_mask,
const u8 *ptr, const u64a *r_msk_base,
u32 *c_0, u32 *c_128) {
m256 lo = and256(load256(ptr), *lo_mask);
m256 hi = and256(rshift64_m256(load256(ptr), 4), *lo_mask);
*c_128 = *(ptr + 15);
m256 r_msk = set4x64(0ULL, r_msk_base[*c_128], 0ULL, r_msk_base[*c_0]);
*c_0 = *(ptr + 31);
return or256(shift_or_256_templ<NMSK>(dup_mask, lo, hi), r_msk);
}
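/*
 * Note: 256-bit analogue of the reinforcement scheme above; c_128 holds
 * ptr[15] (the byte before the high lane) and c_0 carries ptr[31] into the
 * next call, with the same 0x100 sentinel row before the first block.
 */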
#define PREP_CONF_FN_256_NO_REINFORCEMENT(val, n) \
prep_conf_teddy_no_reinforcement_256_templ<n>(&lo_mask, dup_mask, val)
#define PREP_CONF_FN_256(ptr, n) \
prep_conf_teddy_256_templ<n>(&lo_mask, dup_mask, ptr, r_msk_base, &c_0, &c_128)
template <int NMSK>
hwlm_error_t fdr_exec_teddy_256_templ(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
const u8 *buf_end = a->buf + a->len;
const u8 *ptr = a->buf + a->start_offset;
u32 floodBackoff = FLOOD_BACKOFF_START;
const u8 *tryFloodDetect = a->firstFloodDetect;
u32 last_match = ones_u32;
const struct Teddy *teddy = (const struct Teddy *)fdr;
const size_t iterBytes = 64;
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n",
a->buf, a->len, a->start_offset);
const m128 *maskBase = getMaskBase(teddy);
//PREPARE_MASKS_256;
m256 lo_mask = set1_32x8(0xf);
m256 dup_mask[NMSK * 2];
dup_mask[0] = set1_2x128(maskBase[0]);
dup_mask[1] = set1_2x128(maskBase[1]);
if constexpr (NMSK > 1){
dup_mask[2] = set1_2x128(maskBase[2]);
dup_mask[3] = set1_2x128(maskBase[3]);
}
if constexpr (NMSK > 2){
dup_mask[4] = set1_2x128(maskBase[4]);
dup_mask[5] = set1_2x128(maskBase[5]);
}
if constexpr (NMSK > 3){
dup_mask[6] = set1_2x128(maskBase[6]);
dup_mask[7] = set1_2x128(maskBase[7]);
}
const u32 *confBase = getConfBase(teddy);
const u64a *r_msk_base = getReinforcedMaskBase(teddy, NMSK);
u32 c_0 = 0x100;
u32 c_128 = 0x100;
const u8 *mainStart = ROUNDUP_PTR(ptr, 32);
DEBUG_PRINTF("derive: ptr: %p mainstart %p\n", ptr, mainStart);
if (ptr < mainStart) {
ptr = mainStart - 32;
m256 p_mask;
m256 val_0 = vectoredLoad256(&p_mask, ptr, a->start_offset,
a->buf, buf_end,
a->buf_history, a->len_history, NMSK);
m256 r_0 = PREP_CONF_FN_256_NO_REINFORCEMENT(val_0, NMSK);
r_0 = or256(r_0, p_mask);
CONFIRM_TEDDY_256(r_0, 8, 0, VECTORING, ptr);
ptr += 32;
}
if (ptr + 32 <= buf_end) {
m256 r_0 = PREP_CONF_FN_256(ptr, NMSK);
CONFIRM_TEDDY_256(r_0, 8, 0, VECTORING, ptr);
ptr += 32;
}
for (; ptr + iterBytes <= buf_end; ptr += iterBytes) {
__builtin_prefetch(ptr + (iterBytes * 4));
CHECK_FLOOD;
m256 r_0 = PREP_CONF_FN_256(ptr, NMSK);
CONFIRM_TEDDY_256(r_0, 8, 0, NOT_CAUTIOUS, ptr);
m256 r_1 = PREP_CONF_FN_256(ptr + 32, NMSK);
CONFIRM_TEDDY_256(r_1, 8, 32, NOT_CAUTIOUS, ptr);
}
if (ptr + 32 <= buf_end) {
m256 r_0 = PREP_CONF_FN_256(ptr, NMSK);
CONFIRM_TEDDY_256(r_0, 8, 0, NOT_CAUTIOUS, ptr);
ptr += 32;
}
assert(ptr + 32 > buf_end);
if (ptr < buf_end) {
m256 p_mask;
m256 val_0 = vectoredLoad256(&p_mask, ptr, 0, ptr, buf_end,
a->buf_history, a->len_history, NMSK);
m256 r_0 = PREP_CONF_FN_256_NO_REINFORCEMENT(val_0, NMSK);
r_0 = or256(r_0, p_mask);
CONFIRM_TEDDY_256(r_0, 8, 0, VECTORING, ptr);
}
return HWLM_SUCCESS;
}
#define FDR_EXEC_TEDDY_FN fdr_exec_teddy_256_templ
#else // not defined HAVE_AVX2
#ifdef ARCH_64_BIT
static really_inline
hwlm_error_t confirm_teddy_64_128(m128 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff128(var, ones128()))) {
u64a lo = 0;
u64a hi = 0;
u64a __attribute__((aligned(16))) vec[2];
store128(vec, var);
lo = vec[0];
hi = vec[1];
CONF_CHUNK_64(lo, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_64(hi, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_teddy_128_f confirm_teddy_64_128
#else // 32/64
static really_inline
hwlm_error_t confirm_teddy_32_128(m128 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff128(var, ones128()))) {
u32 part1 = movd(var);
u32 part2 = movd(rshiftbyte_m128(var, 4));
u32 part3 = movd(rshiftbyte_m128(var, 8));
u32 part4 = movd(rshiftbyte_m128(var, 12));
CONF_CHUNK_32(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part2, bucket, offset + 4, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part3, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_CHUNK_32(part4, bucket, offset + 12, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_teddy_128_f confirm_teddy_32_128
#endif // 32/64
#define CONFIRM_TEDDY_128(...) if (confirm_teddy_128_f(__VA_ARGS__, a, confBase, &control, &last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
template <int NMSK>
static really_inline
m128 prep_conf_teddy_128_templ(const m128 *maskBase, m128 val) {
m128 mask = set1_16x8(0xf);
m128 lo = and128(val, mask);
m128 hi = and128(rshift64_m128(val, 4), mask);
m128 r1 = or128(pshufb_m128(maskBase[0 * 2], lo),
pshufb_m128(maskBase[0 * 2 + 1], hi));
if constexpr (NMSK == 1) return r1;
m128 res_1 = or128(pshufb_m128(maskBase[1 * 2], lo),
pshufb_m128(maskBase[1 * 2 + 1], hi));
m128 old_1 = zeroes128();
m128 res_shifted_1 = palignr(res_1, old_1, 16 - 1);
m128 r2 = or128(r1, res_shifted_1);
if constexpr (NMSK == 2) return r2;
m128 res_2 = or128(pshufb_m128(maskBase[2 * 2], lo),
pshufb_m128(maskBase[2 * 2 + 1], hi));
m128 res_shifted_2 = palignr(res_2, old_1, 16 - 2);
m128 r3 = or128(r2, res_shifted_2);
if constexpr (NMSK == 3) return r3;
m128 res_3 = or128(pshufb_m128(maskBase[3 * 2], lo),
pshufb_m128(maskBase[3 * 2 + 1], hi));
m128 res_shifted_3 = palignr(res_3, old_1, 16 - 3);
return or128(r3, res_shifted_3);
}
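/*
 * Note: the 128-bit kernel has no reinforced table; palignr against a zero
 * vector (old_1) shifts each extra mask's result into place, leaving zero
 * bytes (candidates in every bucket) at the low end. Spurious candidates
 * this creates at a block boundary are weeded out by the exact confirm
 * stage (CONF_CHUNK_* and the conf functions), trading a few extra confirm
 * calls for a simpler hot loop.
 */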
template <int NMSK>
hwlm_error_t fdr_exec_teddy_128_templ(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
const u8 *buf_end = a->buf + a->len;
const u8 *ptr = a->buf + a->start_offset;
u32 floodBackoff = FLOOD_BACKOFF_START;
const u8 *tryFloodDetect = a->firstFloodDetect;
u32 last_match = ones_u32;
const struct Teddy *teddy = reinterpret_cast<const struct Teddy *>(fdr);
const size_t iterBytes = 32;
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n",
a->buf, a->len, a->start_offset);
const m128 *maskBase = getMaskBase(teddy);
const u32 *confBase = getConfBase(teddy);
const u8 *mainStart = ROUNDUP_PTR(ptr, 16);
DEBUG_PRINTF("derive: ptr: %p mainstart %p\n", ptr, mainStart);
if (ptr < mainStart) {
ptr = mainStart - 16;
m128 p_mask;
m128 val_0 = vectoredLoad128(&p_mask, ptr, a->start_offset,
a->buf, buf_end,
a->buf_history, a->len_history, NMSK);
m128 r_0 = prep_conf_teddy_128_templ<NMSK>(maskBase, val_0);
r_0 = or128(r_0, p_mask);
CONFIRM_TEDDY_128(r_0, 8, 0, VECTORING, ptr);
ptr += 16;
}
if (ptr + 16 <= buf_end) {
m128 r_0 = prep_conf_teddy_128_templ<NMSK>(maskBase, load128(ptr));
CONFIRM_TEDDY_128(r_0, 8, 0, VECTORING, ptr);
ptr += 16;
}
for (; ptr + iterBytes <= buf_end; ptr += iterBytes) {
__builtin_prefetch(ptr + (iterBytes * 4));
CHECK_FLOOD;
m128 r_0 = prep_conf_teddy_128_templ<NMSK>(maskBase, load128(ptr));
CONFIRM_TEDDY_128(r_0, 8, 0, NOT_CAUTIOUS, ptr);
m128 r_1 = prep_conf_teddy_128_templ<NMSK>(maskBase, load128(ptr + 16));
CONFIRM_TEDDY_128(r_1, 8, 16, NOT_CAUTIOUS, ptr);
}
if (ptr + 16 <= buf_end) {
m128 r_0 = prep_conf_teddy_128_templ<NMSK>(maskBase, load128(ptr));
CONFIRM_TEDDY_128(r_0, 8, 0, NOT_CAUTIOUS, ptr);
ptr += 16;
}
assert(ptr + 16 > buf_end);
if (ptr < buf_end) {
m128 p_mask;
m128 val_0 = vectoredLoad128(&p_mask, ptr, 0, ptr, buf_end,
a->buf_history, a->len_history, NMSK);
m128 r_0 = prep_conf_teddy_128_templ<NMSK>(maskBase, val_0);
r_0 = or128(r_0, p_mask);
CONFIRM_TEDDY_128(r_0, 8, 0, VECTORING, ptr);
}
return HWLM_SUCCESS;
}
#define FDR_EXEC_TEDDY_FN fdr_exec_teddy_128_templ
#endif // HAVE_AVX2 HAVE_AVX512
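/*
 * Note: FDR_EXEC_TEDDY_FN resolves at compile time to exactly one of the
 * kernels above (VBMI 512-bit, reinforced 512-bit, reinforced 256-bit, or
 * plain 128-bit), so the extern "C" wrappers below are the only dispatch
 * surface the FDR runtime links against. Each _pck variant currently
 * instantiates the same template as its unpacked counterpart.
 */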
extern "C" {
hwlm_error_t fdr_exec_teddy_msks1(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<1>(fdr, a, control);
}
hwlm_error_t fdr_exec_teddy_msks1_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<1>(fdr, a, control);
}
hwlm_error_t fdr_exec_teddy_msks2(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<2>(fdr, a, control);
}
hwlm_error_t fdr_exec_teddy_msks2_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<2>(fdr, a, control);
}
hwlm_error_t fdr_exec_teddy_msks3(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<3>(fdr, a, control);
}
hwlm_error_t fdr_exec_teddy_msks3_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<3>(fdr, a, control);
}
hwlm_error_t fdr_exec_teddy_msks4(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<4>(fdr, a, control);
}
hwlm_error_t fdr_exec_teddy_msks4_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_TEDDY_FN<4>(fdr, a, control);
}
} // extern "C"


@ -1,5 +1,6 @@
/*
* Copyright (c) 2016-2017, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -39,6 +40,10 @@
struct FDR; // forward declaration from fdr_internal.h
struct FDR_Runtime_Args;
#ifdef __cplusplus
extern "C" {
#endif
hwlm_error_t fdr_exec_teddy_msks1(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control);
@ -106,5 +111,8 @@ hwlm_error_t fdr_exec_fat_teddy_msks4_pck(const struct FDR *fdr,
hwlm_group_t control);
#endif /* HAVE_AVX2 */
#ifdef __cplusplus
}
#endif
#endif /* TEDDY_H_ */


@ -1,712 +0,0 @@
/*
* Copyright (c) 2016-2020, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/** \file
* \brief Teddy literal matcher: AVX2 engine runtime.
*/
#include "fdr_internal.h"
#include "flood_runtime.h"
#include "teddy.h"
#include "teddy_internal.h"
#include "teddy_runtime_common.h"
#include "util/arch.h"
#include "util/simd_utils.h"
#if defined(HAVE_AVX2)
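/*
 * Note: row k of p_mask_arr256 (k = 0..32) is 0xff everywhere except for a
 * run of k zero bytes starting at offset 32. Taking an unaligned 32-byte
 * window from a row yields a mask with a run of k "valid" (0x00) bytes at a
 * chosen position and 0xff elsewhere; OR-ing such a mask into a Teddy
 * result forces the uncovered positions to "no candidate" when fewer than
 * 32 valid input bytes are available.
 */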
const u8 ALIGN_AVX_DIRECTIVE p_mask_arr256[33][64] = {
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff},
{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
};
#define CONF_FAT_CHUNK_64(chunk, bucket, off, reason, conf_fn) \
do { \
if (unlikely(chunk != ones_u64a)) { \
chunk = ~chunk; \
conf_fn(&chunk, bucket, off, confBase, reason, a, ptr, \
&control, &last_match); \
CHECK_HWLM_TERMINATE_MATCHING; \
} \
} while(0)
#define CONF_FAT_CHUNK_32(chunk, bucket, off, reason, conf_fn) \
do { \
if (unlikely(chunk != ones_u32)) { \
chunk = ~chunk; \
conf_fn(&chunk, bucket, off, confBase, reason, a, ptr, \
&control, &last_match); \
CHECK_HWLM_TERMINATE_MATCHING; \
} \
} while(0)
static really_inline
const m256 *getMaskBase_fat(const struct Teddy *teddy) {
return (const m256 *)((const u8 *)teddy + ROUNDUP_CL(sizeof(struct Teddy)));
}
#if defined(HAVE_AVX512_REVERT) // revert to AVX2 Fat Teddy
static really_inline
const u64a *getReinforcedMaskBase_fat(const struct Teddy *teddy, u8 numMask) {
return (const u64a *)((const u8 *)getMaskBase_fat(teddy)
+ ROUNDUP_CL(2 * numMask * sizeof(m256)));
}
#ifdef ARCH_64_BIT
#define CONFIRM_FAT_TEDDY(var, bucket, offset, reason, conf_fn) \
do { \
if (unlikely(diff512(var, ones512()))) { \
m512 swap = swap256in512(var); \
m512 r = interleave512lo(var, swap); \
m128 r0 = extract128from512(r, 0); \
m128 r1 = extract128from512(r, 1); \
u64a part1 = movq(r0); \
u64a part2 = extract64from128(r0, 1); \
u64a part5 = movq(r1); \
u64a part6 = extract64from128(r1, 1); \
r = interleave512hi(var, swap); \
r0 = extract128from512(r, 0); \
r1 = extract128from512(r, 1); \
u64a part3 = movq(r0); \
u64a part4 = extract64from128(r0, 1); \
u64a part7 = movq(r1); \
u64a part8 = extract64from128(r1, 1); \
CONF_FAT_CHUNK_64(part1, bucket, offset, reason, conf_fn); \
CONF_FAT_CHUNK_64(part2, bucket, offset + 4, reason, conf_fn); \
CONF_FAT_CHUNK_64(part3, bucket, offset + 8, reason, conf_fn); \
CONF_FAT_CHUNK_64(part4, bucket, offset + 12, reason, conf_fn); \
CONF_FAT_CHUNK_64(part5, bucket, offset + 16, reason, conf_fn); \
CONF_FAT_CHUNK_64(part6, bucket, offset + 20, reason, conf_fn); \
CONF_FAT_CHUNK_64(part7, bucket, offset + 24, reason, conf_fn); \
CONF_FAT_CHUNK_64(part8, bucket, offset + 28, reason, conf_fn); \
} \
} while(0)
#else
#define CONFIRM_FAT_TEDDY(var, bucket, offset, reason, conf_fn) \
do { \
if (unlikely(diff512(var, ones512()))) { \
m512 swap = swap256in512(var); \
m512 r = interleave512lo(var, swap); \
m128 r0 = extract128from512(r, 0); \
m128 r1 = extract128from512(r, 1); \
u32 part1 = movd(r0); \
u32 part2 = extract32from128(r0, 1); \
u32 part3 = extract32from128(r0, 2); \
u32 part4 = extract32from128(r0, 3); \
u32 part9 = movd(r1); \
u32 part10 = extract32from128(r1, 1); \
u32 part11 = extract32from128(r1, 2); \
u32 part12 = extract32from128(r1, 3); \
r = interleave512hi(var, swap); \
r0 = extract128from512(r, 0); \
r1 = extract128from512(r, 1); \
u32 part5 = movd(r0); \
u32 part6 = extract32from128(r0, 1); \
u32 part7 = extract32from128(r0, 2); \
u32 part8 = extract32from128(r0, 3); \
u32 part13 = movd(r1); \
u32 part14 = extract32from128(r1, 1); \
u32 part15 = extract32from128(r1, 2); \
u32 part16 = extract32from128(r1, 3); \
CONF_FAT_CHUNK_32(part1, bucket, offset, reason, conf_fn); \
CONF_FAT_CHUNK_32(part2, bucket, offset + 2, reason, conf_fn); \
CONF_FAT_CHUNK_32(part3, bucket, offset + 4, reason, conf_fn); \
CONF_FAT_CHUNK_32(part4, bucket, offset + 6, reason, conf_fn); \
CONF_FAT_CHUNK_32(part5, bucket, offset + 8, reason, conf_fn); \
CONF_FAT_CHUNK_32(part6, bucket, offset + 10, reason, conf_fn); \
CONF_FAT_CHUNK_32(part7, bucket, offset + 12, reason, conf_fn); \
CONF_FAT_CHUNK_32(part8, bucket, offset + 14, reason, conf_fn); \
CONF_FAT_CHUNK_32(part9, bucket, offset + 16, reason, conf_fn); \
CONF_FAT_CHUNK_32(part10, bucket, offset + 18, reason, conf_fn); \
CONF_FAT_CHUNK_32(part11, bucket, offset + 20, reason, conf_fn); \
CONF_FAT_CHUNK_32(part12, bucket, offset + 22, reason, conf_fn); \
CONF_FAT_CHUNK_32(part13, bucket, offset + 24, reason, conf_fn); \
CONF_FAT_CHUNK_32(part14, bucket, offset + 26, reason, conf_fn); \
CONF_FAT_CHUNK_32(part15, bucket, offset + 28, reason, conf_fn); \
CONF_FAT_CHUNK_32(part16, bucket, offset + 30, reason, conf_fn); \
} \
} while(0)
#endif
static really_inline
m512 vectoredLoad2x256(m512 *p_mask, const u8 *ptr, const size_t start_offset,
const u8 *lo, const u8 *hi,
const u8 *buf_history, size_t len_history,
const u32 nMasks) {
m256 p_mask256;
m512 ret = set2x256(vectoredLoad256(&p_mask256, ptr, start_offset, lo, hi,
buf_history, len_history, nMasks));
*p_mask = set2x256(p_mask256);
return ret;
}
#define PREP_FAT_SHUF_MASK_NO_REINFORCEMENT(val) \
m512 lo = and512(val, *lo_mask); \
m512 hi = and512(rshift64_m512(val, 4), *lo_mask)
#define PREP_FAT_SHUF_MASK \
PREP_FAT_SHUF_MASK_NO_REINFORCEMENT(set2x256(load256(ptr))); \
*c_16 = *(ptr + 15); \
m512 r_msk = set512_64(0ULL, r_msk_base_hi[*c_16], \
0ULL, r_msk_base_hi[*c_0], \
0ULL, r_msk_base_lo[*c_16], \
0ULL, r_msk_base_lo[*c_0]); \
*c_0 = *(ptr + 31)
#define FAT_SHIFT_OR_M1 \
or512(pshufb_m512(dup_mask[0], lo), pshufb_m512(dup_mask[1], hi))
#define FAT_SHIFT_OR_M2 \
or512(lshift128_m512(or512(pshufb_m512(dup_mask[2], lo), \
pshufb_m512(dup_mask[3], hi)), \
1), FAT_SHIFT_OR_M1)
#define FAT_SHIFT_OR_M3 \
or512(lshift128_m512(or512(pshufb_m512(dup_mask[4], lo), \
pshufb_m512(dup_mask[5], hi)), \
2), FAT_SHIFT_OR_M2)
#define FAT_SHIFT_OR_M4 \
or512(lshift128_m512(or512(pshufb_m512(dup_mask[6], lo), \
pshufb_m512(dup_mask[7], hi)), \
3), FAT_SHIFT_OR_M3)
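/*
 * Note: FAT_SHIFT_OR_M1..M4 are the macro counterpart of the
 * shift_or_512_templ<N> template chain earlier in this file; this
 * HAVE_AVX512_REVERT block keeps the older macro style.
 */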
static really_inline
m512 prep_conf_fat_teddy_no_reinforcement_m1(const m512 *lo_mask,
const m512 *dup_mask,
const m512 val) {
PREP_FAT_SHUF_MASK_NO_REINFORCEMENT(val);
return FAT_SHIFT_OR_M1;
}
static really_inline
m512 prep_conf_fat_teddy_no_reinforcement_m2(const m512 *lo_mask,
const m512 *dup_mask,
const m512 val) {
PREP_FAT_SHUF_MASK_NO_REINFORCEMENT(val);
return FAT_SHIFT_OR_M2;
}
static really_inline
m512 prep_conf_fat_teddy_no_reinforcement_m3(const m512 *lo_mask,
const m512 *dup_mask,
const m512 val) {
PREP_FAT_SHUF_MASK_NO_REINFORCEMENT(val);
return FAT_SHIFT_OR_M3;
}
static really_inline
m512 prep_conf_fat_teddy_no_reinforcement_m4(const m512 *lo_mask,
const m512 *dup_mask,
const m512 val) {
PREP_FAT_SHUF_MASK_NO_REINFORCEMENT(val);
return FAT_SHIFT_OR_M4;
}
static really_inline
m512 prep_conf_fat_teddy_m1(const m512 *lo_mask, const m512 *dup_mask,
const u8 *ptr, const u64a *r_msk_base_lo,
const u64a *r_msk_base_hi, u32 *c_0, u32 *c_16) {
PREP_FAT_SHUF_MASK;
return or512(FAT_SHIFT_OR_M1, r_msk);
}
static really_inline
m512 prep_conf_fat_teddy_m2(const m512 *lo_mask, const m512 *dup_mask,
const u8 *ptr, const u64a *r_msk_base_lo,
const u64a *r_msk_base_hi, u32 *c_0, u32 *c_16) {
PREP_FAT_SHUF_MASK;
return or512(FAT_SHIFT_OR_M2, r_msk);
}
static really_inline
m512 prep_conf_fat_teddy_m3(const m512 *lo_mask, const m512 *dup_mask,
const u8 *ptr, const u64a *r_msk_base_lo,
const u64a *r_msk_base_hi, u32 *c_0, u32 *c_16) {
PREP_FAT_SHUF_MASK;
return or512(FAT_SHIFT_OR_M3, r_msk);
}
static really_inline
m512 prep_conf_fat_teddy_m4(const m512 *lo_mask, const m512 *dup_mask,
const u8 *ptr, const u64a *r_msk_base_lo,
const u64a *r_msk_base_hi, u32 *c_0, u32 *c_16) {
PREP_FAT_SHUF_MASK;
return or512(FAT_SHIFT_OR_M4, r_msk);
}
#define PREP_CONF_FAT_FN_NO_REINFORCEMENT(val, n) \
prep_conf_fat_teddy_no_reinforcement_m##n(&lo_mask, dup_mask, val)
#define PREP_CONF_FAT_FN(ptr, n) \
prep_conf_fat_teddy_m##n(&lo_mask, dup_mask, ptr, \
r_msk_base_lo, r_msk_base_hi, &c_0, &c_16)
/*
 * In fat Teddy, each position needs 2 bytes to represent its result, so
 * each nibble's fat Teddy mask (for example, the lo nibble of the last
 * byte) has 16x2 bytes:
 * |----------------------------------|----------------------------------|
 * 16bytes (bucket 0..7 in each byte)  16bytes (bucket 8..15 in each byte)
 *                 A                                   B
 * At runtime, fat Teddy reads 16 bytes at a time and duplicates them to
 * 32 bytes:
 * |----------------------------------|----------------------------------|
 * 16bytes input data (lo nibbles)     16bytes duplicated data (lo nibbles)
 *                 X                                   X
 * then does pshufb_m256(AB, XX).
 *
 * In AVX512 reinforced fat Teddy, it reads 32 bytes at a time and
 * duplicates them to 64 bytes:
 * |----------------|----------------|----------------|----------------|
 *         X                Y                X                Y
 * In this case we need DUP_FAT_MASK to construct AABB:
 * |----------------|----------------|----------------|----------------|
 *         A                A                B                B
 * then do pshufb_m512(AABB, XYXY).
 */
#define DUP_FAT_MASK(a) mask_set2x256(set2x256(swap128in256(a)), 0xC3, a)
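/*
 * Worked example: for a == [A | B] (two 128-bit halves of an m256),
 * set2x256(swap128in256(a)) == [B | A | B | A]. The 0xC3 write-mask
 * (11000011 over the eight 64-bit lanes) then takes lanes 0-1 and 6-7 from
 * the broadcast [A | B | A | B] and the middle lanes from [B | A | B | A],
 * yielding [A | A | B | B], the AABB layout described above.
 */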
#define PREPARE_FAT_MASKS_1 \
dup_mask[0] = DUP_FAT_MASK(maskBase[0]); \
dup_mask[1] = DUP_FAT_MASK(maskBase[1]);
#define PREPARE_FAT_MASKS_2 \
PREPARE_FAT_MASKS_1 \
dup_mask[2] = DUP_FAT_MASK(maskBase[2]); \
dup_mask[3] = DUP_FAT_MASK(maskBase[3]);
#define PREPARE_FAT_MASKS_3 \
PREPARE_FAT_MASKS_2 \
dup_mask[4] = DUP_FAT_MASK(maskBase[4]); \
dup_mask[5] = DUP_FAT_MASK(maskBase[5]);
#define PREPARE_FAT_MASKS_4 \
PREPARE_FAT_MASKS_3 \
dup_mask[6] = DUP_FAT_MASK(maskBase[6]); \
dup_mask[7] = DUP_FAT_MASK(maskBase[7]);
#define PREPARE_FAT_MASKS(n) \
m512 lo_mask = set64x8(0xf); \
m512 dup_mask[n * 2]; \
PREPARE_FAT_MASKS_##n
#define FDR_EXEC_FAT_TEDDY(fdr, a, control, n_msk, conf_fn) \
do { \
const u8 *buf_end = a->buf + a->len; \
const u8 *ptr = a->buf + a->start_offset; \
u32 floodBackoff = FLOOD_BACKOFF_START; \
const u8 *tryFloodDetect = a->firstFloodDetect; \
u32 last_match = ones_u32; \
const struct Teddy *teddy = (const struct Teddy *)fdr; \
const size_t iterBytes = 64; \
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n", \
a->buf, a->len, a->start_offset); \
\
const m256 *maskBase = getMaskBase_fat(teddy); \
PREPARE_FAT_MASKS(n_msk); \
const u32 *confBase = getConfBase(teddy); \
\
const u64a *r_msk_base_lo = getReinforcedMaskBase_fat(teddy, n_msk); \
const u64a *r_msk_base_hi = r_msk_base_lo + (N_CHARS + 1); \
u32 c_0 = 0x100; \
u32 c_16 = 0x100; \
const u8 *mainStart = ROUNDUP_PTR(ptr, 32); \
DEBUG_PRINTF("derive: ptr: %p mainstart %p\n", ptr, mainStart); \
if (ptr < mainStart) { \
ptr = mainStart - 32; \
m512 p_mask; \
m512 val_0 = vectoredLoad2x256(&p_mask, ptr, a->start_offset, \
a->buf, buf_end, \
a->buf_history, a->len_history, n_msk); \
m512 r_0 = PREP_CONF_FAT_FN_NO_REINFORCEMENT(val_0, n_msk); \
r_0 = or512(r_0, p_mask); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, VECTORING, conf_fn); \
ptr += 32; \
} \
\
if (ptr + 32 <= buf_end) { \
m512 r_0 = PREP_CONF_FAT_FN(ptr, n_msk); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, VECTORING, conf_fn); \
ptr += 32; \
} \
\
for (; ptr + iterBytes <= buf_end; ptr += iterBytes) { \
__builtin_prefetch(ptr + (iterBytes * 4)); \
CHECK_FLOOD; \
m512 r_0 = PREP_CONF_FAT_FN(ptr, n_msk); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, NOT_CAUTIOUS, conf_fn); \
m512 r_1 = PREP_CONF_FAT_FN(ptr + 32, n_msk); \
CONFIRM_FAT_TEDDY(r_1, 16, 32, NOT_CAUTIOUS, conf_fn); \
} \
\
if (ptr + 32 <= buf_end) { \
m512 r_0 = PREP_CONF_FAT_FN(ptr, n_msk); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, NOT_CAUTIOUS, conf_fn); \
ptr += 32; \
} \
\
assert(ptr + 32 > buf_end); \
if (ptr < buf_end) { \
m512 p_mask; \
m512 val_0 = vectoredLoad2x256(&p_mask, ptr, 0, ptr, buf_end, \
a->buf_history, a->len_history, n_msk); \
m512 r_0 = PREP_CONF_FAT_FN_NO_REINFORCEMENT(val_0, n_msk); \
r_0 = or512(r_0, p_mask); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, VECTORING, conf_fn); \
} \
\
return HWLM_SUCCESS; \
} while(0)
#else // HAVE_AVX512
#ifdef ARCH_64_BIT
#define CONFIRM_FAT_TEDDY(var, bucket, offset, reason, conf_fn) \
do { \
if (unlikely(diff256(var, ones256()))) { \
m256 swap = swap128in256(var); \
m256 r = interleave256lo(var, swap); \
u64a part1 = extractlow64from256(r); \
u64a part2 = extract64from256(r, 1); \
r = interleave256hi(var, swap); \
u64a part3 = extractlow64from256(r); \
u64a part4 = extract64from256(r, 1); \
CONF_FAT_CHUNK_64(part1, bucket, offset, reason, conf_fn); \
CONF_FAT_CHUNK_64(part2, bucket, offset + 4, reason, conf_fn); \
CONF_FAT_CHUNK_64(part3, bucket, offset + 8, reason, conf_fn); \
CONF_FAT_CHUNK_64(part4, bucket, offset + 12, reason, conf_fn); \
} \
} while(0)
#else
#define CONFIRM_FAT_TEDDY(var, bucket, offset, reason, conf_fn) \
do { \
if (unlikely(diff256(var, ones256()))) { \
m256 swap = swap128in256(var); \
m256 r = interleave256lo(var, swap); \
u32 part1 = extractlow32from256(r); \
u32 part2 = extract32from256(r, 1); \
u32 part3 = extract32from256(r, 2); \
u32 part4 = extract32from256(r, 3); \
r = interleave256hi(var, swap); \
u32 part5 = extractlow32from256(r); \
u32 part6 = extract32from256(r, 1); \
u32 part7 = extract32from256(r, 2); \
u32 part8 = extract32from256(r, 3); \
CONF_FAT_CHUNK_32(part1, bucket, offset, reason, conf_fn); \
CONF_FAT_CHUNK_32(part2, bucket, offset + 2, reason, conf_fn); \
CONF_FAT_CHUNK_32(part3, bucket, offset + 4, reason, conf_fn); \
CONF_FAT_CHUNK_32(part4, bucket, offset + 6, reason, conf_fn); \
CONF_FAT_CHUNK_32(part5, bucket, offset + 8, reason, conf_fn); \
CONF_FAT_CHUNK_32(part6, bucket, offset + 10, reason, conf_fn); \
CONF_FAT_CHUNK_32(part7, bucket, offset + 12, reason, conf_fn); \
CONF_FAT_CHUNK_32(part8, bucket, offset + 14, reason, conf_fn); \
} \
} while(0)
#endif
static really_inline
m256 vectoredLoad2x128(m256 *p_mask, const u8 *ptr, const size_t start_offset,
const u8 *lo, const u8 *hi,
const u8 *buf_history, size_t len_history,
const u32 nMasks) {
m128 p_mask128;
m256 ret = set2x128(vectoredLoad128(&p_mask128, ptr, start_offset, lo, hi,
buf_history, len_history, nMasks));
*p_mask = set2x128(p_mask128);
return ret;
}
static really_inline
m256 prep_conf_fat_teddy_m1(const m256 *maskBase, m256 val) {
m256 mask = set32x8(0xf);
m256 lo = and256(val, mask);
m256 hi = and256(rshift64_m256(val, 4), mask);
return or256(pshufb_m256(maskBase[0 * 2], lo),
pshufb_m256(maskBase[0 * 2 + 1], hi));
}
static really_inline
m256 prep_conf_fat_teddy_m2(const m256 *maskBase, m256 *old_1, m256 val) {
m256 mask = set32x8(0xf);
m256 lo = and256(val, mask);
m256 hi = and256(rshift64_m256(val, 4), mask);
m256 r = prep_conf_fat_teddy_m1(maskBase, val);
m256 res_1 = or256(pshufb_m256(maskBase[1 * 2], lo),
pshufb_m256(maskBase[1 * 2 + 1], hi));
m256 res_shifted_1 = vpalignr(res_1, *old_1, 16 - 1);
*old_1 = res_1;
return or256(r, res_shifted_1);
}
static really_inline
m256 prep_conf_fat_teddy_m3(const m256 *maskBase, m256 *old_1, m256 *old_2,
m256 val) {
m256 mask = set32x8(0xf);
m256 lo = and256(val, mask);
m256 hi = and256(rshift64_m256(val, 4), mask);
m256 r = prep_conf_fat_teddy_m2(maskBase, old_1, val);
m256 res_2 = or256(pshufb_m256(maskBase[2 * 2], lo),
pshufb_m256(maskBase[2 * 2 + 1], hi));
m256 res_shifted_2 = vpalignr(res_2, *old_2, 16 - 2);
*old_2 = res_2;
return or256(r, res_shifted_2);
}
static really_inline
m256 prep_conf_fat_teddy_m4(const m256 *maskBase, m256 *old_1, m256 *old_2,
m256 *old_3, m256 val) {
m256 mask = set32x8(0xf);
m256 lo = and256(val, mask);
m256 hi = and256(rshift64_m256(val, 4), mask);
m256 r = prep_conf_fat_teddy_m3(maskBase, old_1, old_2, val);
m256 res_3 = or256(pshufb_m256(maskBase[3 * 2], lo),
pshufb_m256(maskBase[3 * 2 + 1], hi));
m256 res_shifted_3 = vpalignr(res_3, *old_3, 16 - 3);
*old_3 = res_3;
return or256(r, res_shifted_3);
}
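/*
 * Note: unlike the plain 128-bit kernel, the AVX2 fat-teddy chain threads
 * old_1/old_2/old_3 through successive calls, so vpalignr shifts in the
 * previous block's shuffle results instead of zeroes; match state for
 * literals straddling a 16-byte boundary is preserved without a
 * reinforced table.
 */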
#define FDR_EXEC_FAT_TEDDY_RES_OLD_1 \
do { \
} while(0)
#define FDR_EXEC_FAT_TEDDY_RES_OLD_2 \
m256 res_old_1 = zeroes256();
#define FDR_EXEC_FAT_TEDDY_RES_OLD_3 \
m256 res_old_1 = zeroes256(); \
m256 res_old_2 = zeroes256();
#define FDR_EXEC_FAT_TEDDY_RES_OLD_4 \
m256 res_old_1 = zeroes256(); \
m256 res_old_2 = zeroes256(); \
m256 res_old_3 = zeroes256();
#define FDR_EXEC_FAT_TEDDY_RES_OLD(n) FDR_EXEC_FAT_TEDDY_RES_OLD_##n
#define PREP_CONF_FAT_FN_1(mask_base, val) \
prep_conf_fat_teddy_m1(mask_base, val)
#define PREP_CONF_FAT_FN_2(mask_base, val) \
prep_conf_fat_teddy_m2(mask_base, &res_old_1, val)
#define PREP_CONF_FAT_FN_3(mask_base, val) \
prep_conf_fat_teddy_m3(mask_base, &res_old_1, &res_old_2, val)
#define PREP_CONF_FAT_FN_4(mask_base, val) \
prep_conf_fat_teddy_m4(mask_base, &res_old_1, &res_old_2, &res_old_3, val)
#define PREP_CONF_FAT_FN(mask_base, val, n) \
PREP_CONF_FAT_FN_##n(mask_base, val)
#define FDR_EXEC_FAT_TEDDY(fdr, a, control, n_msk, conf_fn) \
do { \
const u8 *buf_end = a->buf + a->len; \
const u8 *ptr = a->buf + a->start_offset; \
u32 floodBackoff = FLOOD_BACKOFF_START; \
const u8 *tryFloodDetect = a->firstFloodDetect; \
u32 last_match = ones_u32; \
const struct Teddy *teddy = (const struct Teddy *)fdr; \
const size_t iterBytes = 32; \
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n", \
a->buf, a->len, a->start_offset); \
\
const m256 *maskBase = getMaskBase_fat(teddy); \
const u32 *confBase = getConfBase(teddy); \
\
FDR_EXEC_FAT_TEDDY_RES_OLD(n_msk); \
const u8 *mainStart = ROUNDUP_PTR(ptr, 16); \
DEBUG_PRINTF("derive: ptr: %p mainstart %p\n", ptr, mainStart); \
if (ptr < mainStart) { \
ptr = mainStart - 16; \
m256 p_mask; \
m256 val_0 = vectoredLoad2x128(&p_mask, ptr, a->start_offset, \
a->buf, buf_end, \
a->buf_history, a->len_history, \
n_msk); \
m256 r_0 = PREP_CONF_FAT_FN(maskBase, val_0, n_msk); \
r_0 = or256(r_0, p_mask); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, VECTORING, conf_fn); \
ptr += 16; \
} \
\
if (ptr + 16 <= buf_end) { \
m256 r_0 = PREP_CONF_FAT_FN(maskBase, load2x128(ptr), n_msk); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, VECTORING, conf_fn); \
ptr += 16; \
} \
\
for ( ; ptr + iterBytes <= buf_end; ptr += iterBytes) { \
__builtin_prefetch(ptr + (iterBytes * 4)); \
CHECK_FLOOD; \
m256 r_0 = PREP_CONF_FAT_FN(maskBase, load2x128(ptr), n_msk); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, NOT_CAUTIOUS, conf_fn); \
m256 r_1 = PREP_CONF_FAT_FN(maskBase, load2x128(ptr + 16), n_msk); \
CONFIRM_FAT_TEDDY(r_1, 16, 16, NOT_CAUTIOUS, conf_fn); \
} \
\
if (ptr + 16 <= buf_end) { \
m256 r_0 = PREP_CONF_FAT_FN(maskBase, load2x128(ptr), n_msk); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, NOT_CAUTIOUS, conf_fn); \
ptr += 16; \
} \
\
assert(ptr + 16 > buf_end); \
if (ptr < buf_end) { \
m256 p_mask; \
m256 val_0 = vectoredLoad2x128(&p_mask, ptr, 0, ptr, buf_end, \
a->buf_history, a->len_history, \
n_msk); \
m256 r_0 = PREP_CONF_FAT_FN(maskBase, val_0, n_msk); \
r_0 = or256(r_0, p_mask); \
CONFIRM_FAT_TEDDY(r_0, 16, 0, VECTORING, conf_fn); \
} \
\
return HWLM_SUCCESS; \
} while(0)
#endif // HAVE_AVX512
hwlm_error_t fdr_exec_fat_teddy_msks1(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 1, do_confWithBit_teddy);
}
hwlm_error_t fdr_exec_fat_teddy_msks1_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 1, do_confWithBit_teddy);
}
hwlm_error_t fdr_exec_fat_teddy_msks2(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 2, do_confWithBit_teddy);
}
hwlm_error_t fdr_exec_fat_teddy_msks2_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 2, do_confWithBit_teddy);
}
hwlm_error_t fdr_exec_fat_teddy_msks3(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 3, do_confWithBit_teddy);
}
hwlm_error_t fdr_exec_fat_teddy_msks3_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 3, do_confWithBit_teddy);
}
hwlm_error_t fdr_exec_fat_teddy_msks4(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 4, do_confWithBit_teddy);
}
hwlm_error_t fdr_exec_fat_teddy_msks4_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
FDR_EXEC_FAT_TEDDY(fdr, a, control, 4, do_confWithBit_teddy);
}
#endif // HAVE_AVX2


@ -1,5 +1,5 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2015-2020, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -46,7 +46,6 @@
#include "util/alloc.h"
#include "util/compare.h"
#include "util/container.h"
#include "util/make_unique.h"
#include "util/noncopyable.h"
#include "util/popcount.h"
#include "util/small_vector.h"
@ -89,7 +88,7 @@ public:
const TeddyEngineDescription &eng_in, bool make_small_in,
const Grey &grey_in)
: eng(eng_in), grey(grey_in), lits(lits_in),
bucketToLits(move(bucketToLits_in)), make_small(make_small_in) {}
bucketToLits(std::move(bucketToLits_in)), make_small(make_small_in) {}
bytecode_ptr<FDR> build();
};
@ -166,7 +165,7 @@ public:
nibbleSets[i * 2] = nibbleSets[i * 2 + 1] = 0xffff;
}
}
litIds.push_back(lit_id);
litIds.emplace_back(lit_id);
sort_and_unique(litIds);
}
@ -329,7 +328,7 @@ bool pack(const vector<hwlmLiteral> &lits,
static
void initReinforcedTable(u8 *rmsk) {
u64a *mask = (u64a *)rmsk;
u64a *mask = reinterpret_cast<u64a *>(rmsk);
fill_n(mask, N_CHARS, 0x00ffffffffffffffULL);
}
@ -353,6 +352,89 @@ void fillReinforcedMsk(u8 *rmsk, u16 c, u32 j, u8 bmsk) {
}
}
static
void fillDupNibbleMasks(const map<BucketIndex,
vector<LiteralIndex>> &bucketToLits,
const vector<hwlmLiteral> &lits,
u32 numMasks, size_t maskLen,
u8 *baseMsk) {
u32 maskWidth = 2;
memset(baseMsk, 0xff, maskLen);
for (const auto &b2l : bucketToLits) {
const u32 &bucket_id = b2l.first;
const vector<LiteralIndex> &ids = b2l.second;
const u8 bmsk = 1U << (bucket_id % 8);
for (const LiteralIndex &lit_id : ids) {
const hwlmLiteral &l = lits[lit_id];
DEBUG_PRINTF("putting lit %u into bucket %u\n", lit_id, bucket_id);
const u32 sz = verify_u32(l.s.size());
// fill in masks
for (u32 j = 0; j < numMasks; j++) {
const u32 msk_id_lo = j * 2 * maskWidth + (bucket_id / 8);
const u32 msk_id_hi = (j * 2 + 1) * maskWidth + (bucket_id / 8);
const u32 lo_base0 = msk_id_lo * 32;
const u32 lo_base1 = msk_id_lo * 32 + 16;
const u32 hi_base0 = msk_id_hi * 32;
const u32 hi_base1 = msk_id_hi * 32 + 16;
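// worked example (illustrative, not from this change): with maskWidth == 2,
// bucket_id == 9 and j == 0, bmsk == 0x02, msk_id_lo == 1 and msk_id_hi == 3,
// so the lo-nibble entries land at bytes 32..47 and 48..63 of baseMsk and
// the hi-nibble entries at bytes 96..111 and 112..127.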
// if we don't have a char at this position, clear the bucket bit in
// all 16 entries of these masks so that any byte matches here
if (j >= sz) {
for (u32 n = 0; n < 16; n++) {
baseMsk[lo_base0 + n] &= ~bmsk;
baseMsk[lo_base1 + n] &= ~bmsk;
baseMsk[hi_base0 + n] &= ~bmsk;
baseMsk[hi_base1 + n] &= ~bmsk;
}
} else {
u8 c = l.s[sz - 1 - j];
// if we do have a char at this position
const u32 hiShift = 4;
u32 n_hi = (c >> hiShift) & 0xf;
u32 n_lo = c & 0xf;
if (j < l.msk.size() && l.msk[l.msk.size() - 1 - j]) {
u8 m = l.msk[l.msk.size() - 1 - j];
u8 m_hi = (m >> hiShift) & 0xf;
u8 m_lo = m & 0xf;
u8 cmp = l.cmp[l.msk.size() - 1 - j];
u8 cmp_lo = cmp & 0xf;
u8 cmp_hi = (cmp >> hiShift) & 0xf;
for (u8 cm = 0; cm < 0x10; cm++) {
if ((cm & m_lo) == (cmp_lo & m_lo)) {
baseMsk[lo_base0 + cm] &= ~bmsk;
baseMsk[lo_base1 + cm] &= ~bmsk;
}
if ((cm & m_hi) == (cmp_hi & m_hi)) {
baseMsk[hi_base0 + cm] &= ~bmsk;
baseMsk[hi_base1 + cm] &= ~bmsk;
}
}
} else {
if (l.nocase && ourisalpha(c)) {
u32 cmHalfClear = (0xdf >> hiShift) & 0xf;
u32 cmHalfSet = (0x20 >> hiShift) & 0xf;
baseMsk[hi_base0 + (n_hi & cmHalfClear)] &= ~bmsk;
baseMsk[hi_base1 + (n_hi & cmHalfClear)] &= ~bmsk;
baseMsk[hi_base0 + (n_hi | cmHalfSet)] &= ~bmsk;
baseMsk[hi_base1 + (n_hi | cmHalfSet)] &= ~bmsk;
} else {
baseMsk[hi_base0 + n_hi] &= ~bmsk;
baseMsk[hi_base1 + n_hi] &= ~bmsk;
}
baseMsk[lo_base0 + n_lo] &= ~bmsk;
baseMsk[lo_base1 + n_lo] &= ~bmsk;
}
}
}
}
}
}
static
void fillNibbleMasks(const map<BucketIndex,
vector<LiteralIndex>> &bucketToLits,
@ -432,7 +514,7 @@ void fillReinforcedTable(const map<BucketIndex,
u8 *rtable_base, const u32 num_tables) {
vector<u8 *> tables;
for (u32 i = 0; i < num_tables; i++) {
tables.push_back(rtable_base + i * RTABLE_SIZE);
tables.emplace_back(rtable_base + i * RTABLE_SIZE);
}
for (auto t : tables) {
@ -479,20 +561,23 @@ bytecode_ptr<FDR> TeddyCompiler::build() {
size_t headerSize = sizeof(Teddy);
size_t maskLen = eng.numMasks * 16 * 2 * maskWidth;
size_t reinforcedMaskLen = RTABLE_SIZE * maskWidth;
size_t reinforcedDupMaskLen = RTABLE_SIZE * maskWidth;
if (maskWidth == 2) { // dup nibble mask table in Fat Teddy
reinforcedDupMaskLen = maskLen * 2;
}
auto floodTable = setupFDRFloodControl(lits, eng, grey);
auto confirmTable = setupFullConfs(lits, eng, bucketToLits, make_small);
// Note: we place each major structure here on a cacheline boundary.
size_t size = ROUNDUP_CL(headerSize) + ROUNDUP_CL(maskLen) +
ROUNDUP_CL(reinforcedMaskLen) +
ROUNDUP_CL(reinforcedDupMaskLen) +
ROUNDUP_CL(confirmTable.size()) + floodTable.size();
auto fdr = make_zeroed_bytecode_ptr<FDR>(size, 64);
assert(fdr); // otherwise would have thrown std::bad_alloc
Teddy *teddy = (Teddy *)fdr.get(); // ugly
u8 *teddy_base = (u8 *)teddy;
Teddy *teddy = reinterpret_cast<Teddy *>(fdr.get()); // ugly
u8 *teddy_base = reinterpret_cast<u8 *>(teddy);
// Write header.
teddy->size = size;
@ -502,7 +587,7 @@ bytecode_ptr<FDR> TeddyCompiler::build() {
// Write confirm structures.
u8 *ptr = teddy_base + ROUNDUP_CL(headerSize) + ROUNDUP_CL(maskLen) +
ROUNDUP_CL(reinforcedMaskLen);
ROUNDUP_CL(reinforcedDupMaskLen);
assert(ISALIGNED_CL(ptr));
teddy->confOffset = verify_u32(ptr - teddy_base);
memcpy(ptr, confirmTable.get(), confirmTable.size());
@ -512,16 +597,23 @@ bytecode_ptr<FDR> TeddyCompiler::build() {
assert(ISALIGNED_CL(ptr));
teddy->floodOffset = verify_u32(ptr - teddy_base);
memcpy(ptr, floodTable.get(), floodTable.size());
ptr += floodTable.size();
// Write teddy masks.
u8 *baseMsk = teddy_base + ROUNDUP_CL(headerSize);
fillNibbleMasks(bucketToLits, lits, eng.numMasks, maskWidth, maskLen,
baseMsk);
// Write reinforcement masks.
u8 *reinforcedMsk = baseMsk + ROUNDUP_CL(maskLen);
fillReinforcedTable(bucketToLits, lits, reinforcedMsk, maskWidth);
if (maskWidth == 1) { // reinforcement table in Teddy
// Write reinforcement masks.
u8 *reinforcedMsk = baseMsk + ROUNDUP_CL(maskLen);
fillReinforcedTable(bucketToLits, lits, reinforcedMsk, maskWidth);
} else { // dup nibble mask table in Fat Teddy
assert(maskWidth == 2);
u8 *dupMsk = baseMsk + ROUNDUP_CL(maskLen);
fillDupNibbleMasks(bucketToLits, lits, eng.numMasks,
reinforcedDupMaskLen, dupMsk);
}
return fdr;
}
@ -530,7 +622,7 @@ bytecode_ptr<FDR> TeddyCompiler::build() {
static
bool assignStringsToBuckets(
const vector<hwlmLiteral> &lits,
TeddyEngineDescription &eng,
const TeddyEngineDescription &eng,
map<BucketIndex, vector<LiteralIndex>> &bucketToLits) {
assert(eng.numMasks <= MAX_NUM_MASKS);
if (lits.size() > eng.getNumBuckets() * TEDDY_BUCKET_LOAD) {
@ -584,7 +676,7 @@ unique_ptr<HWLMProto> teddyBuildProtoHinted(
return nullptr;
}
return ue2::make_unique<HWLMProto>(engType, move(des), lits,
return std::make_unique<HWLMProto>(engType, std::move(des), lits,
bucketToLits, make_small);
}

View File

@ -34,7 +34,6 @@
#include "fdr_engine_description.h"
#include "teddy_internal.h"
#include "teddy_engine_description.h"
#include "util/make_unique.h"
#include <cmath>
@ -53,14 +52,14 @@ u32 TeddyEngineDescription::getDefaultFloodSuffixLength() const {
void getTeddyDescriptions(vector<TeddyEngineDescription> *out) {
static const TeddyEngineDef defns[] = {
{ 3, 0 | HS_CPU_FEATURES_AVX2, 1, 16, false },
{ 4, 0 | HS_CPU_FEATURES_AVX2, 1, 16, true },
{ 5, 0 | HS_CPU_FEATURES_AVX2, 2, 16, false },
{ 6, 0 | HS_CPU_FEATURES_AVX2, 2, 16, true },
{ 7, 0 | HS_CPU_FEATURES_AVX2, 3, 16, false },
{ 8, 0 | HS_CPU_FEATURES_AVX2, 3, 16, true },
{ 9, 0 | HS_CPU_FEATURES_AVX2, 4, 16, false },
{ 10, 0 | HS_CPU_FEATURES_AVX2, 4, 16, true },
{ 3, HS_CPU_FEATURES_AVX2, 1, 16, false },
{ 4, HS_CPU_FEATURES_AVX2, 1, 16, true },
{ 5, HS_CPU_FEATURES_AVX2, 2, 16, false },
{ 6, HS_CPU_FEATURES_AVX2, 2, 16, true },
{ 7, HS_CPU_FEATURES_AVX2, 3, 16, false },
{ 8, HS_CPU_FEATURES_AVX2, 3, 16, true },
{ 9, HS_CPU_FEATURES_AVX2, 4, 16, false },
{ 10, HS_CPU_FEATURES_AVX2, 4, 16, true },
{ 11, 0, 1, 8, false },
{ 12, 0, 1, 8, true },
{ 13, 0, 2, 8, false },
@ -72,6 +71,7 @@ void getTeddyDescriptions(vector<TeddyEngineDescription> *out) {
};
out->clear();
for (const auto &def : defns) {
// cppcheck-suppress useStlAlgorithm
out->emplace_back(def);
}
}
@ -124,6 +124,7 @@ bool isAllowed(const vector<hwlmLiteral> &vl, const TeddyEngineDescription &eng,
u32 n_small_lits = 0;
for (const auto &lit : vl) {
if (lit.s.length() < eng.numMasks) {
// cppcheck-suppress useStlAlgorithm
n_small_lits++;
}
}
@ -197,7 +198,7 @@ chooseTeddyEngine(const target_t &target, const vector<hwlmLiteral> &vl) {
}
DEBUG_PRINTF("using engine %u\n", best->getID());
return ue2::make_unique<TeddyEngineDescription>(*best);
return std::make_unique<TeddyEngineDescription>(*best);
}
unique_ptr<TeddyEngineDescription> getTeddyDescription(u32 engineID) {
@ -205,8 +206,9 @@ unique_ptr<TeddyEngineDescription> getTeddyDescription(u32 engineID) {
getTeddyDescriptions(&descs);
for (const auto &desc : descs) {
// cppcheck-suppress useStlAlgorithm
if (desc.getID() == engineID) {
return ue2::make_unique<TeddyEngineDescription>(desc);
return std::make_unique<TeddyEngineDescription>(desc);
}
}

View File

@ -39,7 +39,7 @@ namespace ue2 {
#define TEDDY_BUCKET_LOAD 6
struct TeddyEngineDef {
struct TeddyEngineDef { //NOLINT (clang-analyzer-optin.performance.Padding)
u32 id;
u64a cpu_features;
u32 numMasks;

src/fdr/teddy_fat.cpp Normal file
View File

@ -0,0 +1,570 @@
/*
* Copyright (c) 2015-2020, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/* fat teddy for AVX2 and AVX512VBMI */
#include "fdr_internal.h"
#include "flood_runtime.h"
#include "teddy.h"
#include "teddy_internal.h"
#include "teddy_runtime_common.h"
#include "util/arch.h"
#include "util/simd_utils.h"
#if defined(HAVE_AVX2)
#ifdef ARCH_64_BIT
static really_inline
hwlm_error_t conf_chunk_64(u64a chunk, u8 bucket, u8 offset,
CautionReason reason, const u8 *pt,
const u32* confBase,
const struct FDR_Runtime_Args *a,
hwlm_group_t *control,
u32 *last_match) {
if (unlikely(chunk != ones_u64a)) {
chunk = ~chunk;
do_confWithBit_teddy(&chunk, bucket, offset, confBase, reason, a, pt,
control, last_match);
// adapted from CHECK_HWLM_TERMINATE_MATCHING
if (unlikely(*control == HWLM_TERMINATE_MATCHING)) {
return HWLM_TERMINATED;
}
}
return HWLM_SUCCESS;
}
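// a set bit in the raw chunk means "no match possible", so an all-ones
// chunk is skipped outright and the bits are inverted before the confirm
// step, which expects set bits to mark candidate positions.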
#define CONF_FAT_CHUNK_64(chunk, bucket, off, reason, pt, confBase, a, control, last_match) \
if (conf_chunk_64(chunk, bucket, off, reason, pt, confBase, a, control, last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
#else
static really_inline
hwlm_error_t conf_chunk_32(u32 chunk, u8 bucket, u8 offset,
CautionReason reason, const u8 *pt,
const u32* confBase,
const struct FDR_Runtime_Args *a,
hwlm_group_t *control,
u32 *last_match) {
if (unlikely(chunk != ones_u32)) {
chunk = ~chunk;
do_confWithBit_teddy(&chunk, bucket, offset, confBase, reason, a, pt,
control, last_match);
// adapted from CHECK_HWLM_TERMINATE_MATCHING
if (unlikely(*control == HWLM_TERMINATE_MATCHING)) {
return HWLM_TERMINATED;
}
}
return HWLM_SUCCESS;
}
#define CONF_FAT_CHUNK_32(chunk, bucket, off, reason, pt, confBase, a, control, last_match) \
if (conf_chunk_32(chunk, bucket, off, reason, pt, confBase, a, control, last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
#endif
#if defined(HAVE_AVX512VBMI) // VBMI strong teddy
// fat 512-bit teddy is only available with VBMI
static really_inline
const m512 *getDupMaskBase(const struct Teddy *teddy, u8 numMask) {
return (const m512 *)((const u8 *)teddy + ROUNDUP_CL(sizeof(struct Teddy))
+ ROUNDUP_CL(2 * numMask * sizeof(m256)));
}
const u8 ALIGN_CL_DIRECTIVE p_mask_interleave[64] = {
0, 32, 1, 33, 2, 34, 3, 35, 4, 36, 5, 37, 6, 38, 7, 39,
8, 40, 9, 41, 10, 42, 11, 43, 12, 44, 13, 45, 14, 46, 15, 47,
16, 48, 17, 49, 18, 50, 19, 51, 20, 52, 21, 53, 22, 54, 23, 55,
24, 56, 25, 57, 26, 58, 27, 59, 28, 60, 29, 61, 30, 62, 31, 63
};
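// vpermb with this table interleaves the low and high 256-bit halves of the
// match state byte-for-byte, so the bucket-0..7 byte and the bucket-8..15
// byte for each scanned position become adjacent. Each 64-bit chunk
// extracted below then covers four consecutive positions, which is why the
// confirm offsets advance in steps of four (steps of two for 32-bit chunks).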
#ifdef ARCH_64_BIT
hwlm_error_t confirm_fat_teddy_64_512(m512 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff512(var, ones512()))) {
m512 msk_interleave = load512(p_mask_interleave);
m512 r = vpermb512(msk_interleave, var);
m128 r0 = extract128from512(r, 0);
m128 r1 = extract128from512(r, 1);
m128 r2 = extract128from512(r, 2);
m128 r3 = extract128from512(r, 3);
u64a part1 = movq(r0);
u64a part2 = extract64from128(r0, 1);
u64a part3 = movq(r1);
u64a part4 = extract64from128(r1, 1);
u64a part5 = movq(r2);
u64a part6 = extract64from128(r2, 1);
u64a part7 = movq(r3);
u64a part8 = extract64from128(r3, 1);
CONF_FAT_CHUNK_64(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part2, bucket, offset + 4, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part3, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part4, bucket, offset + 12, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part5, bucket, offset + 16, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part6, bucket, offset + 20, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part7, bucket, offset + 24, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part8, bucket, offset + 28, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_fat_teddy_512_f confirm_fat_teddy_64_512
#else // 32-bit
hwlm_error_t confirm_fat_teddy_32_512(m512 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff512(var, ones512()))) {
m512 msk_interleave = load512(p_mask_interleave);
m512 r = vpermb512(msk_interleave, var);
m128 r0 = extract128from512(r, 0);
m128 r1 = extract128from512(r, 1);
m128 r2 = extract128from512(r, 2);
m128 r3 = extract128from512(r, 3);
u32 part1 = movd(r0);
u32 part2 = extract32from128(r0, 1);
u32 part3 = extract32from128(r0, 2);
u32 part4 = extract32from128(r0, 3);
u32 part5 = movd(r1);
u32 part6 = extract32from128(r1, 1);
u32 part7 = extract32from128(r1, 2);
u32 part8 = extract32from128(r1, 3);
u32 part9 = movd(r2);
u32 part10 = extract32from128(r2, 1);
u32 part11 = extract32from128(r2, 2);
u32 part12 = extract32from128(r2, 3);
u32 part13 = movd(r3);
u32 part14 = extract32from128(r3, 1);
u32 part15 = extract32from128(r3, 2);
u32 part16 = extract32from128(r3, 3);
CONF_FAT_CHUNK_32(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part2, bucket, offset + 2, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part3, bucket, offset + 4, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part4, bucket, offset + 6, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part5, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part6, bucket, offset + 10, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part7, bucket, offset + 12, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part8, bucket, offset + 14, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part9, bucket, offset + 16, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part10, bucket, offset + 18, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part11, bucket, offset + 20, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part12, bucket, offset + 22, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part13, bucket, offset + 24, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part14, bucket, offset + 26, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part15, bucket, offset + 28, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part16, bucket, offset + 30, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
#define confirm_fat_teddy_512_f confirm_fat_teddy_32_512
#endif // 32/64
#define CONFIRM_FAT_TEDDY_512(...) if (confirm_fat_teddy_512_f(__VA_ARGS__, a, confBase, &control, &last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
#define TEDDY_VBMI_SL1_MASK 0xfffffffffffffffeULL
#define TEDDY_VBMI_SL2_MASK 0xfffffffffffffffcULL
#define TEDDY_VBMI_SL3_MASK 0xfffffffffffffff8ULL
#define FAT_TEDDY_VBMI_SL1_MASK 0xfffffffefffffffeULL
#define FAT_TEDDY_VBMI_SL2_MASK 0xfffffffcfffffffcULL
#define FAT_TEDDY_VBMI_SL3_MASK 0xfffffff8fffffff8ULL
#define FAT_TEDDY_VBMI_SL1_POS 15
#define FAT_TEDDY_VBMI_SL2_POS 14
#define FAT_TEDDY_VBMI_SL3_POS 13
#define FAT_TEDDY_VBMI_CONF_MASK_HEAD (0xffffffffULL >> n_sh)
#define FAT_TEDDY_VBMI_CONF_MASK_FULL ((0xffffffffULL << n_sh) & 0xffffffffULL)
#define FAT_TEDDY_VBMI_CONF_MASK_VAR(n) (0xffffffffULL >> (32 - n) << overlap)
#define FAT_TEDDY_VBMI_LOAD_MASK_PATCH (0xffffffffULL >> (32 - n_sh))
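// illustrative reading of the masks above (not from this change), taking
// NMSK == 3 so n_sh == 2: CONF_MASK_HEAD == 0x3fffffff keeps the first 30
// bytes of the head block, CONF_MASK_FULL == 0xfffffffc drops the two
// overlap bytes re-read at the start of every full block, and
// LOAD_MASK_PATCH == 0x3 re-enables exactly those two bytes when the final
// partial block is reloaded (see the k1 | patch load below).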
template<int NMSK>
static really_inline
m512 prep_conf_fat_teddy_512vbmi_templ(const m512 *lo_mask, const m512 *dup_mask,
const m512 *sl_msk, const m512 val) {
m512 lo = and512(val, *lo_mask);
m512 hi = and512(rshift64_m512(val, 4), *lo_mask);
m512 shuf_or_b0 = or512(pshufb_m512(dup_mask[0], lo),
pshufb_m512(dup_mask[1], hi));
if constexpr (NMSK == 1) return shuf_or_b0;
m512 shuf_or_b1 = or512(pshufb_m512(dup_mask[2], lo),
pshufb_m512(dup_mask[3], hi));
m512 sl1 = maskz_vpermb512(FAT_TEDDY_VBMI_SL1_MASK, sl_msk[0], shuf_or_b1);
if constexpr (NMSK == 2) return (or512(sl1, shuf_or_b0));
m512 shuf_or_b2 = or512(pshufb_m512(dup_mask[4], lo),
pshufb_m512(dup_mask[5], hi));
m512 sl2 = maskz_vpermb512(FAT_TEDDY_VBMI_SL2_MASK, sl_msk[1], shuf_or_b2);
if constexpr (NMSK == 3) return (or512(sl2, or512(sl1, shuf_or_b0)));
m512 shuf_or_b3 = or512(pshufb_m512(dup_mask[6], lo),
pshufb_m512(dup_mask[7], hi));
m512 sl3 = maskz_vpermb512(FAT_TEDDY_VBMI_SL3_MASK, sl_msk[2], shuf_or_b3);
return (or512(sl3, or512(sl2, or512(sl1, shuf_or_b0))));
}
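// note: sl_msk holds vpermb index patterns that shift each byte towards the
// high end by 1, 2 or 3 positions, while the FAT_TEDDY_VBMI_SL*_MASK values
// zero the first 1-3 bytes of each 256-bit half via maskz, so shifted data
// never leaks across the boundary between the two duplicated buffer copies.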
#define TEDDY_VBMI_SL1_POS 15
#define TEDDY_VBMI_SL2_POS 14
#define TEDDY_VBMI_SL3_POS 13
#define TEDDY_VBMI_CONF_MASK_HEAD (0xffffffffffffffffULL >> n_sh)
#define TEDDY_VBMI_CONF_MASK_FULL (0xffffffffffffffffULL << n_sh)
#define TEDDY_VBMI_CONF_MASK_VAR(n) (0xffffffffffffffffULL >> (64 - n) << overlap)
#define TEDDY_VBMI_LOAD_MASK_PATCH (0xffffffffffffffffULL >> (64 - n_sh))
template<int NMSK>
hwlm_error_t fdr_exec_fat_teddy_512vbmi_templ(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
const u8 *buf_end = a->buf + a->len;
const u8 *ptr = a->buf + a->start_offset;
u32 floodBackoff = FLOOD_BACKOFF_START;
const u8 *tryFloodDetect = a->firstFloodDetect;
u32 last_match = ones_u32;
const struct Teddy *teddy = (const struct Teddy *)fdr;
const size_t iterBytes = 32;
u32 n_sh = NMSK - 1;
const size_t loopBytes = 32 - n_sh;
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n",
a->buf, a->len, a->start_offset);
const m512 *dup_mask = getDupMaskBase(teddy, NMSK);
m512 lo_mask = set1_64x8(0xf);
m512 sl_msk[NMSK - 1];
if constexpr (NMSK > 1){
sl_msk[0] = loadu512(p_sh_mask_arr + FAT_TEDDY_VBMI_SL1_POS);
}
if constexpr (NMSK > 2){
sl_msk[1] = loadu512(p_sh_mask_arr + FAT_TEDDY_VBMI_SL2_POS);
}
if constexpr (NMSK > 3){
sl_msk[2] = loadu512(p_sh_mask_arr + FAT_TEDDY_VBMI_SL3_POS);
}
const u32 *confBase = getConfBase(teddy);
u64a k = FAT_TEDDY_VBMI_CONF_MASK_FULL;
m512 p_mask = set_mask_m512(~((k << 32) | k));
u32 overlap = 0;
u64a patch = 0;
if (likely(ptr + loopBytes <= buf_end)) {
u64a k0 = FAT_TEDDY_VBMI_CONF_MASK_HEAD;
m512 p_mask0 = set_mask_m512(~((k0 << 32) | k0));
m512 r_0 = prep_conf_fat_teddy_512vbmi_templ<NMSK>(&lo_mask, dup_mask, sl_msk, set2x256(loadu_maskz_m256(k0, ptr)));
r_0 = or512(r_0, p_mask0);
CONFIRM_FAT_TEDDY_512(r_0, 16, 0, VECTORING, ptr);
ptr += loopBytes;
overlap = n_sh;
patch = FAT_TEDDY_VBMI_LOAD_MASK_PATCH;
}
for (; ptr + loopBytes <= buf_end; ptr += loopBytes) {
CHECK_FLOOD;
m512 r_0 = prep_conf_fat_teddy_512vbmi_templ<NMSK>(&lo_mask, dup_mask, sl_msk, set2x256(loadu256(ptr - n_sh)));
r_0 = or512(r_0, p_mask);
CONFIRM_FAT_TEDDY_512(r_0, 16, 0, NOT_CAUTIOUS, ptr - n_sh);
}
assert(ptr + loopBytes > buf_end);
if (ptr < buf_end) {
u32 left = (u32)(buf_end - ptr);
u64a k1 = FAT_TEDDY_VBMI_CONF_MASK_VAR(left);
m512 p_mask1 = set_mask_m512(~((k1 << 32) | k1));
m512 val_0 = set2x256(loadu_maskz_m256(k1 | patch, ptr - overlap));
m512 r_0 = prep_conf_fat_teddy_512vbmi_templ<NMSK>(&lo_mask, dup_mask, sl_msk, val_0);
r_0 = or512(r_0, p_mask1);
CONFIRM_FAT_TEDDY_512(r_0, 16, 0, VECTORING, ptr - overlap);
}
return HWLM_SUCCESS;
}
#define FDR_EXEC_FAT_TEDDY_FN fdr_exec_fat_teddy_512vbmi_templ
#elif defined(HAVE_AVX2) // not HAVE_AVX512VBMI but HAVE_AVX2: reinforced teddy
#ifdef ARCH_64_BIT
extern "C" {
hwlm_error_t confirm_fat_teddy_64_256(m256 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff256(var, ones256()))) {
m256 swap = swap128in256(var);
m256 r = interleave256lo(var, swap);
u64a part1 = extractlow64from256(r);
u64a part2 = extract64from256(r, 1);
r = interleave256hi(var, swap);
u64a part3 = extractlow64from256(r);
u64a part4 = extract64from256(r, 1);
CONF_FAT_CHUNK_64(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part2, bucket, offset + 4, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part3, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_64(part4, bucket, offset + 12, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
} // extern C
#define confirm_fat_teddy_256_f confirm_fat_teddy_64_256
#else
extern "C" {
hwlm_error_t confirm_fat_teddy_32_256(m256 var, u8 bucket, u8 offset,
CautionReason reason, const u8 *ptr,
const struct FDR_Runtime_Args *a,
const u32* confBase, hwlm_group_t *control,
u32 *last_match) {
if (unlikely(diff256(var, ones256()))) {
m256 swap = swap128in256(var);
m256 r = interleave256lo(var, swap);
u32 part1 = extractlow32from256(r);
u32 part2 = extract32from256(r, 1);
u32 part3 = extract32from256(r, 2);
u32 part4 = extract32from256(r, 3);
r = interleave256hi(var, swap);
u32 part5 = extractlow32from256(r);
u32 part6 = extract32from256(r, 1);
u32 part7 = extract32from256(r, 2);
u32 part8 = extract32from256(r, 3);
CONF_FAT_CHUNK_32(part1, bucket, offset, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part2, bucket, offset + 2, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part3, bucket, offset + 4, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part4, bucket, offset + 6, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part5, bucket, offset + 8, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part6, bucket, offset + 10, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part7, bucket, offset + 12, reason, ptr, confBase, a, control, last_match);
CONF_FAT_CHUNK_32(part8, bucket, offset + 14, reason, ptr, confBase, a, control, last_match);
}
return HWLM_SUCCESS;
}
} // extern C
#define confirm_fat_teddy_256_f confirm_fat_teddy_32_256
#endif
#define CONFIRM_FAT_TEDDY_256(...) if (confirm_fat_teddy_256_f(__VA_ARGS__, a, confBase, &control, &last_match) == HWLM_TERMINATED) return HWLM_TERMINATED;
static really_inline
const m256 *getMaskBase_fat(const struct Teddy *teddy) {
return (const m256 *)((const u8 *)teddy + ROUNDUP_CL(sizeof(struct Teddy)));
}
static really_inline
m256 vectoredLoad2x128(m256 *p_mask, const u8 *ptr, const size_t start_offset,
const u8 *lo, const u8 *hi,
const u8 *buf_history, size_t len_history,
const u32 nMasks) {
m128 p_mask128;
m256 ret = set1_2x128(vectoredLoad128(&p_mask128, ptr, start_offset, lo, hi,
buf_history, len_history, nMasks));
*p_mask = set1_2x128(p_mask128);
return ret;
}
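// the single cautious 128-bit load is broadcast to both lanes so that the
// bucket-0..7 masks (low half) and the bucket-8..15 masks (high half)
// shuffle against the same input bytes.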
template<int NMSK>
static really_inline
m256 prep_conf_fat_teddy_256_templ(const m256 *maskBase, m256 val,
m256* old_1, m256* old_2, m256* old_3){
m256 mask = set1_32x8(0xf);
m256 lo = and256(val, mask);
m256 hi = and256(rshift64_m256(val, 4), mask);
m256 r = or256(pshufb_m256(maskBase[0 * 2], lo),
pshufb_m256(maskBase[0 * 2 + 1], hi));
if constexpr (NMSK == 1) return r;
m256 res_1 = or256(pshufb_m256(maskBase[1 * 2], lo),
pshufb_m256(maskBase[1 * 2 + 1], hi));
m256 res_shifted_1 = vpalignr(res_1, *old_1, 16 - 1);
*old_1 = res_1;
r = or256(r, res_shifted_1);
if constexpr (NMSK == 2) return r;
m256 res_2 = or256(pshufb_m256(maskBase[2 * 2], lo),
pshufb_m256(maskBase[2 * 2 + 1], hi));
m256 res_shifted_2 = vpalignr(res_2, *old_2, 16 - 2);
*old_2 = res_2;
r = or256(r, res_shifted_2);
if constexpr (NMSK == 3) return r;
m256 res_3 = or256(pshufb_m256(maskBase[3 * 2], lo),
pshufb_m256(maskBase[3 * 2 + 1], hi));
m256 res_shifted_3 = vpalignr(res_3, *old_3, 16 - 3);
*old_3 = res_3;
return or256(r, res_shifted_3);
}
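// the old_N vectors carry the previous block's shuffle result for mask N,
// so the vpalignr above can pull in the N bytes that would otherwise be
// lost at the block boundary instead of fabricating a match there.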
template<int NMSK>
hwlm_error_t fdr_exec_fat_teddy_256_templ(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
const u8 *buf_end = a->buf + a->len;
const u8 *ptr = a->buf + a->start_offset;
u32 floodBackoff = FLOOD_BACKOFF_START;
const u8 *tryFloodDetect = a->firstFloodDetect;
u32 last_match = ones_u32;
const struct Teddy *teddy = (const struct Teddy *)fdr;
const size_t iterBytes = 32;
DEBUG_PRINTF("params: buf %p len %zu start_offset %zu\n",
a->buf, a->len, a->start_offset);
const m256 *maskBase = getMaskBase_fat(teddy);
const u32 *confBase = getConfBase(teddy);
m256 res_old_1 = zeroes256();
m256 res_old_2 = zeroes256();
m256 res_old_3 = zeroes256();
const u8 *mainStart = ROUNDUP_PTR(ptr, 16);
DEBUG_PRINTF("derive: ptr: %p mainstart %p\n", ptr, mainStart);
if (ptr < mainStart) {
ptr = mainStart - 16;
m256 p_mask;
m256 val_0 = vectoredLoad2x128(&p_mask, ptr, a->start_offset,
a->buf, buf_end,
a->buf_history, a->len_history,
NMSK);
m256 r_0 = prep_conf_fat_teddy_256_templ<NMSK>(maskBase, val_0, &res_old_1, &res_old_2, &res_old_3);
r_0 = or256(r_0, p_mask);
CONFIRM_FAT_TEDDY_256(r_0, 16, 0, VECTORING, ptr);
ptr += 16;
}
if (ptr + 16 <= buf_end) {
m256 r_0 = prep_conf_fat_teddy_256_templ<NMSK>(maskBase, load2x128(ptr), &res_old_1, &res_old_2, &res_old_3);
CONFIRM_FAT_TEDDY_256(r_0, 16, 0, VECTORING, ptr);
ptr += 16;
}
for ( ; ptr + iterBytes <= buf_end; ptr += iterBytes) {
__builtin_prefetch(ptr + (iterBytes * 4));
CHECK_FLOOD;
m256 r_0 = prep_conf_fat_teddy_256_templ<NMSK>(maskBase, load2x128(ptr), &res_old_1, &res_old_2, &res_old_3);
CONFIRM_FAT_TEDDY_256(r_0, 16, 0, NOT_CAUTIOUS, ptr);
m256 r_1 = prep_conf_fat_teddy_256_templ<NMSK>(maskBase, load2x128(ptr + 16), &res_old_1, &res_old_2, &res_old_3);
CONFIRM_FAT_TEDDY_256(r_1, 16, 16, NOT_CAUTIOUS, ptr);
}
if (ptr + 16 <= buf_end) {
m256 r_0 = prep_conf_fat_teddy_256_templ<NMSK>(maskBase, load2x128(ptr), &res_old_1, &res_old_2, &res_old_3);
CONFIRM_FAT_TEDDY_256(r_0, 16, 0, NOT_CAUTIOUS, ptr);
ptr += 16;
}
assert(ptr + 16 > buf_end);
if (ptr < buf_end) {
m256 p_mask;
m256 val_0 = vectoredLoad2x128(&p_mask, ptr, 0, ptr, buf_end,
a->buf_history, a->len_history,
NMSK);
m256 r_0 = prep_conf_fat_teddy_256_templ<NMSK>(maskBase, val_0, &res_old_1, &res_old_2, &res_old_3);
r_0 = or256(r_0, p_mask);
CONFIRM_FAT_TEDDY_256(r_0, 16, 0, VECTORING, ptr);
}
return HWLM_SUCCESS;
}
// This check exists because it is possible to build with both AVX512VBMI and
// AVX2 defined. To replicate the behaviour of the original flow of control we
// give preference to the former. When building for both, this file is
// compiled multiple times, each time with only the desired variant defined.
#ifndef FDR_EXEC_FAT_TEDDY_FN
#define FDR_EXEC_FAT_TEDDY_FN fdr_exec_fat_teddy_256_templ
#endif
#endif // HAVE_AVX2 for fat teddy
/* we only have fat teddy in these two modes */
// #if (defined(HAVE_AVX2) || defined(HAVE_AVX512VBMI)) && defined(FDR_EXEC_FAT_TEDDY_FN)
// #if defined(FDR_EXEC_FAT_TEDDY_FN)
extern "C" {
hwlm_error_t fdr_exec_fat_teddy_msks1(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<1>(fdr, a, control);
}
hwlm_error_t fdr_exec_fat_teddy_msks1_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<1>(fdr, a, control);
}
hwlm_error_t fdr_exec_fat_teddy_msks2(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<2>(fdr, a, control);
}
hwlm_error_t fdr_exec_fat_teddy_msks2_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<2>(fdr, a, control);
}
hwlm_error_t fdr_exec_fat_teddy_msks3(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<3>(fdr, a, control);
}
hwlm_error_t fdr_exec_fat_teddy_msks3_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<3>(fdr, a, control);
}
hwlm_error_t fdr_exec_fat_teddy_msks4(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<4>(fdr, a, control);
}
hwlm_error_t fdr_exec_fat_teddy_msks4_pck(const struct FDR *fdr,
const struct FDR_Runtime_Args *a,
hwlm_group_t control) {
return FDR_EXEC_FAT_TEDDY_FN<4>(fdr, a, control);
}
} // extern "C"
#endif // HAVE_AVX2 from the beginning

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2016-2020, Intel Corporation
* Copyright (c) 2024, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -40,9 +41,15 @@
#include "util/simd_utils.h"
#include "util/uniform_ops.h"
extern const u8 ALIGN_DIRECTIVE p_mask_arr[17][32];
#if defined(HAVE_AVX2)
extern const u8 ALIGN_AVX_DIRECTIVE p_mask_arr256[33][64];
#if defined(HAVE_AVX512VBMI)
static const u8 ALIGN_DIRECTIVE p_sh_mask_arr[80] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f
};
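// loading 64 bytes of this table at offset 16-k yields the vpermb index
// pattern for a shift of k bytes towards the high end: the k leading zero
// indices are discarded by the maskz variants at the call sites, and the
// remaining indices select source bytes 0..63-k.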
#endif
#ifdef ARCH_64_BIT
@ -132,6 +139,37 @@ void copyRuntBlock128(u8 *dst, const u8 *src, size_t len) {
// |----------|-------|----------------|............|
// 0 start start+offset end(<=16)
// p_mask ffff.....ffffff..ff0000...........00ffff..........
// p_mask_gen below replaces the p_mask_arr lookup table.
// m is the length of the zone of 0x00 bytes; n places that zone: there
// are 16-n bytes of 1's before the zone begins.
// m,n = 4,7: 16-7 = 9 bytes of 1's, then 4 bytes of 0's, then 1's again:
// ff ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff
// m,n = 15,15: 16-15 = 1 byte of 1's followed by 15 bytes of 0's, which
// pushes the trailing 1's off the high end, leaving
// ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
// m,n = 15,16: 16-16 = 0 bytes of 1's on the low end, then 15 bytes of
// 0's, leaving a single ff at the high end:
// 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff
// To build the run of 1's at the high end, we start from all-ones and
// shift it up (towards the high end) by m + (16 - n), mod 17.
// To build the run of 1's at the low end, which must be 16-n bytes long,
// we shift all-ones down by 16 - (16 - n), i.e. simply by n.
// Or-ing the two shifted vectors together gives the final mask.
static really_inline
m128 p_mask_gen(u8 m, u8 n) {
m128 a = ones128();
m128 b = ones128();
m %= 17; n %= 17;
m += (16 - n); m %= 17;
a = rshiftbyte_m128(a, n);
b = lshiftbyte_m128(b, m);
return or128(a, b);
}
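// Scalar reference for the layout above (an illustrative sketch, not part
// of this change): builds the same 16-byte pattern with plain bytes, e.g.
// to cross-check p_mask_gen in a unit test.
static really_inline
void p_mask_ref(u8 *out, u32 m, u32 n) {
m %= 17; n %= 17;
for (u32 i = 0; i < 16; i++) {
out[i] = 0xff; // start from all-ones
}
// clear the m-byte zero zone that begins after the 16-n low 1's
for (u32 i = 16 - n; i < 16 - n + m && i < 16; i++) {
out[i] = 0x00;
}
}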
static really_inline
m128 vectoredLoad128(m128 *p_mask, const u8 *ptr, const size_t start_offset,
const u8 *lo, const u8 *hi,
@ -151,13 +189,11 @@ m128 vectoredLoad128(m128 *p_mask, const u8 *ptr, const size_t start_offset,
uintptr_t avail = (uintptr_t)(hi - ptr);
if (avail >= 16) {
assert(start_offset - start <= 16);
*p_mask = loadu128(p_mask_arr[16 - start_offset + start]
+ 16 - start_offset + start);
*p_mask = p_mask_gen(16 - start_offset + start, 16 - start_offset + start);
return loadu128(ptr);
}
assert(start_offset - start <= avail);
*p_mask = loadu128(p_mask_arr[avail - start_offset + start]
+ 16 - start_offset + start);
*p_mask = p_mask_gen(avail - start_offset + start, 16 - start_offset + start);
copy_start = 0;
copy_len = avail;
} else { // start zone
@ -170,8 +206,7 @@ m128 vectoredLoad128(m128 *p_mask, const u8 *ptr, const size_t start_offset,
}
uintptr_t end = MIN(16, (uintptr_t)(hi - ptr));
assert(start + start_offset <= end);
*p_mask = loadu128(p_mask_arr[end - start - start_offset]
+ 16 - start - start_offset);
*p_mask = p_mask_gen(end - start - start_offset, 16 - start - start_offset);
copy_start = start;
copy_len = end - start;
}
@ -260,6 +295,20 @@ void copyRuntBlock256(u8 *dst, const u8 *src, size_t len) {
// |----------|-------|----------------|............|
// 0 start start+offset end(<=32)
// p_mask ffff.....ffffff..ff0000...........00ffff..........
// Like p_mask_gen above, this replaces the large p_mask_arr256 table.
static really_inline
m256 fat_pmask_gen(u8 m, u8 n) {
m256 a = ones256();
m256 b = ones256();
m %= 33; n %= 33;
m += (32 - n); m %= 33;
// mirror p_mask_gen above: the low run of 1's comes from shifting down
// by n, the high run from shifting up by the adjusted m
a = rshift_byte_m256(a, n);
b = lshift_byte_m256(b, m);
return or256(a, b);
}
static really_inline
m256 vectoredLoad256(m256 *p_mask, const u8 *ptr, const size_t start_offset,
const u8 *lo, const u8 *hi,
@ -279,13 +328,11 @@ m256 vectoredLoad256(m256 *p_mask, const u8 *ptr, const size_t start_offset,
uintptr_t avail = (uintptr_t)(hi - ptr);
if (avail >= 32) {
assert(start_offset - start <= 32);
*p_mask = loadu256(p_mask_arr256[32 - start_offset + start]
+ 32 - start_offset + start);
*p_mask = fat_pmask_gen(32 - start_offset + start, 32 - start_offset + start);
return loadu256(ptr);
}
assert(start_offset - start <= avail);
*p_mask = loadu256(p_mask_arr256[avail - start_offset + start]
+ 32 - start_offset + start);
*p_mask = fat_pmask_gen(avail - start_offset + start, 32 - start_offset + start);
copy_start = 0;
copy_len = avail;
} else { //start zone
@ -298,8 +345,7 @@ m256 vectoredLoad256(m256 *p_mask, const u8 *ptr, const size_t start_offset,
}
uintptr_t end = MIN(32, (uintptr_t)(hi - ptr));
assert(start + start_offset <= end);
*p_mask = loadu256(p_mask_arr256[end - start - start_offset]
+ 32 - start - start_offset);
*p_mask = fat_pmask_gen(end - start - start_offset, 32 - start - start_offset);
copy_start = start;
copy_len = end - start;
}
@ -338,7 +384,7 @@ static really_inline
m512 vectoredLoad512(m512 *p_mask, const u8 *ptr, const size_t start_offset,
const u8 *lo, const u8 *hi, const u8 *hbuf, size_t hlen,
const u32 nMasks) {
m512 val;
m512 val = zeroes512();
uintptr_t copy_start;
uintptr_t copy_len;
@ -418,8 +464,13 @@ void do_confWithBit_teddy(TEDDY_CONF_TYPE *conf, u8 bucket, u8 offset,
if (!cf) {
continue;
}
#ifdef __cplusplus
const struct FDRConfirm *fdrc = reinterpret_cast<const struct FDRConfirm *>
(reinterpret_cast<const u8 *>(confBase) + cf);
#else
const struct FDRConfirm *fdrc = (const struct FDRConfirm *)
((const u8 *)confBase + cf);
#endif
if (!(fdrc->groups & *control)) {
continue;
}
@ -432,18 +483,31 @@ void do_confWithBit_teddy(TEDDY_CONF_TYPE *conf, u8 bucket, u8 offset,
static really_inline
const m128 *getMaskBase(const struct Teddy *teddy) {
#ifdef __cplusplus
return reinterpret_cast<const m128 *>(reinterpret_cast<const u8 *>(teddy) + ROUNDUP_CL(sizeof(struct Teddy)));
#else
return (const m128 *)((const u8 *)teddy + ROUNDUP_CL(sizeof(struct Teddy)));
#endif
}
static really_inline
const u64a *getReinforcedMaskBase(const struct Teddy *teddy, u8 numMask) {
#ifdef __cplusplus
return reinterpret_cast<const u64a *>(reinterpret_cast<const u8 *>(getMaskBase(teddy))
+ ROUNDUP_CL(2 * numMask * sizeof(m128)));
#else
return (const u64a *)((const u8 *)getMaskBase(teddy)
+ ROUNDUP_CL(2 * numMask * sizeof(m128)));
#endif
}
static really_inline
const u32 *getConfBase(const struct Teddy *teddy) {
#ifdef __cplusplus
return reinterpret_cast<const u32 *>(reinterpret_cast<const u8 *>(teddy) + teddy->confOffset);
#else
return (const u32 *)((const u8 *)teddy + teddy->confOffset);
#endif
}
#endif /* TEDDY_RUNTIME_COMMON_H_ */

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2015-2019, Intel Corporation
* Copyright (c) 2015-2021, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -44,8 +44,11 @@
#include "parser/prefilter.h"
#include "parser/unsupported.h"
#include "util/compile_error.h"
#include "util/cpuid_flags.h"
#include "util/cpuid_inline.h"
#include "util/arch/common/cpuid_flags.h"
#if defined(ARCH_IA32) || defined(ARCH_X86_64)
#include "util/arch/x86/cpuid_inline.h"
#elif defined(ARCH_ARM32) || defined(ARCH_AARCH64)
#endif
#include "util/depth.h"
#include "util/popcount.h"
#include "util/target_info.h"
@ -120,9 +123,10 @@ bool checkMode(unsigned int mode, hs_compile_error **comp_error) {
static
bool checkPlatform(const hs_platform_info *p, hs_compile_error **comp_error) {
static constexpr u32 HS_TUNE_LAST = HS_TUNE_FAMILY_GLM;
static constexpr u32 HS_TUNE_LAST = HS_TUNE_FAMILY_ICX;
static constexpr u32 HS_CPU_FEATURES_ALL =
HS_CPU_FEATURES_AVX2 | HS_CPU_FEATURES_AVX512;
HS_CPU_FEATURES_AVX2 | HS_CPU_FEATURES_AVX512 |
HS_CPU_FEATURES_AVX512VBMI;
if (!p) {
return true;
@ -195,11 +199,13 @@ hs_compile_multi_int(const char *const *expressions, const unsigned *flags,
}
#if defined(FAT_RUNTIME)
#if defined(ARCH_IA32) || defined(ARCH_X86_64)
if (!check_ssse3()) {
*db = nullptr;
*comp_error = generateCompileError("Unsupported architecture", -1);
return HS_ARCH_ERROR;
}
#endif
#endif
if (!checkMode(mode, comp_error)) {
@ -316,13 +322,14 @@ hs_compile_lit_multi_int(const char *const *expressions, const unsigned *flags,
*comp_error = generateCompileError("Invalid parameter: elements is zero", -1);
return HS_COMPILER_ERROR;
}
#if defined(FAT_RUNTIME)
#if defined(ARCH_IA32) || defined(ARCH_X86_64)
if (!check_ssse3()) {
*db = nullptr;
*comp_error = generateCompileError("Unsupported architecture", -1);
return HS_ARCH_ERROR;
}
#endif
#endif
if (!checkMode(mode, comp_error)) {
@ -496,10 +503,12 @@ hs_error_t hs_expression_info_int(const char *expression, unsigned int flags,
}
#if defined(FAT_RUNTIME)
#if defined(ARCH_IA32) || defined(ARCH_X86_64)
if (!check_ssse3()) {
*error = generateCompileError("Unsupported architecture", -1);
return HS_ARCH_ERROR;
}
#endif
#endif
if (!info) {
@ -513,6 +522,12 @@ hs_error_t hs_expression_info_int(const char *expression, unsigned int flags,
return HS_COMPILER_ERROR;
}
if (flags & HS_FLAG_COMBINATION) {
*error = generateCompileError("Invalid parameter: unsupported "
"logical combination expression", -1);
return HS_COMPILER_ERROR;
}
*info = nullptr;
*error = nullptr;
@ -574,7 +589,7 @@ hs_error_t hs_expression_info_int(const char *expression, unsigned int flags,
return HS_COMPILER_ERROR;
}
hs_expr_info *rv = (hs_expr_info *)hs_misc_alloc(sizeof(*rv));
hs_expr_info *rv = static_cast<hs_expr_info *>(hs_misc_alloc(sizeof(*rv)));
if (!rv) {
*error = const_cast<hs_compile_error_t *>(&hs_enomem);
return HS_COMPILER_ERROR;
@ -621,9 +636,11 @@ hs_error_t HS_CDECL hs_populate_platform(hs_platform_info_t *platform) {
extern "C" HS_PUBLIC_API
hs_error_t HS_CDECL hs_free_compile_error(hs_compile_error_t *error) {
#if defined(FAT_RUNTIME)
#if defined(ARCH_IA32) || defined(ARCH_X86_64)
if (!check_ssse3()) {
return HS_ARCH_ERROR;
}
#endif
#endif
freeCompileError(error);
return HS_SUCCESS;

View File

@ -39,12 +39,7 @@
* the individual component headers for documentation.
*/
/* The current Hyperscan version information. */
#define HS_MAJOR 5
#define HS_MINOR 3
#define HS_PATCH 0
#include "hs_version.h"
#include "hs_compile.h"
#include "hs_runtime.h"

View File

@ -29,11 +29,7 @@
#ifndef HS_COMMON_H_
#define HS_COMMON_H_
#if defined(_WIN32)
#define HS_CDECL __cdecl
#else
#define HS_CDECL
#endif
#include <stdlib.h>
/**

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2015-2020, Intel Corporation
* Copyright (c) 2015-2021, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -748,10 +748,7 @@ hs_error_t HS_CDECL hs_free_compile_error(hs_compile_error_t *error);
* - HS_FLAG_PREFILTER - Compile pattern in prefiltering mode.
* - HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset
* when a match is found.
* - HS_FLAG_COMBINATION - Parse the expression in logical combination
* syntax.
* - HS_FLAG_QUIET - Ignore match reporting for this expression. Used for
* the sub-expressions in logical combinations.
* - HS_FLAG_QUIET - This flag will be ignored.
*
* @param info
* On success, a pointer to the pattern information will be returned in
@ -814,10 +811,7 @@ hs_error_t HS_CDECL hs_expression_info(const char *expression,
* - HS_FLAG_PREFILTER - Compile pattern in prefiltering mode.
* - HS_FLAG_SOM_LEFTMOST - Report the leftmost start of match offset
* when a match is found.
* - HS_FLAG_COMBINATION - Parse the expression in logical combination
* syntax.
* - HS_FLAG_QUIET - Ignore match reporting for this expression. Used for
* the sub-expressions in logical combinations.
* - HS_FLAG_QUIET - This flag will be ignored.
*
* @param ext
* A pointer to a filled @ref hs_expr_ext_t structure that defines
@ -1034,6 +1028,15 @@ hs_error_t HS_CDECL hs_populate_platform(hs_platform_info_t *platform);
*/
#define HS_CPU_FEATURES_AVX512 (1ULL << 3)
/**
* CPU features flag - Intel(R) Advanced Vector Extensions 512
* Vector Byte Manipulation Instructions (Intel(R) AVX512VBMI)
*
* Setting this flag indicates that the target platform supports AVX512VBMI
* instructions. Using AVX512VBMI implies the use of AVX512.
*/
#define HS_CPU_FEATURES_AVX512VBMI (1ULL << 4)
/** @} */
/**
@ -1114,6 +1117,22 @@ hs_error_t HS_CDECL hs_populate_platform(hs_platform_info_t *platform);
*/
#define HS_TUNE_FAMILY_GLM 8
/**
* Tuning Parameter - Intel(R) microarchitecture code name Icelake
*
* This indicates that the compiled database should be tuned for the
* Icelake microarchitecture.
*/
#define HS_TUNE_FAMILY_ICL 9
/**
* Tuning Parameter - Intel(R) microarchitecture code name Icelake Server
*
* This indicates that the compiled database should be tuned for the
* Icelake Server microarchitecture.
*/
#define HS_TUNE_FAMILY_ICX 10
/** @} */
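/*
* Illustrative use of the new feature and tune constants (a sketch, not
* part of this header): compile for an Icelake Server target that has
* AVX512VBMI.
*
*     hs_platform_info_t plat;
*     plat.tune = HS_TUNE_FAMILY_ICX;
*     plat.cpu_features = HS_CPU_FEATURES_AVX512VBMI;
*     plat.reserved1 = 0;
*     plat.reserved2 = 0;
*
*     hs_database_t *db = NULL;
*     hs_compile_error_t *err = NULL;
*     if (hs_compile("foo.*bar", HS_FLAG_DOTALL, HS_MODE_BLOCK, &plat,
*                    &db, &err) != HS_SUCCESS) {
*         // inspect err->message, then hs_free_compile_error(err)
*     }
*/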
/**

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2019, Intel Corporation
* Copyright (c) 2019-2021, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -80,7 +80,9 @@ extern "C"
| HS_FLAG_PREFILTER \
| HS_FLAG_SINGLEMATCH \
| HS_FLAG_ALLOWEMPTY \
| HS_FLAG_SOM_LEFTMOST)
| HS_FLAG_SOM_LEFTMOST \
| HS_FLAG_COMBINATION \
| HS_FLAG_QUIET)
#ifdef __cplusplus
} /* extern "C" */

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2016-2017, Intel Corporation
* Copyright (c) 2020-2023, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -26,16 +27,36 @@
* POSSIBILITY OF SUCH DAMAGE.
*/
#include "config.h"
#include "hs_common.h"
#include "util/cpuid_flags.h"
#include "util/cpuid_inline.h"
#include "ue2common.h"
#if !defined(VS_SIMDE_BACKEND)
#if defined(ARCH_IA32) || defined(ARCH_X86_64)
#include "util/arch/x86/cpuid_inline.h"
#elif defined(ARCH_AARCH64)
#include "util/arch/arm/cpuid_inline.h"
#endif
#endif
HS_PUBLIC_API
hs_error_t HS_CDECL hs_valid_platform(void) {
/* Hyperscan requires SSSE3, anything else is a bonus */
if (check_ssse3()) {
/* Vectorscan requires SSE4.2, anything else is a bonus */
#if !defined(VS_SIMDE_BACKEND) && (defined(ARCH_IA32) || defined(ARCH_X86_64))
// cppcheck-suppress knownConditionTrueFalse
if (check_sse42()) {
return HS_SUCCESS;
} else {
return HS_ARCH_ERROR;
}
#elif !defined(VS_SIMDE_BACKEND) && (defined(ARCH_ARM32) || defined(ARCH_AARCH64))
// check_neon returns true for now
// cppcheck-suppress knownConditionTrueFalse
if (check_neon()) {
return HS_SUCCESS;
} else {
return HS_ARCH_ERROR;
}
#elif defined(ARCH_PPC64EL) || defined(VS_SIMDE_BACKEND)
return HS_SUCCESS;
#endif
}
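/*
* Illustrative caller-side gate (a sketch, not part of this file): refuse
* to run on hosts that fail the probe above.
*
*     if (hs_valid_platform() != HS_SUCCESS) {
*         fprintf(stderr, "CPU lacks the SIMD support Vectorscan needs\n");
*         exit(EXIT_FAILURE);
*     }
*/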

View File

@ -36,5 +36,9 @@
#define HS_VERSION_32BIT ((@HS_MAJOR_VERSION@ << 24) | (@HS_MINOR_VERSION@ << 16) | (@HS_PATCH_VERSION@ << 8) | 0)
#define HS_MAJOR @HS_MAJOR_VERSION@
#define HS_MINOR @HS_MINOR_VERSION@
#define HS_PATCH @HS_PATCH_VERSION@
#endif /* HS_VERSION_H_C6428FAF8E3713 */

View File

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2021, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -38,7 +39,7 @@
#include "nfa/accel.h"
#include "nfa/shufti.h"
#include "nfa/truffle.h"
#include "nfa/vermicelli.h"
#include "nfa/vermicelli.hpp"
#include <string.h>
#define MIN_ACCEL_LEN_BLOCK 16
@ -62,12 +63,22 @@ const u8 *run_hwlm_accel(const union AccelAux *aux, const u8 *ptr,
DEBUG_PRINTF("double vermicelli-nocase for 0x%02hhx%02hhx\n",
aux->dverm.c1, aux->dverm.c2);
return vermicelliDoubleExec(aux->dverm.c1, aux->dverm.c2, 1, ptr, end);
#ifdef HAVE_SVE2
case ACCEL_VERM16:
DEBUG_PRINTF("single vermicelli16\n");
return vermicelli16Exec(aux->verm16.mask, ptr, end);
#endif // HAVE_SVE2
case ACCEL_SHUFTI:
DEBUG_PRINTF("single shufti\n");
return shuftiExec(aux->shufti.lo, aux->shufti.hi, ptr, end);
case ACCEL_TRUFFLE:
DEBUG_PRINTF("truffle\n");
return truffleExec(aux->truffle.mask1, aux->truffle.mask2, ptr, end);
return truffleExec(aux->truffle.mask_lo, aux->truffle.mask_hi, ptr, end);
#ifdef CAN_USE_WIDE_TRUFFLE
case ACCEL_TRUFFLE_WIDE:
DEBUG_PRINTF("truffle wide\n");
return truffleExecWide(aux->truffle.mask, ptr, end);
#endif // CAN_USE_WIDE_TRUFFLE
default:
/* no acceleration, fall through and return current ptr */
DEBUG_PRINTF("no accel; %u\n", (int)aux->accel_type);
@ -164,8 +175,7 @@ void do_accel_streaming(const union AccelAux *aux, const u8 *hbuf, size_t hlen,
DEBUG_PRINTF("got %zu/%zu in 2nd buffer\n", delta, len);
*start += delta;
} else if (hlen) {
UNUSED size_t remaining = offset + ptr2 - found;
DEBUG_PRINTF("got %zu/%zu remaining in 1st buffer\n", remaining, hlen);
DEBUG_PRINTF("got %zu/%zu remaining in 1st buffer\n", offset + ptr2 - found, hlen);
}
}

View File

@ -46,7 +46,6 @@
#include "fdr/teddy_engine_description.h"
#include "util/compile_context.h"
#include "util/compile_error.h"
#include "util/make_unique.h"
#include "util/ue2string.h"
#include <cassert>
@ -58,24 +57,24 @@ using namespace std;
namespace ue2 {
HWLMProto::HWLMProto(u8 engType_in, vector<hwlmLiteral> lits_in)
: engType(engType_in), lits(move(lits_in)) {}
: engType(engType_in), lits(std::move(lits_in)) {}
HWLMProto::HWLMProto(u8 engType_in,
unique_ptr<FDREngineDescription> eng_in,
vector<hwlmLiteral> lits_in,
map<u32, vector<u32>> bucketToLits_in,
bool make_small_in)
: engType(engType_in), fdrEng(move(eng_in)), lits(move(lits_in)),
bucketToLits(move(bucketToLits_in)), make_small(make_small_in) {}
: engType(engType_in), fdrEng(std::move(eng_in)), lits(std::move(lits_in)),
bucketToLits(std::move(bucketToLits_in)), make_small(make_small_in) {}
HWLMProto::HWLMProto(u8 engType_in,
unique_ptr<TeddyEngineDescription> eng_in,
vector<hwlmLiteral> lits_in,
map<u32, vector<u32>> bucketToLits_in,
bool make_small_in)
: engType(engType_in), teddyEng(move(eng_in)),
lits(move(lits_in)),
bucketToLits(move(bucketToLits_in)), make_small(make_small_in) {}
: engType(engType_in), teddyEng(std::move(eng_in)),
lits(std::move(lits_in)),
bucketToLits(std::move(bucketToLits_in)), make_small(make_small_in) {}
HWLMProto::~HWLMProto() {}
@ -94,6 +93,7 @@ void dumpLits(UNUSED const vector<hwlmLiteral> &lits) {
// Called by an assertion.
static
bool everyoneHasGroups(const vector<hwlmLiteral> &lits) {
// cppcheck-suppress useStlAlgorithm
for (const auto &lit : lits) {
if (!lit.groups) {
return false;
@ -133,18 +133,18 @@ bytecode_ptr<HWLM> hwlmBuild(const HWLMProto &proto, const CompileContext &cc,
if (noodle) {
engSize = noodle.size();
}
eng = move(noodle);
eng = std::move(noodle);
} else {
DEBUG_PRINTF("building a new deal\n");
auto fdr = fdrBuildTable(proto, cc.grey);
if (fdr) {
engSize = fdr.size();
}
eng = move(fdr);
eng = std::move(fdr);
}
if (!eng) {
return nullptr;
return bytecode_ptr<HWLM>(nullptr);
}
assert(engSize);
@ -156,6 +156,7 @@ bytecode_ptr<HWLM> hwlmBuild(const HWLMProto &proto, const CompileContext &cc,
auto h = make_zeroed_bytecode_ptr<HWLM>(hwlm_len, 64);
h->type = proto.engType;
// cppcheck-suppress cstyleCast
memcpy(HWLM_DATA(h.get()), eng.get(), engSize);
return h;
@ -201,7 +202,7 @@ hwlmBuildProto(vector<hwlmLiteral> &lits, bool make_small,
if (isNoodleable(lits, cc)) {
DEBUG_PRINTF("build noodle table\n");
proto = ue2::make_unique<HWLMProto>(HWLM_ENGINE_NOOD, lits);
proto = std::make_unique<HWLMProto>(HWLM_ENGINE_NOOD, lits);
} else {
DEBUG_PRINTF("building a new deal\n");
proto = fdrBuildProto(HWLM_ENGINE_FDR, lits, make_small,
@ -219,10 +220,12 @@ size_t hwlmSize(const HWLM *h) {
switch (h->type) {
case HWLM_ENGINE_NOOD:
engSize = noodSize((const noodTable *)HWLM_C_DATA(h));
// cppcheck-suppress cstyleCast
engSize = noodSize(reinterpret_cast<const noodTable *>(HWLM_C_DATA(h)));
break;
case HWLM_ENGINE_FDR:
engSize = fdrSize((const FDR *)HWLM_C_DATA(h));
// cppcheck-suppress cstyleCast
engSize = fdrSize(reinterpret_cast<const FDR *>(HWLM_C_DATA(h)));
break;
}

View File

@ -53,10 +53,12 @@ void hwlmGenerateDumpFiles(const HWLM *h, const string &base) {
switch (h->type) {
case HWLM_ENGINE_NOOD:
noodPrintStats((const noodTable *)HWLM_C_DATA(h), f);
// cppcheck-suppress cstyleCast
noodPrintStats(reinterpret_cast<const noodTable *>(HWLM_C_DATA(h)), f);
break;
case HWLM_ENGINE_FDR:
fdrPrintStats((const FDR *)HWLM_C_DATA(h), f);
// cppcheck-suppress cstyleCast
fdrPrintStats(reinterpret_cast<const FDR *>(HWLM_C_DATA(h)), f);
break;
default:
fprintf(f, "<unknown hwlm subengine>\n");

View File

@ -56,7 +56,7 @@ u64a make_u64a_mask(const vector<u8> &v) {
u64a mask = 0;
size_t len = v.size();
unsigned char *m = (unsigned char *)&mask;
u8 *m = reinterpret_cast<u8 *>(&mask);
DEBUG_PRINTF("making mask len %zu\n", len);
memcpy(m, &v[0], len);
return mask;
@ -156,7 +156,7 @@ void noodPrintStats(const noodTable *n, FILE *f) {
n->msk_len);
fprintf(f, "String: ");
for (u32 i = 0; i < n->msk_len; i++) {
const u8 *m = (const u8 *)&n->cmp;
const u8 *m = reinterpret_cast<const u8 *>(&n->cmp);
if (isgraph(m[i]) && m[i] != '\\') {
fprintf(f, "%c", m[i]);
} else {

View File

@ -1,442 +0,0 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/** \file
* \brief Noodle literal matcher: runtime.
*/
#include "hwlm.h"
#include "noodle_engine.h"
#include "noodle_internal.h"
#include "scratch.h"
#include "ue2common.h"
#include "util/arch.h"
#include "util/bitutils.h"
#include "util/compare.h"
#include "util/intrinsics.h"
#include "util/join.h"
#include "util/masked_move.h"
#include "util/partial_store.h"
#include "util/simd_utils.h"
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
/** \brief Noodle runtime context. */
struct cb_info {
HWLMCallback cb; //!< callback function called on match
u32 id; //!< ID to pass to callback on match
struct hs_scratch *scratch; //!< scratch to pass to callback
size_t offsetAdj; //!< used in streaming mode
};
#if defined(HAVE_AVX512)
#define CHUNKSIZE 64
#define MASK_TYPE m512
#define Z_BITS 64
#define Z_TYPE u64a
#elif defined(HAVE_AVX2)
#define CHUNKSIZE 32
#define MASK_TYPE m256
#define Z_BITS 32
#define Z_TYPE u32
#else
#define CHUNKSIZE 16
#define MASK_TYPE m128
#define Z_BITS 32
#define Z_TYPE u32
#endif
#define RETURN_IF_TERMINATED(x) \
{ \
if ((x) == HWLM_TERMINATED) { \
return HWLM_TERMINATED; \
} \
}
#define SINGLE_ZSCAN() \
do { \
while (unlikely(z)) { \
Z_TYPE pos = JOIN(findAndClearLSB_, Z_BITS)(&z); \
size_t matchPos = d - buf + pos; \
DEBUG_PRINTF("match pos %zu\n", matchPos); \
hwlmcb_rv_t rv = final(n, buf, len, 1, cbi, matchPos); \
RETURN_IF_TERMINATED(rv); \
} \
} while (0)
#define DOUBLE_ZSCAN() \
do { \
while (unlikely(z)) { \
Z_TYPE pos = JOIN(findAndClearLSB_, Z_BITS)(&z); \
size_t matchPos = d - buf + pos - 1; \
DEBUG_PRINTF("match pos %zu\n", matchPos); \
hwlmcb_rv_t rv = final(n, buf, len, 0, cbi, matchPos); \
RETURN_IF_TERMINATED(rv); \
} \
} while (0)
static really_inline
u8 caseClear8(u8 x, bool noCase) {
return (u8)(noCase ? (x & (u8)0xdf) : x);
}
// Make sure the rest of the string is there. The single character scanner
// is used only for single chars with case insensitivity used correctly,
// so it can go straight to the callback if we get this far.
static really_inline
hwlm_error_t final(const struct noodTable *n, const u8 *buf, UNUSED size_t len,
char single, const struct cb_info *cbi, size_t pos) {
if (single) {
if (n->msk_len == 1) {
goto match;
}
}
assert(len >= n->msk_len);
u64a v =
partial_load_u64a(buf + pos + n->key_offset - n->msk_len, n->msk_len);
DEBUG_PRINTF("v %016llx msk %016llx cmp %016llx\n", v, n->msk, n->cmp);
if ((v & n->msk) != n->cmp) {
/* mask didn't match */
return HWLM_SUCCESS;
}
match:
pos -= cbi->offsetAdj;
DEBUG_PRINTF("match @ %zu\n", pos + n->key_offset);
hwlmcb_rv_t rv = cbi->cb(pos + n->key_offset - 1, cbi->id, cbi->scratch);
if (rv == HWLM_TERMINATE_MATCHING) {
return HWLM_TERMINATED;
}
return HWLM_SUCCESS;
}
#if defined(HAVE_AVX512)
#define CHUNKSIZE 64
#define MASK_TYPE m512
#include "noodle_engine_avx512.c"
#elif defined(HAVE_AVX2)
#define CHUNKSIZE 32
#define MASK_TYPE m256
#include "noodle_engine_avx2.c"
#else
#define CHUNKSIZE 16
#define MASK_TYPE m128
#include "noodle_engine_sse.c"
#endif
static really_inline
hwlm_error_t scanSingleMain(const struct noodTable *n, const u8 *buf,
size_t len, size_t start, bool noCase,
const struct cb_info *cbi) {
const MASK_TYPE mask1 = getMask(n->key0, noCase);
const MASK_TYPE caseMask = getCaseMask();
size_t offset = start + n->msk_len - 1;
size_t end = len;
assert(offset < end);
#if !defined(HAVE_AVX512)
hwlm_error_t rv;
if (end - offset < CHUNKSIZE) {
rv = scanSingleShort(n, buf, len, noCase, caseMask, mask1, cbi, offset,
end);
return rv;
}
if (end - offset == CHUNKSIZE) {
rv = scanSingleUnaligned(n, buf, len, offset, noCase, caseMask, mask1,
cbi, offset, end);
return rv;
}
uintptr_t data = (uintptr_t)buf;
uintptr_t s2Start = ROUNDUP_N(data + offset, CHUNKSIZE) - data;
uintptr_t last = data + end;
uintptr_t s2End = ROUNDDOWN_N(last, CHUNKSIZE) - data;
uintptr_t s3Start = end - CHUNKSIZE;
if (offset != s2Start) {
// first scan out to the fast scan starting point
DEBUG_PRINTF("stage 1: -> %zu\n", s2Start);
rv = scanSingleUnaligned(n, buf, len, offset, noCase, caseMask, mask1,
cbi, offset, s2Start);
RETURN_IF_TERMINATED(rv);
}
if (likely(s2Start != s2End)) {
// scan as far as we can, bounded by the last point this key can
// possibly match
DEBUG_PRINTF("fast: ~ %zu -> %zu\n", s2Start, s2End);
rv = scanSingleFast(n, buf, len, noCase, caseMask, mask1, cbi, s2Start,
s2End);
RETURN_IF_TERMINATED(rv);
}
// if we are done bail out
if (s2End == len) {
return HWLM_SUCCESS;
}
DEBUG_PRINTF("stage 3: %zu -> %zu\n", s2End, len);
rv = scanSingleUnaligned(n, buf, len, s3Start, noCase, caseMask, mask1, cbi,
s2End, len);
return rv;
#else // HAVE_AVX512
return scanSingle512(n, buf, len, noCase, caseMask, mask1, cbi, offset,
end);
#endif
}
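The stage boundaries above are plain power-of-two rounding. A hypothetical reimplementation of what ROUNDUP_N/ROUNDDOWN_N compute, with a worked example for CHUNKSIZE == 32:

#include <assert.h>
#include <stdint.h>

static uintptr_t roundup_n(uintptr_t x, uintptr_t n)   { return (x + n - 1) & ~(n - 1); }
static uintptr_t rounddown_n(uintptr_t x, uintptr_t n) { return x & ~(n - 1); }

int main(void) {
    // scanning [7, 100): head 7..32 unaligned, fast 32..96 aligned, tail 96..100
    assert(roundup_n(7, 32) == 32);
    assert(rounddown_n(100, 32) == 96);
    return 0;
}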
static really_inline
hwlm_error_t scanDoubleMain(const struct noodTable *n, const u8 *buf,
size_t len, size_t start, bool noCase,
const struct cb_info *cbi) {
// we stop scanning for the key-fragment when the rest of the key can't
// possibly fit in the remaining buffer
size_t end = len - n->key_offset + 2;
// the first place the key can match
size_t offset = start + n->msk_len - n->key_offset;
const MASK_TYPE caseMask = getCaseMask();
const MASK_TYPE mask1 = getMask(n->key0, noCase);
const MASK_TYPE mask2 = getMask(n->key1, noCase);
#if !defined(HAVE_AVX512)
hwlm_error_t rv;
if (end - offset < CHUNKSIZE) {
rv = scanDoubleShort(n, buf, len, noCase, caseMask, mask1, mask2, cbi,
offset, end);
return rv;
}
if (end - offset == CHUNKSIZE) {
rv = scanDoubleUnaligned(n, buf, len, offset, noCase, caseMask, mask1,
mask2, cbi, offset, end);
return rv;
}
uintptr_t data = (uintptr_t)buf;
uintptr_t s2Start = ROUNDUP_N(data + offset, CHUNKSIZE) - data;
uintptr_t s1End = s2Start + 1;
uintptr_t last = data + end;
uintptr_t s2End = ROUNDDOWN_N(last, CHUNKSIZE) - data;
uintptr_t s3Start = end - CHUNKSIZE;
uintptr_t off = offset;
if (s2Start != off) {
// first scan out to the fast scan starting point plus one char past to
// catch the key on the overlap
DEBUG_PRINTF("stage 1: %zu -> %zu\n", off, s2Start);
rv = scanDoubleUnaligned(n, buf, len, offset, noCase, caseMask, mask1,
mask2, cbi, off, s1End);
RETURN_IF_TERMINATED(rv);
}
off = s1End;
if (s2Start >= end) {
DEBUG_PRINTF("s2 == mL %zu\n", end);
return HWLM_SUCCESS;
}
if (likely(s2Start != s2End)) {
// scan as far as we can, bounded by the last point this key can
// possibly match
DEBUG_PRINTF("fast: ~ %zu -> %zu\n", s2Start, s3Start);
rv = scanDoubleFast(n, buf, len, noCase, caseMask, mask1, mask2, cbi,
s2Start, s2End);
RETURN_IF_TERMINATED(rv);
off = s2End;
}
// if there isn't enough data left to match the key, bail out
if (s2End == end) {
return HWLM_SUCCESS;
}
DEBUG_PRINTF("stage 3: %zu -> %zu\n", s3Start, end);
rv = scanDoubleUnaligned(n, buf, len, s3Start, noCase, caseMask, mask1,
mask2, cbi, off, end);
return rv;
#else // AVX512
return scanDouble512(n, buf, len, noCase, caseMask, mask1, mask2, cbi,
offset, end);
#endif // AVX512
}
static really_inline
hwlm_error_t scanSingleNoCase(const struct noodTable *n, const u8 *buf,
size_t len, size_t start,
const struct cb_info *cbi) {
return scanSingleMain(n, buf, len, start, 1, cbi);
}
static really_inline
hwlm_error_t scanSingleCase(const struct noodTable *n, const u8 *buf,
size_t len, size_t start,
const struct cb_info *cbi) {
return scanSingleMain(n, buf, len, start, 0, cbi);
}
// Single-character specialisation, used when keyLen = 1
static really_inline
hwlm_error_t scanSingle(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, bool noCase, const struct cb_info *cbi) {
if (!ourisalpha(n->key0)) {
noCase = 0; // force noCase off if we don't have an alphabetic char
}
// kinda ugly, but this forces constant propagation
if (noCase) {
return scanSingleNoCase(n, buf, len, start, cbi);
} else {
return scanSingleCase(n, buf, len, start, cbi);
}
}
static really_inline
hwlm_error_t scanDoubleNoCase(const struct noodTable *n, const u8 *buf,
size_t len, size_t start,
const struct cb_info *cbi) {
return scanDoubleMain(n, buf, len, start, 1, cbi);
}
static really_inline
hwlm_error_t scanDoubleCase(const struct noodTable *n, const u8 *buf,
size_t len, size_t start,
const struct cb_info *cbi) {
return scanDoubleMain(n, buf, len, start, 0, cbi);
}
static really_inline
hwlm_error_t scanDouble(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, bool noCase, const struct cb_info *cbi) {
// kinda ugly, but this forces constant propagation
if (noCase) {
return scanDoubleNoCase(n, buf, len, start, cbi);
} else {
return scanDoubleCase(n, buf, len, start, cbi);
}
}
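The "kinda ugly" split is a deliberate specialisation trick: each really_inline wrapper pins noCase to a compile-time constant, so the case-masking branches inside the scan loops fold away. The same pattern in miniature (hypothetical names):

#include <stdbool.h>

static inline int body(int x, bool flag) { return flag ? x * 2 : x; }
// each wrapper inlines body() with flag fixed, so the branch disappears
static inline int body_on(int x)  { return body(x, true); }
static inline int body_off(int x) { return body(x, false); }

int dispatch(int x, bool flag) { return flag ? body_on(x) : body_off(x); }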
// main entry point for the scan code
static really_inline
hwlm_error_t scan(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, char single, bool noCase,
const struct cb_info *cbi) {
if (len - start < n->msk_len) {
// can't find string of length keyLen in a shorter buffer
return HWLM_SUCCESS;
}
if (single) {
return scanSingle(n, buf, len, start, noCase, cbi);
} else {
return scanDouble(n, buf, len, start, noCase, cbi);
}
}
/** \brief Block-mode scanner. */
hwlm_error_t noodExec(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, HWLMCallback cb,
struct hs_scratch *scratch) {
assert(n && buf);
struct cb_info cbi = {cb, n->id, scratch, 0};
DEBUG_PRINTF("nood scan of %zu bytes for %*s @ %p\n", len, n->msk_len,
(const char *)&n->cmp, buf);
return scan(n, buf, len, start, n->single, n->nocase, &cbi);
}
/** \brief Streaming-mode scanner. */
hwlm_error_t noodExecStreaming(const struct noodTable *n, const u8 *hbuf,
size_t hlen, const u8 *buf, size_t len,
HWLMCallback cb, struct hs_scratch *scratch) {
assert(n);
if (len + hlen < n->msk_len) {
DEBUG_PRINTF("not enough bytes for a match\n");
return HWLM_SUCCESS;
}
struct cb_info cbi = {cb, n->id, scratch, 0};
DEBUG_PRINTF("nood scan of %zu bytes (%zu hlen) for %*s @ %p\n", len, hlen,
n->msk_len, (const char *)&n->cmp, buf);
if (hlen && n->msk_len > 1) {
/*
* we have history, so build up a buffer from enough of the history
* buffer plus what we've been given to scan. Since this is relatively
* short, just check against msk+cmp per byte offset for matches.
*/
assert(hbuf);
u8 ALIGN_DIRECTIVE temp_buf[HWLM_LITERAL_MAX_LEN * 2];
memset(temp_buf, 0, sizeof(temp_buf));
assert(n->msk_len);
size_t tl1 = MIN((size_t)n->msk_len - 1, hlen);
size_t tl2 = MIN((size_t)n->msk_len - 1, len);
assert(tl1 + tl2 <= sizeof(temp_buf));
assert(tl1 + tl2 >= n->msk_len);
assert(tl1 <= sizeof(u64a));
assert(tl2 <= sizeof(u64a));
DEBUG_PRINTF("using %zu bytes of hist and %zu bytes of buf\n", tl1, tl2);
unaligned_store_u64a(temp_buf,
partial_load_u64a(hbuf + hlen - tl1, tl1));
unaligned_store_u64a(temp_buf + tl1, partial_load_u64a(buf, tl2));
for (size_t i = 0; i <= tl1 + tl2 - n->msk_len; i++) {
u64a v = unaligned_load_u64a(temp_buf + i);
if ((v & n->msk) == n->cmp) {
size_t m_end = -tl1 + i + n->msk_len - 1;
DEBUG_PRINTF("match @ %zu (i %zu)\n", m_end, i);
hwlmcb_rv_t rv = cb(m_end, n->id, scratch);
if (rv == HWLM_TERMINATE_MATCHING) {
return HWLM_TERMINATED;
}
}
}
}
assert(buf);
cbi.offsetAdj = 0;
return scan(n, buf, len, 0, n->single, n->nocase, &cbi);
}
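The history splice above can be pictured with a 4-byte literal: the last msk_len - 1 history bytes sit next to the first msk_len - 1 new bytes, so a match straddling the stream boundary becomes visible to the bytewise check. A hypothetical standalone sketch using memcpy in place of the partial load/store helpers:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const uint8_t hist[] = "...ab"; // tail of the history buffer (hlen == 5)
    const uint8_t cur[]  = "cd..."; // start of the new block
    size_t msk_len = 4, hlen = 5;
    size_t tl1 = msk_len - 1, tl2 = msk_len - 1;
    uint8_t temp_buf[16] = {0};
    memcpy(temp_buf, hist + hlen - tl1, tl1); // ".ab"
    memcpy(temp_buf + tl1, cur, tl2);         // "cd."
    // the 6-byte window ".abcd." contains the boundary-straddling "abcd"
    printf("%.*s\n", (int)(tl1 + tl2), (const char *)temp_buf);
    return 0;
}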

src/hwlm/noodle_engine.cpp (new file)

@ -0,0 +1,191 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2020, 2021, VectorCamp PC
* Copyright (c) 2021, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/** \file
* \brief Noodle literal matcher: runtime.
*/
#include "hwlm.h"
#include "noodle_engine.h"
#include "noodle_internal.h"
#include "scratch.h"
#include "ue2common.h"
#include "util/arch.h"
#include "util/bitutils.h"
#include "util/compare.h"
#include "util/intrinsics.h"
#include "util/join.h"
#include "util/partial_store.h"
#include "util/simd_utils.h"
#if defined(HAVE_AVX2)
#include "util/arch/x86/masked_move.h"
#endif
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
/** \brief Noodle runtime context. */
struct cb_info {
HWLMCallback cb; //!< callback function called on match
u32 id; //!< ID to pass to callback on match
struct hs_scratch *scratch; //!< scratch to pass to callback
size_t offsetAdj; //!< used in streaming mode
};
#define RETURN_IF_TERMINATED(x) \
{ \
if ((x) == HWLM_TERMINATED) { \
return HWLM_TERMINATED; \
} \
}
// Make sure the rest of the string is there. The single-character scanner
// is only used for one-byte literals whose case handling is already
// correct, so it can go straight to the callback once we get this far.
static really_inline
hwlm_error_t final(const struct noodTable *n, const u8 *buf, UNUSED size_t len,
bool needsConfirm, const struct cb_info *cbi, size_t pos) {
u64a v{0};
if (!needsConfirm) {
goto match;
}
assert(len >= n->msk_len);
v = partial_load_u64a(buf + pos + n->key_offset - n->msk_len, n->msk_len);
DEBUG_PRINTF("v %016llx msk %016llx cmp %016llx\n", v, n->msk, n->cmp);
if ((v & n->msk) != n->cmp) {
/* mask didn't match */
return HWLM_SUCCESS;
}
match:
pos -= cbi->offsetAdj;
DEBUG_PRINTF("match @ %zu\n", pos + n->key_offset);
hwlmcb_rv_t rv = cbi->cb(pos + n->key_offset - 1, cbi->id, cbi->scratch);
if (rv == HWLM_TERMINATE_MATCHING) {
return HWLM_TERMINATED;
}
return HWLM_SUCCESS;
}
#ifdef HAVE_SVE2
#include "noodle_engine_sve.hpp"
#else
#include "noodle_engine_simd.hpp"
#endif
// main entry point for the scan code
static really_inline
hwlm_error_t scan(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, char single, bool noCase,
const struct cb_info *cbi) {
if (len - start < n->msk_len) {
// can't find string of length keyLen in a shorter buffer
return HWLM_SUCCESS;
}
if (single) {
return scanSingle(n, buf, len, start, noCase, cbi);
} else {
return scanDouble(n, buf, len, start, noCase, cbi);
}
}
/** \brief Block-mode scanner. */
hwlm_error_t noodExec(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, HWLMCallback cb,
struct hs_scratch *scratch) {
assert(n && buf);
struct cb_info cbi = {cb, n->id, scratch, 0};
DEBUG_PRINTF("nood scan of %zu bytes for %*s @ %p\n", len, n->msk_len,
(const char *)&n->cmp, buf);
return scan(n, buf, len, start, n->single, n->nocase, &cbi);
}
/** \brief Streaming-mode scanner. */
hwlm_error_t noodExecStreaming(const struct noodTable *n, const u8 *hbuf,
size_t hlen, const u8 *buf, size_t len,
HWLMCallback cb, struct hs_scratch *scratch) {
assert(n);
if (len + hlen < n->msk_len) {
DEBUG_PRINTF("not enough bytes for a match\n");
return HWLM_SUCCESS;
}
struct cb_info cbi = {cb, n->id, scratch, 0};
DEBUG_PRINTF("nood scan of %zu bytes (%zu hlen) for %*s @ %p\n", len, hlen,
n->msk_len, (const char *)&n->cmp, buf);
if (hlen && n->msk_len > 1) {
/*
* we have history, so build up a buffer from enough of the history
* buffer plus what we've been given to scan. Since this is relatively
* short, just check against msk+cmp per byte offset for matches.
*/
assert(hbuf);
u8 ALIGN_DIRECTIVE temp_buf[HWLM_LITERAL_MAX_LEN * 2];
memset(temp_buf, 0, sizeof(temp_buf));
assert(n->msk_len);
size_t tl1 = MIN((size_t)n->msk_len - 1, hlen);
size_t tl2 = MIN((size_t)n->msk_len - 1, len);
assert(tl1 + tl2 <= sizeof(temp_buf));
assert(tl1 + tl2 >= n->msk_len);
assert(tl1 <= sizeof(u64a));
assert(tl2 <= sizeof(u64a));
DEBUG_PRINTF("using %zu bytes of hist and %zu bytes of buf\n", tl1, tl2);
unaligned_store_u64a(temp_buf,
partial_load_u64a(hbuf + hlen - tl1, tl1));
unaligned_store_u64a(temp_buf + tl1, partial_load_u64a(buf, tl2));
for (size_t i = 0; i <= tl1 + tl2 - n->msk_len; i++) {
u64a v = unaligned_load_u64a(temp_buf + i);
if ((v & n->msk) == n->cmp) {
size_t m_end = -tl1 + i + n->msk_len - 1;
DEBUG_PRINTF("match @ %zu (i %zu)\n", m_end, i);
hwlmcb_rv_t rv = cb(m_end, n->id, scratch);
if (rv == HWLM_TERMINATE_MATCHING) {
return HWLM_TERMINATED;
}
}
}
}
assert(buf);
cbi.offsetAdj = 0;
return scan(n, buf, len, 0, n->single, n->nocase, &cbi);
}

src/hwlm/noodle_engine_avx2.c (deleted)

@ -1,233 +0,0 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/* noodle scan parts for AVX */
static really_inline m256 getMask(u8 c, bool noCase) {
u8 k = caseClear8(c, noCase);
return set32x8(k);
}
static really_inline m256 getCaseMask(void) {
return set32x8(0xdf);
}
static really_inline
hwlm_error_t scanSingleUnaligned(const struct noodTable *n, const u8 *buf,
size_t len, size_t offset, bool noCase,
m256 caseMask, m256 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + offset;
DEBUG_PRINTF("start %zu end %zu offset %zu\n", start, end, offset);
const size_t l = end - start;
m256 v = loadu256(d);
if (noCase) {
v = and256(v, caseMask);
}
u32 z = movemask256(eq256(mask1, v));
u32 buf_off = start - offset;
u32 mask = (u32)((u64a)(1ULL << l) - 1) << buf_off;
DEBUG_PRINTF("mask 0x%08x z 0x%08x\n", mask, z);
z &= mask;
SINGLE_ZSCAN();
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanDoubleUnaligned(const struct noodTable *n, const u8 *buf,
size_t len, size_t offset, bool noCase,
m256 caseMask, m256 mask1, m256 mask2,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + offset;
DEBUG_PRINTF("start %zu end %zu offset %zu\n", start, end, offset);
size_t l = end - start;
m256 v = loadu256(d);
if (noCase) {
v = and256(v, caseMask);
}
u32 z0 = movemask256(eq256(mask1, v));
u32 z1 = movemask256(eq256(mask2, v));
u32 z = (z0 << 1) & z1;
// mask out where we can't match
u32 buf_off = start - offset;
u32 mask = (u32)((u64a)(1ULL << l) - 1) << buf_off;
DEBUG_PRINTF("mask 0x%08x z 0x%08x\n", mask, z);
z &= mask;
DOUBLE_ZSCAN();
return HWLM_SUCCESS;
}
// The short scan routine. It is used both to scan data up to an
// alignment boundary if needed and to finish off data that the aligned scan
// function can't handle (due to small/unaligned chunk at end)
static really_inline
hwlm_error_t scanSingleShort(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m256 caseMask, m256 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start;
size_t l = end - start;
DEBUG_PRINTF("l %zu\n", l);
assert(l <= 32);
if (!l) {
return HWLM_SUCCESS;
}
m256 v;
if (l < 4) {
u8 *vp = (u8*)&v;
switch (l) {
case 3: vp[2] = d[2]; // fallthrough
case 2: vp[1] = d[1]; // fallthrough
case 1: vp[0] = d[0]; // fallthrough
}
} else {
v = masked_move256_len(d, l);
}
if (noCase) {
v = and256(v, caseMask);
}
// mask out where we can't match
u32 mask = (0xFFFFFFFF >> (32 - l));
u32 z = mask & movemask256(eq256(mask1, v));
SINGLE_ZSCAN();
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanDoubleShort(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m256 caseMask, m256 mask1,
m256 mask2, const struct cb_info *cbi,
size_t start, size_t end) {
const u8 *d = buf + start;
size_t l = end - start;
if (!l) {
return HWLM_SUCCESS;
}
assert(l <= 32);
m256 v;
DEBUG_PRINTF("d %zu\n", d - buf);
if (l < 4) {
u8 *vp = (u8*)&v;
switch (l) {
case 3: vp[2] = d[2]; // fallthrough
case 2: vp[1] = d[1]; // fallthrough
case 1: vp[0] = d[0]; // fallthrough
}
} else {
v = masked_move256_len(d, l);
}
if (noCase) {
v = and256(v, caseMask);
}
u32 z0 = movemask256(eq256(mask1, v));
u32 z1 = movemask256(eq256(mask2, v));
u32 z = (z0 << 1) & z1;
// mask out where we can't match
u32 mask = (0xFFFFFFFF >> (32 - l));
z &= mask;
DOUBLE_ZSCAN();
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanSingleFast(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m256 caseMask, m256 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start, *e = buf + end;
assert(d < e);
for (; d < e; d += 32) {
m256 v = noCase ? and256(load256(d), caseMask) : load256(d);
u32 z = movemask256(eq256(mask1, v));
// On large packet buffers, this prefetch appears to get us about 2%.
__builtin_prefetch(d + 128);
SINGLE_ZSCAN();
}
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanDoubleFast(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m256 caseMask, m256 mask1,
m256 mask2, const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start, *e = buf + end;
DEBUG_PRINTF("start %zu end %zu \n", start, end);
assert(d < e);
u32 lastz0 = 0;
for (; d < e; d += 32) {
m256 v = noCase ? and256(load256(d), caseMask) : load256(d);
// we have to pull the masks out of the AVX registers because we can't
// byte shift between the lanes
u32 z0 = movemask256(eq256(mask1, v));
u32 z1 = movemask256(eq256(mask2, v));
u32 z = (lastz0 | (z0 << 1)) & z1;
lastz0 = z0 >> 31;
// On large packet buffers, this prefetch appears to get us about 2%.
__builtin_prefetch(d + 128);
DOUBLE_ZSCAN();
}
return HWLM_SUCCESS;
}
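Both loops above use the same shifted-AND pairing idiom: bit i of z0 marks key0 at offset i and bit i of z1 marks key1, so ANDing z1 with z0 shifted left by one selects adjacent pairs, while lastz0 carries the top bit across 32-byte blocks. A scalar sketch:

#include <stdint.h>

// bitmask of offsets where key1 immediately follows key0; *lastz0 carries a
// key0 hit in the final byte of one block into the start of the next
static uint32_t pair_matches(uint32_t z0, uint32_t z1, uint32_t *lastz0) {
    uint32_t z = (*lastz0 | (z0 << 1)) & z1;
    *lastz0 = z0 >> 31;
    return z;
}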

src/hwlm/noodle_engine_avx512.c (deleted)

@ -1,191 +0,0 @@
/*
* Copyright (c) 2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/* noodle scan parts for AVX512 */
static really_inline
m512 getMask(u8 c, bool noCase) {
u8 k = caseClear8(c, noCase);
return set64x8(k);
}
static really_inline
m512 getCaseMask(void) {
return set64x8(CASE_CLEAR);
}
// The short scan routine. It is used both to scan data up to an
// alignment boundary if needed and to finish off data that the aligned scan
// function can't handle (due to small/unaligned chunk at end)
static really_inline
hwlm_error_t scanSingleShort(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m512 caseMask, m512 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start;
ptrdiff_t scan_len = end - start;
DEBUG_PRINTF("scan_len %zu\n", scan_len);
assert(scan_len <= 64);
if (!scan_len) {
return HWLM_SUCCESS;
}
__mmask64 k = (~0ULL) >> (64 - scan_len);
DEBUG_PRINTF("load mask 0x%016llx\n", k);
m512 v = loadu_maskz_m512(k, d);
if (noCase) {
v = and512(v, caseMask);
}
// reuse the load mask to indicate valid bytes
u64a z = masked_eq512mask(k, mask1, v);
SINGLE_ZSCAN();
return HWLM_SUCCESS;
}
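The k-mask above is just the low scan_len bits; the scan_len == 0 early return matters, since a 64-bit shift by 64 is undefined behaviour. The same computation in isolation:

#include <assert.h>
#include <stdint.h>

static uint64_t low_bits(unsigned scan_len) {
    assert(scan_len >= 1 && scan_len <= 64); // a shift by 64 would be UB
    return (~0ULL) >> (64 - scan_len);
}

int main(void) {
    assert(low_bits(3) == 0x7);    // load and compare only 3 bytes
    assert(low_bits(64) == ~0ULL); // full 64-byte vector
    return 0;
}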
static really_inline
hwlm_error_t scanSingle512(const struct noodTable *n, const u8 *buf, size_t len,
bool noCase, m512 caseMask, m512 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start;
const u8 *e = buf + end;
DEBUG_PRINTF("start %p end %p \n", d, e);
assert(d < e);
if (d + 64 >= e) {
goto tail;
}
// peel off first part to cacheline boundary
const u8 *d1 = ROUNDUP_PTR(d, 64);
if (scanSingleShort(n, buf, len, noCase, caseMask, mask1, cbi, start,
d1 - buf) == HWLM_TERMINATED) {
return HWLM_TERMINATED;
}
d = d1;
for (; d + 64 < e; d += 64) {
DEBUG_PRINTF("d %p e %p \n", d, e);
m512 v = noCase ? and512(load512(d), caseMask) : load512(d);
u64a z = eq512mask(mask1, v);
__builtin_prefetch(d + 128);
SINGLE_ZSCAN();
}
tail:
DEBUG_PRINTF("d %p e %p \n", d, e);
// finish off tail
return scanSingleShort(n, buf, len, noCase, caseMask, mask1, cbi, d - buf,
e - buf);
}
static really_inline
hwlm_error_t scanDoubleShort(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m512 caseMask, m512 mask1,
m512 mask2, const struct cb_info *cbi,
u64a *lastz0, size_t start, size_t end) {
DEBUG_PRINTF("start %zu end %zu last 0x%016llx\n", start, end, *lastz0);
const u8 *d = buf + start;
ptrdiff_t scan_len = end - start;
if (!scan_len) {
return HWLM_SUCCESS;
}
assert(scan_len <= 64);
__mmask64 k = (~0ULL) >> (64 - scan_len);
DEBUG_PRINTF("load mask 0x%016llx scan_len %zu\n", k, scan_len);
m512 v = loadu_maskz_m512(k, d);
if (noCase) {
v = and512(v, caseMask);
}
u64a z0 = masked_eq512mask(k, mask1, v);
u64a z1 = masked_eq512mask(k, mask2, v);
u64a z = (*lastz0 | (z0 << 1)) & z1;
DEBUG_PRINTF("z 0x%016llx\n", z);
DOUBLE_ZSCAN();
*lastz0 = z0 >> (scan_len - 1);
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanDouble512(const struct noodTable *n, const u8 *buf, size_t len,
bool noCase, m512 caseMask, m512 mask1, m512 mask2,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start;
const u8 *e = buf + end;
u64a lastz0 = 0;
DEBUG_PRINTF("start %zu end %zu \n", start, end);
assert(d < e);
if (d + 64 >= e) {
goto tail;
}
// peel off first part to cacheline boundary
const u8 *d1 = ROUNDUP_PTR(d, 64);
if (scanDoubleShort(n, buf, len, noCase, caseMask, mask1, mask2, cbi,
&lastz0, start, d1 - buf) == HWLM_TERMINATED) {
return HWLM_TERMINATED;
}
d = d1;
for (; d + 64 < e; d += 64) {
DEBUG_PRINTF("d %p e %p 0x%016llx\n", d, e, lastz0);
m512 v = noCase ? and512(load512(d), caseMask) : load512(d);
/* we have to pull the masks out of the AVX registers because we can't
byte shift between the lanes */
u64a z0 = eq512mask(mask1, v);
u64a z1 = eq512mask(mask2, v);
u64a z = (lastz0 | (z0 << 1)) & z1;
lastz0 = z0 >> 63;
// On large packet buffers, this prefetch appears to get us about 2%.
__builtin_prefetch(d + 256);
DEBUG_PRINTF("z 0x%016llx\n", z);
DOUBLE_ZSCAN();
}
tail:
DEBUG_PRINTF("d %p e %p off %zu \n", d, e, d - buf);
// finish off tail
return scanDoubleShort(n, buf, len, noCase, caseMask, mask1, mask2, cbi,
&lastz0, d - buf, end);
}

src/hwlm/noodle_engine_simd.hpp (new file)

@ -0,0 +1,310 @@
/*
* Copyright (c) 2017, Intel Corporation
* Copyright (c) 2020-2021, VectorCamp PC
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/* SIMD engine agnostic noodle scan parts */
#include "util/supervector/supervector.hpp"
#include "util/supervector/casemask.hpp"
static really_really_inline
hwlm_error_t single_zscan(const struct noodTable *n, const u8 *d, const u8 *buf,
Z_TYPE z, size_t len, const struct cb_info *cbi) {
while (unlikely(z)) {
Z_TYPE pos = JOIN(findAndClearLSB_, Z_BITS)(&z) >> Z_POSSHIFT;
size_t matchPos = d - buf + pos;
DEBUG_PRINTF("match pos %zu\n", matchPos);
hwlmcb_rv_t rv = final(n, buf, len, n->msk_len != 1, cbi, matchPos);
RETURN_IF_TERMINATED(rv);
}
return HWLM_SUCCESS;
}
static really_really_inline
hwlm_error_t double_zscan(const struct noodTable *n, const u8 *d, const u8 *buf,
Z_TYPE z, size_t len, const struct cb_info *cbi) {
while (unlikely(z)) {
Z_TYPE pos = JOIN(findAndClearLSB_, Z_BITS)(&z) >> Z_POSSHIFT;
size_t matchPos = d - buf + pos - 1;
DEBUG_PRINTF("match pos %zu\n", matchPos);
hwlmcb_rv_t rv = final(n, buf, len, true, cbi, matchPos);
RETURN_IF_TERMINATED(rv);
}
return HWLM_SUCCESS;
}
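The >> Z_POSSHIFT in both helpers compensates for targets where eqmask() yields more than one compare-mask bit per byte lane; the load masks are scaled by mask_width() for the same reason. A sketch of the index conversion, under that assumption:

#include <stddef.h>

/* compare-mask bit index -> byte lane index; Z_POSSHIFT is assumed to be
 * log2(mask_width), making the shift above equivalent to this division */
static size_t lane_of(size_t bit_index, size_t mask_width) {
    return bit_index / mask_width;
}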
template<uint16_t S>
static really_inline
hwlm_error_t scanSingleShort(const struct noodTable *n, const u8 *buf,
SuperVector<S> caseMask, SuperVector<S> mask1,
const struct cb_info *cbi, size_t len, size_t start,
size_t end) {
const u8 *d = buf + start;
DEBUG_PRINTF("start %zu end %zu\n", start, end);
const size_t l = end - start;
DEBUG_PRINTF("l = %ld\n", l);
//assert(l <= 64);
if (!l) {
return HWLM_SUCCESS;
}
SuperVector<S> v = SuperVector<S>::Zeroes();
memcpy(&v.u, d, l);
typename SuperVector<S>::comparemask_type mask =
SINGLE_LOAD_MASK(l * SuperVector<S>::mask_width());
v = v & caseMask;
typename SuperVector<S>::comparemask_type z = mask & mask1.eqmask(v);
z = SuperVector<S>::iteration_mask(z);
return single_zscan(n, d, buf, z, len, cbi);
}
// The short scan routine. It is used both to scan data up to an
// alignment boundary if needed and to finish off data that the aligned scan
// function can't handle (due to small/unaligned chunk at end)
template<uint16_t S>
static really_inline
hwlm_error_t scanSingleUnaligned(const struct noodTable *n, const u8 *buf,
SuperVector<S> caseMask, SuperVector<S> mask1,
const struct cb_info *cbi, size_t len, size_t offset,
size_t start,
size_t end) {
const u8 *d = buf + offset;
DEBUG_PRINTF("start %zu end %zu offset %zu\n", start, end, offset);
const size_t l = end - start;
DEBUG_PRINTF("l = %ld\n", l);
assert(l <= 64);
if (!l) {
return HWLM_SUCCESS;
}
size_t buf_off = start - offset;
typename SuperVector<S>::comparemask_type mask =
SINGLE_LOAD_MASK(l * SuperVector<S>::mask_width())
<< (buf_off * SuperVector<S>::mask_width());
SuperVector<S> v = SuperVector<S>::loadu(d) & caseMask;
typename SuperVector<S>::comparemask_type z = mask & mask1.eqmask(v);
z = SuperVector<S>::iteration_mask(z);
return single_zscan(n, d, buf, z, len, cbi);
}
template<uint16_t S>
static really_inline
hwlm_error_t scanDoubleShort(const struct noodTable *n, const u8 *buf,
SuperVector<S> caseMask, SuperVector<S> mask1, SuperVector<S> mask2,
const struct cb_info *cbi, size_t len, size_t start, size_t end) {
const u8 *d = buf + start;
DEBUG_PRINTF("start %zu end %zu\n", start, end);
const size_t l = end - start;
assert(l <= S);
if (!l) {
return HWLM_SUCCESS;
}
SuperVector<S> v = SuperVector<S>::Zeroes();
memcpy(&v.u, d, l);
v = v & caseMask;
typename SuperVector<S>::comparemask_type mask =
DOUBLE_LOAD_MASK(l * SuperVector<S>::mask_width());
typename SuperVector<S>::comparemask_type z1 = mask1.eqmask(v);
typename SuperVector<S>::comparemask_type z2 = mask2.eqmask(v);
typename SuperVector<S>::comparemask_type z =
mask & (z1 << (SuperVector<S>::mask_width())) & z2;
z = SuperVector<S>::iteration_mask(z);
return double_zscan(n, d, buf, z, len, cbi);
}
template<uint16_t S>
static really_inline
hwlm_error_t scanDoubleUnaligned(const struct noodTable *n, const u8 *buf,
SuperVector<S> caseMask, SuperVector<S> mask1, SuperVector<S> mask2,
const struct cb_info *cbi, size_t len, size_t offset, size_t start, size_t end) {
const u8 *d = buf + offset;
DEBUG_PRINTF("start %zu end %zu offset %zu\n", start, end, offset);
const size_t l = end - start;
assert(l <= S);
if (!l) {
return HWLM_SUCCESS;
}
SuperVector<S> v = SuperVector<S>::loadu(d) & caseMask;
size_t buf_off = start - offset;
typename SuperVector<S>::comparemask_type mask =
DOUBLE_LOAD_MASK(l * SuperVector<S>::mask_width())
<< (buf_off * SuperVector<S>::mask_width());
typename SuperVector<S>::comparemask_type z1 = mask1.eqmask(v);
typename SuperVector<S>::comparemask_type z2 = mask2.eqmask(v);
typename SuperVector<S>::comparemask_type z =
mask & (z1 << SuperVector<S>::mask_width()) & z2;
z = SuperVector<S>::iteration_mask(z);
return double_zscan(n, d, buf, z, len, cbi);
}
template <uint16_t S>
static really_inline
hwlm_error_t scanSingleMain(const struct noodTable *n, const u8 *buf,
size_t len, size_t offset,
SuperVector<S> caseMask, SuperVector<S> mask1,
const struct cb_info *cbi) {
size_t start = offset + n->msk_len - 1;
size_t end = len;
const u8 *d = buf + start;
const u8 *e = buf + end;
DEBUG_PRINTF("start %p end %p \n", d, e);
assert(d < e);
if (e - d < S) {
return scanSingleShort(n, buf, caseMask, mask1, cbi, len, start, end);
}
if (d + S <= e) {
// peel off first part to cacheline boundary
const u8 *d1 = ROUNDUP_PTR(d, S);
DEBUG_PRINTF("until aligned %p \n", d1);
if (scanSingleUnaligned(n, buf, caseMask, mask1, cbi, len, start, start, d1 - buf) == HWLM_TERMINATED) {
return HWLM_TERMINATED;
}
d = d1;
size_t loops = (end - (d - buf)) / S;
DEBUG_PRINTF("loops %ld \n", loops);
for (size_t i = 0; i < loops; i++, d+= S) {
DEBUG_PRINTF("d %p \n", d);
const u8 *base = ROUNDUP_PTR(d, 64);
// On large packet buffers, this prefetch appears to get us about 2%.
__builtin_prefetch(base + 256);
SuperVector<S> v = SuperVector<S>::load(d) & caseMask;
typename SuperVector<S>::comparemask_type z = mask1.eqmask(v);
z = SuperVector<S>::iteration_mask(z);
hwlm_error_t rv = single_zscan(n, d, buf, z, len, cbi);
RETURN_IF_TERMINATED(rv);
}
}
DEBUG_PRINTF("d %p e %p \n", d, e);
// finish off tail
size_t s2End = ROUNDDOWN_PTR(e, S) - buf;
if (s2End == end) {
return HWLM_SUCCESS;
}
return scanSingleUnaligned(n, buf, caseMask, mask1, cbi, len, end - S, s2End, len);
}
template <uint16_t S>
static really_inline
hwlm_error_t scanDoubleMain(const struct noodTable *n, const u8 *buf,
size_t len, size_t offset,
SuperVector<S> caseMask, SuperVector<S> mask1, SuperVector<S> mask2,
const struct cb_info *cbi) {
// we stop scanning for the key-fragment when the rest of the key can't
// possibly fit in the remaining buffer
size_t end = len - n->key_offset + 2;
size_t start = offset + n->msk_len - n->key_offset;
typename SuperVector<S>::comparemask_type lastz1{0};
const u8 *d = buf + start;
const u8 *e = buf + end;
DEBUG_PRINTF("start %p end %p \n", d, e);
assert(d < e);
if (e - d < S) {
return scanDoubleShort(n, buf, caseMask, mask1, mask2, cbi, len, d - buf, end);
}
if (d + S <= e) {
// peel off first part to cacheline boundary
const u8 *d1 = ROUNDUP_PTR(d, S) + 1;
DEBUG_PRINTF("until aligned %p \n", d1);
if (scanDoubleUnaligned(n, buf, caseMask, mask1, mask2, cbi, len, start, start, d1 - buf) == HWLM_TERMINATED) {
return HWLM_TERMINATED;
}
d = d1 - 1;
size_t loops = (end - (d - buf)) / S;
DEBUG_PRINTF("loops %ld \n", loops);
for (size_t i = 0; i < loops; i++, d+= S) {
DEBUG_PRINTF("d %p \n", d);
const u8 *base = ROUNDUP_PTR(d, 64);
// On large packet buffers, this prefetch appears to get us about 2%.
__builtin_prefetch(base + 256);
SuperVector<S> v = SuperVector<S>::load(d) & caseMask;
typename SuperVector<S>::comparemask_type z1 = mask1.eqmask(v);
typename SuperVector<S>::comparemask_type z2 = mask2.eqmask(v);
typename SuperVector<S>::comparemask_type z =
(z1 << SuperVector<S>::mask_width() | lastz1) & z2;
lastz1 = z1 >> (Z_SHIFT * SuperVector<S>::mask_width());
z = SuperVector<S>::iteration_mask(z);
hwlm_error_t rv = double_zscan(n, d, buf, z, len, cbi);
RETURN_IF_TERMINATED(rv);
}
if (loops == 0) {
d = d1;
}
}
// finish off tail
size_t s2End = ROUNDDOWN_PTR(e, S) - buf;
if (s2End == end) {
return HWLM_SUCCESS;
}
return scanDoubleUnaligned(n, buf, caseMask, mask1, mask2, cbi, len, end - S, d - buf, end);
}
// Single-character specialisation, used when keyLen = 1
static really_inline
hwlm_error_t scanSingle(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, bool noCase, const struct cb_info *cbi) {
if (!ourisalpha(n->key0)) {
noCase = 0; // force noCase off if we don't have an alphabetic char
}
const SuperVector<VECTORSIZE> caseMask{noCase ? getCaseMask<VECTORSIZE>() : SuperVector<VECTORSIZE>::Ones()};
const SuperVector<VECTORSIZE> mask1{getMask<VECTORSIZE>(n->key0, noCase)};
return scanSingleMain(n, buf, len, start, caseMask, mask1, cbi);
}
static really_inline
hwlm_error_t scanDouble(const struct noodTable *n, const u8 *buf, size_t len,
size_t start, bool noCase, const struct cb_info *cbi) {
const SuperVector<VECTORSIZE> caseMask{noCase ? getCaseMask<VECTORSIZE>() : SuperVector<VECTORSIZE>::Ones()};
const SuperVector<VECTORSIZE> mask1{getMask<VECTORSIZE>(n->key0, noCase)};
const SuperVector<VECTORSIZE> mask2{getMask<VECTORSIZE>(n->key1, noCase)};
return scanDoubleMain(n, buf, len, start, caseMask, mask1, mask2, cbi);
}

src/hwlm/noodle_engine_sse.c (deleted)

@ -1,203 +0,0 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/* noodle scan parts for SSE */
static really_inline m128 getMask(u8 c, bool noCase) {
u8 k = caseClear8(c, noCase);
return set16x8(k);
}
static really_inline m128 getCaseMask(void) {
return set16x8(0xdf);
}
static really_inline
hwlm_error_t scanSingleShort(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m128 caseMask, m128 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start;
size_t l = end - start;
DEBUG_PRINTF("l %zu\n", l);
assert(l <= 16);
if (!l) {
return HWLM_SUCCESS;
}
m128 v = zeroes128();
// we don't have a clever way of doing this move yet
memcpy(&v, d, l);
if (noCase) {
v = and128(v, caseMask);
}
// mask out where we can't match
u32 mask = (0xFFFF >> (16 - l));
u32 z = mask & movemask128(eq128(mask1, v));
SINGLE_ZSCAN();
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanSingleUnaligned(const struct noodTable *n, const u8 *buf,
size_t len, size_t offset, bool noCase,
m128 caseMask, m128 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + offset;
DEBUG_PRINTF("start %zu end %zu offset %zu\n", start, end, offset);
const size_t l = end - start;
m128 v = loadu128(d);
if (noCase) {
v = and128(v, caseMask);
}
u32 buf_off = start - offset;
u32 mask = ((1 << l) - 1) << buf_off;
u32 z = mask & movemask128(eq128(mask1, v));
DEBUG_PRINTF("mask 0x%08x z 0x%08x\n", mask, z);
z &= mask;
SINGLE_ZSCAN();
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanDoubleShort(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m128 caseMask, m128 mask1,
m128 mask2, const struct cb_info *cbi,
size_t start, size_t end) {
const u8 *d = buf + start;
size_t l = end - start;
if (!l) {
return HWLM_SUCCESS;
}
assert(l <= 16);
DEBUG_PRINTF("d %zu\n", d - buf);
m128 v = zeroes128();
memcpy(&v, d, l);
if (noCase) {
v = and128(v, caseMask);
}
u32 z = movemask128(and128(lshiftbyte_m128(eq128(mask1, v), 1),
eq128(mask2, v)));
// mask out where we can't match
u32 mask = (0xFFFF >> (16 - l));
z &= mask;
DOUBLE_ZSCAN();
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanDoubleUnaligned(const struct noodTable *n, const u8 *buf,
size_t len, size_t offset, bool noCase,
m128 caseMask, m128 mask1, m128 mask2,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + offset;
DEBUG_PRINTF("start %zu end %zu offset %zu\n", start, end, offset);
size_t l = end - start;
m128 v = loadu128(d);
if (noCase) {
v = and128(v, caseMask);
}
u32 z = movemask128(and128(lshiftbyte_m128(eq128(mask1, v), 1),
eq128(mask2, v)));
// mask out where we can't match
u32 buf_off = start - offset;
u32 mask = ((1 << l) - 1) << buf_off;
DEBUG_PRINTF("mask 0x%08x z 0x%08x\n", mask, z);
z &= mask;
DOUBLE_ZSCAN();
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanSingleFast(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m128 caseMask, m128 mask1,
const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start, *e = buf + end;
assert(d < e);
for (; d < e; d += 16) {
m128 v = noCase ? and128(load128(d), caseMask) : load128(d);
u32 z = movemask128(eq128(mask1, v));
// On large packet buffers, this prefetch appears to get us about 2%.
__builtin_prefetch(d + 128);
SINGLE_ZSCAN();
}
return HWLM_SUCCESS;
}
static really_inline
hwlm_error_t scanDoubleFast(const struct noodTable *n, const u8 *buf,
size_t len, bool noCase, m128 caseMask, m128 mask1,
m128 mask2, const struct cb_info *cbi, size_t start,
size_t end) {
const u8 *d = buf + start, *e = buf + end;
assert(d < e);
m128 lastz1 = zeroes128();
for (; d < e; d += 16) {
m128 v = noCase ? and128(load128(d), caseMask) : load128(d);
m128 z1 = eq128(mask1, v);
m128 z2 = eq128(mask2, v);
u32 z = movemask128(and128(palignr(z1, lastz1, 15), z2));
lastz1 = z1;
// On large packet buffers, this prefetch appears to get us about 2%.
__builtin_prefetch(d + 128);
DEBUG_PRINTF("z 0x%08x\n", z);
DOUBLE_ZSCAN();
}
return HWLM_SUCCESS;
}

src/hwlm/noodle_engine_sve.hpp (new file)

@ -0,0 +1,259 @@
/*
* Copyright (c) 2021, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
static really_inline
hwlm_error_t checkMatched(const struct noodTable *n, const u8 *buf, size_t len,
const struct cb_info *cbi, const u8 *d,
svbool_t matched, bool needsConfirm) {
assert(d >= buf);
size_t basePos = d - buf;
svbool_t next_match = svpnext_b8(matched, svpfalse());
do {
svbool_t brk = svbrkb_z(svptrue_b8(), next_match);
size_t matchPos = basePos + svcntp_b8(svptrue_b8(), brk);
DEBUG_PRINTF("match pos %zu\n", matchPos);
assert(matchPos < len);
hwlmcb_rv_t rv = final(n, buf, len, needsConfirm, cbi, matchPos);
RETURN_IF_TERMINATED(rv);
next_match = svpnext_b8(matched, next_match);
} while (unlikely(svptest_any(svptrue_b8(), next_match)));
return HWLM_SUCCESS;
}
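The svpnext/svbrkb/svcntp triple is the predicate equivalent of a find-next-set-bit loop: svpnext_b8 steps to the next active lane, svbrkb_z keeps the lanes before it, and svcntp_b8 counts them to recover the lane index. The scalar shape it mirrors, sketched with a plain bitmask:

#include <stddef.h>
#include <stdint.h>

static void walk_matches(uint64_t matched, size_t basePos) {
    for (uint64_t m = matched; m; m &= m - 1) { // clear the lowest set bit
        size_t matchPos = basePos + (size_t)__builtin_ctzll(m);
        (void)matchPos; // report matchPos, as checkMatched() does via final()
    }
}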
static really_inline
hwlm_error_t singleCheckMatched(const struct noodTable *n, const u8 *buf,
size_t len, const struct cb_info *cbi,
const u8 *d, svbool_t matched) {
if (unlikely(svptest_any(svptrue_b8(), matched))) {
hwlmcb_rv_t rv = checkMatched(n, buf, len, cbi, d, matched,
n->msk_len != 1);
RETURN_IF_TERMINATED(rv);
}
return HWLM_SUCCESS;
}
static really_inline
svbool_t singleMatched(svuint8_t chars, const u8 *d, svbool_t pg) {
return svmatch(pg, svld1_u8(pg, d), chars);
}
static really_inline
hwlm_error_t scanSingleOnce(const struct noodTable *n, const u8 *buf,
size_t len, const struct cb_info *cbi,
svuint8_t chars, const u8 *d, const u8 *e) {
DEBUG_PRINTF("start %p end %p\n", d, e);
assert(d < e);
assert(d >= buf);
DEBUG_PRINTF("l = %td\n", e - d);
svbool_t pg = svwhilelt_b8_s64(0, e - d);
svbool_t matched = singleMatched(chars, d, pg);
return singleCheckMatched(n, buf, len, cbi, d, matched);
}
static really_inline
hwlm_error_t scanSingleLoop(const struct noodTable *n, const u8 *buf,
size_t len, const struct cb_info *cbi,
svuint8_t chars, const u8 *d, const u8 *e) {
assert(d < e);
assert(d >= buf);
size_t loops = (e - d) / svcntb();
DEBUG_PRINTF("loops %zu \n", loops);
assert(d + (loops * svcntb()) <= e);
for (size_t i = 0; i < loops; i++, d += svcntb()) {
DEBUG_PRINTF("d %p \n", d);
svbool_t matched = singleMatched(chars, d, svptrue_b8());
hwlmcb_rv_t rv = singleCheckMatched(n, buf, len, cbi, d, matched);
RETURN_IF_TERMINATED(rv);
}
DEBUG_PRINTF("d %p e %p \n", d, e);
return d == e ? HWLM_SUCCESS
: scanSingleOnce(n, buf, len, cbi, chars, d, e);
}
static really_inline
hwlm_error_t scanSingle(const struct noodTable *n, const u8 *buf, size_t len,
size_t offset, bool noCase, const struct cb_info *cbi) {
if (!ourisalpha(n->key0)) {
noCase = false; // force noCase off if we don't have an alphabetic char
}
size_t start = offset + n->msk_len - 1;
const u8 *d = buf + start;
const u8 *e = buf + len;
DEBUG_PRINTF("start %p end %p \n", d, e);
assert(d < e);
assert(d >= buf);
svuint8_t chars = getCharMaskSingle(n->key0, noCase);
size_t scan_len = e - d;
if (scan_len <= svcntb()) {
return scanSingleOnce(n, buf, len, cbi, chars, d, e);
}
// peel off first part to align to the vector size
const u8 *d1 = ROUNDUP_PTR(d, svcntb_pat(SV_POW2));
if (d != d1) {
DEBUG_PRINTF("until aligned %p \n", d1);
hwlmcb_rv_t rv = scanSingleOnce(n, buf, len, cbi, chars, d, d1);
RETURN_IF_TERMINATED(rv);
}
return scanSingleLoop(n, buf, len, cbi, chars, d1, e);
}
static really_inline
hwlm_error_t doubleCheckMatched(const struct noodTable *n, const u8 *buf,
size_t len, const struct cb_info *cbi,
const u8 *d, svbool_t matched,
svbool_t matched_rot, svbool_t any) {
if (unlikely(svptest_any(svptrue_b8(), any))) {
// Project predicate onto vector.
svuint8_t matched_vec = svdup_u8_z(matched, 1);
// Shift vector to right by one and project back to the predicate.
matched = svcmpeq_n_u8(svptrue_b8(), svinsr_n_u8(matched_vec, 0), 1);
matched = svorr_z(svptrue_b8(), matched, matched_rot);
// d - 1 won't underflow as the first position in buf has been dealt
// with meaning that d > buf
assert(d > buf);
hwlmcb_rv_t rv = checkMatched(n, buf, len, cbi, d - 1, matched,
n->msk_len != 2);
RETURN_IF_TERMINATED(rv);
}
return HWLM_SUCCESS;
}
static really_inline
svbool_t doubleMatchedLoop(svuint16_t chars, const u8 *d,
svbool_t * const matched, svbool_t * const matched_rot) {
svuint16_t vec = svreinterpret_u16(svld1_u8(svptrue_b8(), d));
// d - 1 won't underflow as the first position in buf has been dealt
// with meaning that d > buf
svuint16_t vec_rot = svreinterpret_u16(svld1_u8(svptrue_b8(), d - 1));
*matched = svmatch(svptrue_b8(), vec, chars);
*matched_rot = svmatch(svptrue_b8(), vec_rot, chars);
return svorr_z(svptrue_b8(), *matched, *matched_rot);
}
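Reinterpreting the byte vector as u16 lanes means svmatch only sees two-byte pairs starting at even offsets; the second load at d - 1 shifts the frame so odd starting offsets are covered as well. A scalar sketch of the two passes:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool has_pair(const uint8_t *d, size_t n, uint16_t key) {
    uint16_t v;
    for (size_t off = 0; off + 2 <= n; off += 2) { // vec: even offsets
        memcpy(&v, d + off, 2);
        if (v == key) return true;
    }
    for (size_t off = 1; off + 2 <= n; off += 2) { // vec_rot: odd offsets
        memcpy(&v, d + off, 2);
        if (v == key) return true;
    }
    return false;
}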
static really_inline
hwlm_error_t scanDoubleOnce(const struct noodTable *n, const u8 *buf,
size_t len, const struct cb_info *cbi,
svuint8_t chars, const u8 *d, const u8 *e) {
DEBUG_PRINTF("start %p end %p\n", d, e);
assert(d < e);
assert(d > buf);
const ptrdiff_t size = e - d;
svbool_t pg = svwhilelt_b8_s64(0, size);
svbool_t pg_rot = svwhilelt_b8_s64(0, size + 1);
svuint16_t vec = svreinterpret_u16(svld1_u8(pg, d));
// d - 1 won't underflow as the first position in buf has been dealt
// with meaning that d > buf
svuint16_t vec_rot = svreinterpret_u16(svld1_u8(pg_rot, d - 1));
// we reuse u8 predicates for u16 lanes. This means that we will check against one
// extra \0 character at the end of the vector.
if (unlikely(n->key1 == '\0')) {
if (size % 2) {
// if odd, vec has an odd number of lanes and has the spurious \0
svbool_t lane_to_disable = svrev_b8(svpfirst(svrev_b8(pg), svpfalse()));
pg = sveor_z(svptrue_b8(), pg, lane_to_disable);
} else {
// if even, vec_rot has an odd number of lanes and has the spurious \0
// we need to disable the last active lane as well, but we know pg is
// the same as pg_rot without the last lane
pg_rot = pg;
}
}
svbool_t matched = svmatch(pg, vec, svreinterpret_u16(chars));
svbool_t matched_rot = svmatch(pg_rot, vec_rot, svreinterpret_u16(chars));
svbool_t any = svorr_z(svptrue_b8(), matched, matched_rot);
return doubleCheckMatched(n, buf, len, cbi, d, matched, matched_rot, any);
}
static really_inline
hwlm_error_t scanDoubleLoop(const struct noodTable *n, const u8 *buf,
size_t len, const struct cb_info *cbi,
svuint8_t chars, const u8 *d, const u8 *e) {
assert(d < e);
assert(d > buf);
size_t loops = (e - d) / svcntb();
DEBUG_PRINTF("loops %zu \n", loops);
assert(d + (loops * svcntb()) <= e);
for (size_t i = 0; i < loops; i++, d += svcntb()) {
DEBUG_PRINTF("d %p \n", d);
svbool_t matched, matched_rot;
svbool_t any = doubleMatchedLoop(svreinterpret_u16(chars), d, &matched, &matched_rot);
hwlm_error_t rv = doubleCheckMatched(n, buf, len, cbi, d,
matched, matched_rot, any);
RETURN_IF_TERMINATED(rv);
}
DEBUG_PRINTF("d %p e %p \n", d, e);
return d == e ? HWLM_SUCCESS
: scanDoubleOnce(n, buf, len, cbi, chars, d, e);
}
static really_inline
hwlm_error_t scanDouble(const struct noodTable *n, const u8 *buf, size_t len,
size_t offset, bool noCase, const struct cb_info *cbi) {
// we stop scanning for the key-fragment when the rest of the key can't
// possibly fit in the remaining buffer
size_t end = len - n->key_offset + 2;
size_t start = offset + n->msk_len - n->key_offset;
const u8 *d = buf + start;
const u8 *e = buf + end;
DEBUG_PRINTF("start %p end %p \n", d, e);
assert(d < e);
assert(d >= buf);
size_t scan_len = e - d;
if (scan_len < 2) {
return HWLM_SUCCESS;
}
++d;
svuint8_t chars = svreinterpret_u8(getCharMaskDouble(n->key0, n->key1, noCase));
if (scan_len <= svcntb()) {
return scanDoubleOnce(n, buf, len, cbi, chars, d, e);
}
// peel off first part to align to the vector size
const u8 *d1 = ROUNDUP_PTR(d, svcntb_pat(SV_POW2));
if (d != d1) {
DEBUG_PRINTF("until aligned %p \n", d1);
hwlmcb_rv_t rv = scanDoubleOnce(n, buf, len, cbi, chars,
d, d1);
RETURN_IF_TERMINATED(rv);
}
return scanDoubleLoop(n, buf, len, cbi, chars, d1, e);
}

src/nfa/accel.c

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2021, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -29,7 +30,7 @@
#include "accel.h"
#include "shufti.h"
#include "truffle.h"
#include "vermicelli.h"
#include "vermicelli.hpp"
#include "ue2common.h"
const u8 *run_accel(const union AccelAux *accel, const u8 *c, const u8 *c_end) {
@ -81,6 +82,39 @@ const u8 *run_accel(const union AccelAux *accel, const u8 *c, const u8 *c_end) {
c_end - 1);
break;
#ifdef HAVE_SVE2
case ACCEL_VERM16:
DEBUG_PRINTF("accel verm16 %p %p\n", c, c_end);
if (c_end - c < 16) {
return c;
}
rv = vermicelli16Exec(accel->verm16.mask, c, c_end);
break;
case ACCEL_DVERM16:
DEBUG_PRINTF("accel dverm16 %p %p\n", c, c_end);
if (c_end - c < 18) {
return c;
}
/* need to stop one early to get an accurate end state */
rv = vermicelliDouble16Exec(accel->dverm16.mask, accel->dverm16.firsts,
c, c_end - 1);
break;
case ACCEL_DVERM16_MASKED:
DEBUG_PRINTF("accel dverm16 masked %p %p\n", c, c_end);
if (c_end - c < 18) {
return c;
}
/* need to stop one early to get an accurate end state */
rv = vermicelliDoubleMasked16Exec(accel->mdverm16.mask, accel->mdverm16.c1,
accel->mdverm16.m1, c, c_end - 1);
break;
#endif // HAVE_SVE2
case ACCEL_DVERM_MASKED:
DEBUG_PRINTF("accel dverm masked %p %p\n", c, c_end);
if (c + 16 + 1 >= c_end) {
@ -108,9 +142,18 @@ const u8 *run_accel(const union AccelAux *accel, const u8 *c, const u8 *c_end) {
return c;
}
rv = truffleExec(accel->truffle.mask1, accel->truffle.mask2, c, c_end);
rv = truffleExec(accel->truffle.mask_lo, accel->truffle.mask_hi, c, c_end);
break;
#ifdef CAN_USE_WIDE_TRUFFLE
case ACCEL_TRUFFLE_WIDE:
DEBUG_PRINTF("accel Truffle Wide %p %p\n", c, c_end);
if (c + 15 >= c_end) {
return c;
}
rv = truffleExecWide(accel->truffle.mask, c, c_end);
break;
#endif
case ACCEL_DSHUFTI:
DEBUG_PRINTF("accel dshufti %p %p\n", c, c_end);
if (c + 15 + 1 >= c_end) {

src/nfa/accel.h

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2021, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -62,6 +63,10 @@ enum AccelType {
ACCEL_TRUFFLE,
ACCEL_RED_TAPE,
ACCEL_DVERM_MASKED,
ACCEL_VERM16,
ACCEL_DVERM16,
ACCEL_DVERM16_MASKED,
ACCEL_TRUFFLE_WIDE,
};
/** \brief Structure for accel framework. */
@ -97,6 +102,24 @@ union AccelAux {
u8 len1;
u8 len2;
} mdverm;
struct {
u8 accel_type;
u8 offset;
m128 mask;
} verm16;
struct {
u8 accel_type;
u8 offset;
u64a firsts;
m128 mask;
} dverm16;
struct {
u8 accel_type;
u8 offset;
u8 c1; // used for partial match
u8 m1; // used for partial match
m128 mask;
} mdverm16;
struct {
u8 accel_type;
u8 offset;
@ -114,8 +137,18 @@ union AccelAux {
struct {
u8 accel_type;
u8 offset;
m128 mask1;
m128 mask2;
union {
m256 mask;
struct {
#if (SIMDE_ENDIAN_ORDER == SIMDE_ENDIAN_LITTLE)
m128 mask_lo;
m128 mask_hi;
#else
m128 mask_hi;
m128 mask_lo;
#endif
};
};
} truffle;
};
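The union added above lets the same 32 bytes of truffle masks be addressed either as one 256-bit value or as explicit lo/hi 128-bit halves, with the member order flipped on big-endian targets so mask_lo always aliases the low half of mask. A minimal stand-in sketch (hypothetical types in place of the real m128/m256):

#include <stdint.h>

typedef struct { uint8_t b[16]; } m128_; /* stand-ins for m128/m256 */
typedef struct { uint8_t b[32]; } m256_;

union truffle_masks {
    m256_ mask;        /* whole 32-byte view, used by the wide kernel */
    struct {
        m128_ mask_lo; /* little-endian member order shown; swapped on */
        m128_ mask_hi; /* big-endian so mask_lo stays the low half     */
    };
};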

src/nfa/accel_dfa_build_strat.cpp

@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2017, Intel Corporation
* Copyright (c) 2021, Arm Limited
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
@ -33,6 +34,7 @@
#include "nfagraph/ng_limex_accel.h"
#include "shufticompile.h"
#include "trufflecompile.h"
#include "vermicellicompile.h"
#include "util/accel_scheme.h"
#include "util/charreach.h"
#include "util/container.h"
@ -105,7 +107,7 @@ static
path append(const path &orig, const CharReach &cr, u32 new_dest) {
path p(new_dest);
p.reach = orig.reach;
p.reach.push_back(cr);
p.reach.emplace_back(cr);
return p;
}
@ -117,25 +119,25 @@ void extend(const raw_dfa &rdfa, const vector<CharReach> &rev_map,
const dstate &s = rdfa.states[p.dest];
if (!p.reach.empty() && p.reach.back().none()) {
out.push_back(p);
out.emplace_back(p);
return;
}
if (!s.reports.empty()) {
if (generates_callbacks(rdfa.kind)) {
out.push_back(p);
out.emplace_back(p);
return;
} else {
path pp = append(p, CharReach(), p.dest);
all[p.dest].push_back(pp);
out.push_back(move(pp));
all[p.dest].emplace_back(pp);
out.emplace_back(std::move(pp));
}
}
if (!s.reports_eod.empty()) {
path pp = append(p, CharReach(), p.dest);
all[p.dest].push_back(pp);
out.push_back(move(pp));
all[p.dest].emplace_back(pp);
out.emplace_back(std::move(pp));
}
flat_map<u32, CharReach> dest;
@ -154,8 +156,8 @@ void extend(const raw_dfa &rdfa, const vector<CharReach> &rev_map,
DEBUG_PRINTF("----good: [%s] -> %u\n",
describeClasses(pp.reach).c_str(), pp.dest);
all[e.first].push_back(pp);
out.push_back(move(pp));
all[e.first].emplace_back(pp);
out.emplace_back(std::move(pp));
}
}
@ -165,14 +167,14 @@ vector<vector<CharReach>> generate_paths(const raw_dfa &rdfa,
const vector<CharReach> rev_map = reverse_alpha_remapping(rdfa);
vector<path> paths{path(base)};
unordered_map<u32, vector<path>> all;
all[base].push_back(path(base));
all[base].emplace_back(path(base));
for (u32 i = 0; i < len && paths.size() < PATHS_LIMIT; i++) {
vector<path> next_gen;
for (const auto &p : paths) {
extend(rdfa, rev_map, p, all, next_gen);
}
paths = move(next_gen);
paths = std::move(next_gen);
}
dump_paths(paths);
@ -180,7 +182,8 @@ vector<vector<CharReach>> generate_paths(const raw_dfa &rdfa,
vector<vector<CharReach>> rv;
rv.reserve(paths.size());
for (auto &p : paths) {
rv.push_back(vector<CharReach>(std::make_move_iterator(p.reach.begin()),
// cppcheck-suppress useStlAlgorithm
rv.emplace_back(vector<CharReach>(std::make_move_iterator(p.reach.begin()),
std::make_move_iterator(p.reach.end())));
}
return rv;
@ -318,7 +321,7 @@ set<dstate_id_t> find_region(const raw_dfa &rdfa, dstate_id_t base,
DEBUG_PRINTF(" %hu is in region\n", t);
region.insert(t);
pending.push_back(t);
pending.emplace_back(t);
}
}
@ -424,10 +427,11 @@ void
accel_dfa_build_strat::buildAccel(UNUSED dstate_id_t this_idx,
const AccelScheme &info,
void *accel_out) {
AccelAux *accel = (AccelAux *)accel_out;
AccelAux *accel = reinterpret_cast<AccelAux *>(accel_out);
DEBUG_PRINTF("accelerations scheme has offset s%u/d%u\n", info.offset,
info.double_offset);
// cppcheck-suppress redundantInitialization
accel->generic.offset = verify_u8(info.offset);
if (double_byte_ok(info) && info.double_cr.none() &&
@@ -440,52 +444,87 @@ accel_dfa_build_strat::buildAccel(UNUSED dstate_id_t this_idx,
return;
}
if (double_byte_ok(info) && info.double_cr.none() &&
(info.double_byte.size() == 2 || info.double_byte.size() == 4)) {
bool ok = true;
if (double_byte_ok(info) && info.double_cr.none()) {
if ((info.double_byte.size() == 2 || info.double_byte.size() == 4)) {
bool ok = true;
assert(!info.double_byte.empty());
u8 firstC = info.double_byte.begin()->first & CASE_CLEAR;
u8 secondC = info.double_byte.begin()->second & CASE_CLEAR;
assert(!info.double_byte.empty());
u8 firstC = info.double_byte.begin()->first & CASE_CLEAR;
u8 secondC = info.double_byte.begin()->second & CASE_CLEAR;
for (const pair<u8, u8> &p : info.double_byte) {
if ((p.first & CASE_CLEAR) != firstC ||
(p.second & CASE_CLEAR) != secondC) {
ok = false;
break;
for (const pair<u8, u8> &p : info.double_byte) {
if ((p.first & CASE_CLEAR) != firstC ||
(p.second & CASE_CLEAR) != secondC) {
ok = false;
break;
}
}
if (ok) {
accel->accel_type = ACCEL_DVERM_NOCASE;
accel->dverm.c1 = firstC;
accel->dverm.c2 = secondC;
accel->dverm.offset = verify_u8(info.double_offset);
DEBUG_PRINTF("state %hu is nc double vermicelli\n", this_idx);
return;
}
u8 m1;
u8 m2;
if (buildDvermMask(info.double_byte, &m1, &m2)) {
u8 c1 = info.double_byte.begin()->first & m1;
u8 c2 = info.double_byte.begin()->second & m2;
#ifdef HAVE_SVE2
if (vermicelliDoubleMasked16Build(c1, c2, m1, m2,
reinterpret_cast<u8 *>(&accel->mdverm16.mask))) {
accel->accel_type = ACCEL_DVERM16_MASKED;
accel->mdverm16.offset = verify_u8(info.double_offset);
accel->mdverm16.c1 = c1;
accel->mdverm16.m1 = m1;
DEBUG_PRINTF("building maskeddouble16-vermicelli for 0x%02hhx%02hhx\n",
c1, c2);
return;
} else if (info.double_byte.size() <= 8 &&
vermicelliDouble16Build(info.double_byte,
reinterpret_cast<u8 *>(&accel->dverm16.mask),
reinterpret_cast<u8 *>(&accel->dverm16.firsts))) {
accel->accel_type = ACCEL_DVERM16;
accel->dverm16.offset = verify_u8(info.double_offset);
DEBUG_PRINTF("building double16-vermicelli\n");
return;
}
#endif // HAVE_SVE2
accel->accel_type = ACCEL_DVERM_MASKED;
accel->dverm.offset = verify_u8(info.double_offset);
accel->dverm.c1 = c1;
accel->dverm.c2 = c2;
accel->dverm.m1 = m1;
accel->dverm.m2 = m2;
DEBUG_PRINTF(
"building maskeddouble-vermicelli for 0x%02hhx%02hhx\n", c1, c2);
return;
}
}
if (ok) {
accel->accel_type = ACCEL_DVERM_NOCASE;
accel->dverm.c1 = firstC;
accel->dverm.c2 = secondC;
accel->dverm.offset = verify_u8(info.double_offset);
DEBUG_PRINTF("state %hu is nc double vermicelli\n", this_idx);
return;
}
u8 m1;
u8 m2;
if (buildDvermMask(info.double_byte, &m1, &m2)) {
accel->accel_type = ACCEL_DVERM_MASKED;
accel->dverm.offset = verify_u8(info.double_offset);
accel->dverm.c1 = info.double_byte.begin()->first & m1;
accel->dverm.c2 = info.double_byte.begin()->second & m2;
accel->dverm.m1 = m1;
accel->dverm.m2 = m2;
DEBUG_PRINTF(
"building maskeddouble-vermicelli for 0x%02hhx%02hhx\n",
accel->dverm.c1, accel->dverm.c2);
#ifdef HAVE_SVE2
if (info.double_byte.size() <= 8 &&
vermicelliDouble16Build(info.double_byte,
reinterpret_cast<u8 *>(&accel->dverm16.mask),
reinterpret_cast<u8 *>(&accel->dverm16.firsts))) {
accel->accel_type = ACCEL_DVERM16;
accel->dverm16.offset = verify_u8(info.double_offset);
DEBUG_PRINTF("building double16-vermicelli\n");
return;
}
#endif // HAVE_SVE2
}
if (double_byte_ok(info) &&
shuftiBuildDoubleMasks(
info.double_cr, info.double_byte, (u8 *)&accel->dshufti.lo1,
(u8 *)&accel->dshufti.hi1, (u8 *)&accel->dshufti.lo2,
(u8 *)&accel->dshufti.hi2)) {
info.double_cr, info.double_byte,
reinterpret_cast<u8 *>(&accel->dshufti.lo1),
reinterpret_cast<u8 *>(&accel->dshufti.hi1),
reinterpret_cast<u8 *>(&accel->dshufti.lo2),
reinterpret_cast<u8 *>(&accel->dshufti.hi2))) {
accel->accel_type = ACCEL_DSHUFTI;
accel->dshufti.offset = verify_u8(info.double_offset);
DEBUG_PRINTF("state %hu is double shufti\n", this_idx);
@@ -514,6 +553,15 @@ accel_dfa_build_strat::buildAccel(UNUSED dstate_id_t this_idx,
return;
}
#ifdef HAVE_SVE2
if (info.cr.count() <= 16) {
accel->accel_type = ACCEL_VERM16;
vermicelli16Build(info.cr, reinterpret_cast<u8 *>(&accel->verm16.mask));
DEBUG_PRINTF("state %hu is vermicelli16\n", this_idx);
return;
}
#endif // HAVE_SVE2
if (info.cr.count() > max_floating_stop_char()) {
accel->accel_type = ACCEL_NONE;
DEBUG_PRINTF("state %hu is too broad\n", this_idx);
@@ -521,16 +569,27 @@ accel_dfa_build_strat::buildAccel(UNUSED dstate_id_t this_idx,
}
accel->accel_type = ACCEL_SHUFTI;
if (-1 != shuftiBuildMasks(info.cr, (u8 *)&accel->shufti.lo,
(u8 *)&accel->shufti.hi)) {
if (-1 != shuftiBuildMasks(info.cr,
reinterpret_cast<u8 *>(&accel->shufti.lo),
reinterpret_cast<u8 *>(&accel->shufti.hi))) {
DEBUG_PRINTF("state %hu is shufti\n", this_idx);
return;
}
assert(!info.cr.none());
accel->accel_type = ACCEL_TRUFFLE;
truffleBuildMasks(info.cr, (u8 *)&accel->truffle.mask1,
(u8 *)&accel->truffle.mask2);
#if defined(CAN_USE_WIDE_TRUFFLE)
if(CAN_USE_WIDE_TRUFFLE) {
accel->accel_type = ACCEL_TRUFFLE_WIDE;
truffleBuildMasksWide(info.cr,
reinterpret_cast<u8 *>(&accel->truffle.mask));
} else
#endif
{
accel->accel_type = ACCEL_TRUFFLE;
truffleBuildMasks(info.cr,
reinterpret_cast<u8 *>(&accel->truffle.mask_lo),
reinterpret_cast<u8 *>(&accel->truffle.mask_hi));
}
DEBUG_PRINTF("state %hu is truffle\n", this_idx);
}
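Editor's note: the tail of buildAccel keeps its cascade: shufti is attempted first and can fail (returning -1) when the class needs more buckets than its nibble-lookup scheme provides, so truffle remains the unconditional fallback, now in a wide flavour when CAN_USE_WIDE_TRUFFLE is set. A sketch of the two-mask construction behind truffleBuildMasks, assuming the classic layout where the low nibble selects the byte, bits 4-6 select the bit, and bit 7 selects which mask (helper names here are invented):

    #include <cstdint>
    #include <cstring>

    // Build truffle's two 16-byte masks from a 256-entry character class.
    static void buildTruffleMasksSketch(const bool accept[256],
                                        uint8_t mask_lo[16],
                                        uint8_t mask_hi[16]) {
        std::memset(mask_lo, 0, 16);  // chars 0x00-0x7f
        std::memset(mask_hi, 0, 16);  // chars 0x80-0xff
        for (int c = 0; c < 256; c++) {
            if (!accept[c]) continue;
            uint8_t *mask = (c & 0x80) ? mask_hi : mask_lo;
            mask[c & 0xf] |= static_cast<uint8_t>(1u << ((c >> 4) & 0x7));
        }
    }

    // Membership check mirroring the mask layout above.
    static bool truffleAccepts(uint8_t c, const uint8_t mask_lo[16],
                               const uint8_t mask_hi[16]) {
        const uint8_t *mask = (c & 0x80) ? mask_hi : mask_lo;
        return (mask[c & 0xf] >> ((c >> 4) & 0x7)) & 1;
    }

    int main() {
        bool accept[256] = {};
        accept[static_cast<uint8_t>('\n')] = true;
        accept[0xfe] = true;
        uint8_t lo[16], hi[16];
        buildTruffleMasksSketch(accept, lo, hi);
        return truffleAccepts('\n', lo, hi) && !truffleAccepts('A', lo, hi) ? 0 : 1;
    }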


@@ -93,6 +93,8 @@ const char *accelName(u8 accel_type) {
return "double-shufti";
case ACCEL_TRUFFLE:
return "truffle";
case ACCEL_TRUFFLE_WIDE:
return "truffle wide";
case ACCEL_RED_TAPE:
return "red tape";
default:
@@ -178,6 +180,13 @@ void dumpTruffleCharReach(FILE *f, const u8 *hiset, const u8 *hiclear) {
describeClass(cr).c_str());
}
static
void dumpWideTruffleCharReach(FILE *f, const u8 *mask) {
CharReach cr = truffle2crWide(mask);
fprintf(f, "count %zu class %s\n", cr.count(),
describeClass(cr).c_str());
}
static
void dumpTruffleMasks(FILE *f, const u8 *hiset, const u8 *hiclear) {
fprintf(f, "lo %s\n", dumpMask(hiset, 128).c_str());
@@ -210,31 +219,38 @@ void dumpAccelInfo(FILE *f, const AccelAux &accel) {
break;
case ACCEL_SHUFTI: {
fprintf(f, "\n");
dumpShuftiMasks(f, (const u8 *)&accel.shufti.lo,
(const u8 *)&accel.shufti.hi);
dumpShuftiCharReach(f, (const u8 *)&accel.shufti.lo,
(const u8 *)&accel.shufti.hi);
dumpShuftiMasks(f, reinterpret_cast<const u8 *>(&accel.shufti.lo),
reinterpret_cast<const u8 *>(&accel.shufti.hi));
dumpShuftiCharReach(f, reinterpret_cast<const u8 *>(&accel.shufti.lo),
reinterpret_cast<const u8 *>(&accel.shufti.hi));
break;
}
case ACCEL_DSHUFTI:
fprintf(f, "\n");
fprintf(f, "mask 1\n");
dumpShuftiMasks(f, (const u8 *)&accel.dshufti.lo1,
(const u8 *)&accel.dshufti.hi1);
dumpShuftiMasks(f, reinterpret_cast<const u8 *>(&accel.dshufti.lo1),
reinterpret_cast<const u8 *>(&accel.dshufti.hi1));
fprintf(f, "mask 2\n");
dumpShuftiMasks(f, (const u8 *)&accel.dshufti.lo2,
(const u8 *)&accel.dshufti.hi2);
dumpDShuftiCharReach(f, (const u8 *)&accel.dshufti.lo1,
(const u8 *)&accel.dshufti.hi1,
(const u8 *)&accel.dshufti.lo2,
(const u8 *)&accel.dshufti.hi2);
dumpShuftiMasks(f, reinterpret_cast<const u8 *>(&accel.dshufti.lo2),
reinterpret_cast<const u8 *>(&accel.dshufti.hi2));
dumpDShuftiCharReach(f, reinterpret_cast<const u8 *>(&accel.dshufti.lo1),
reinterpret_cast<const u8 *>(&accel.dshufti.hi1),
reinterpret_cast<const u8 *>(&accel.dshufti.lo2),
reinterpret_cast<const u8 *>(&accel.dshufti.hi2));
break;
case ACCEL_TRUFFLE: {
fprintf(f, "\n");
dumpTruffleMasks(f, (const u8 *)&accel.truffle.mask1,
(const u8 *)&accel.truffle.mask2);
dumpTruffleCharReach(f, (const u8 *)&accel.truffle.mask1,
(const u8 *)&accel.truffle.mask2);
dumpTruffleMasks(f, reinterpret_cast<const u8 *>(&accel.truffle.mask_lo),
reinterpret_cast<const u8 *>(&accel.truffle.mask_hi));
dumpTruffleCharReach(f, reinterpret_cast<const u8 *>(&accel.truffle.mask_lo),
reinterpret_cast<const u8 *>(&accel.truffle.mask_hi));
break;
}
case ACCEL_TRUFFLE_WIDE: {
fprintf(f, "\n");
dumpTruffleMasks(f, reinterpret_cast<const u8 *>(&accel.truffle.mask_lo),
reinterpret_cast<const u8 *>(&accel.truffle.mask_hi));
dumpWideTruffleCharReach(f, reinterpret_cast<const u8 *>(&accel.truffle.mask));
break;
}
default:

Some files were not shown because too many files have changed in this diff.