ypicchi-arm
5145b6d2ab
Fix noodle sve2 off by one ( #313 )
...
* Revert "Fix noodle SVE2 off by one bug"
This patch fixed the bug when it occurred at the end of the buffer,
but not when scanDoubleOnce runs before the main loop.
The next patch fixes the bug in both cases instead.
This reverts commit 48dd0e5ff0bc1995d62461c92cfb76d44d1d0105.
* Fix noodle spurious match with \0 chars for SVE2
When SVE2's noodle processes a partial vector (before the main loop or
at the end of it), a fake \0 was being parsed, triggering a match for
patterns that end with \0. This patch fixes this.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-08-29 13:49:29 +03:00
ypicchi-arm
aa4bc24439
Fix noodle SVE2 off by one bug ( #309 )
...
By using svmatch on 16-bit lanes with an 8-bit predicate, we end up
including an undefined character in the pattern checks. The inactive
lane after load contains an undefined value, usually \0. Patterns
using \0 as the last character would then match this spurious
character, returning a match beyond the buffer's end. The fix checks
for such matches and rejects them.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-08-05 09:42:56 +03:00
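The off-by-one described in the entry above (aa4bc24439) can be illustrated with a scalar model. This is only a sketch and not the SVE2 code from the patch: the 16-byte width, the zero-filled inactive lanes, and the names kLanes, scanDoublePartial and rejectPastEnd are all assumptions made for illustration. It shows how a two-character pattern ending in \0 can pair the last valid byte with the inactive lane after it, and how rejecting matches whose second byte lies past the end removes the spurious hit.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical vector width in bytes (illustration only).
constexpr std::size_t kLanes = 16;

// Scalar model of a two-character scan over a partially filled vector.
// Lanes at index >= len are inactive; they are modelled as 0 here, matching
// the "usually \0" behaviour described in the commit message.
std::vector<std::size_t> scanDoublePartial(const std::uint8_t *buf,
                                           std::size_t len,
                                           std::uint8_t c1, std::uint8_t c2,
                                           bool rejectPastEnd) {
    std::array<std::uint8_t, kLanes> lanes{};  // zero-filled
    const std::size_t n = std::min(len, kLanes);
    std::memcpy(lanes.data(), buf, n);

    std::vector<std::size_t> matches;
    // Only the first byte of each pair is guarded by the predicate (i < n),
    // so the second byte may come from an inactive lane.
    for (std::size_t i = 0; i < n && i + 1 < kLanes; ++i) {
        if (lanes[i] == c1 && lanes[i + 1] == c2) {
            if (rejectPastEnd && i + 1 >= n) {
                continue;  // second byte lies beyond the buffer: reject
            }
            matches.push_back(i);
        }
    }
    return matches;
}
```

With a 4-byte buffer "abcx" and the pattern x followed by \0, the unguarded scan reports a match at offset 3, while the guarded scan reports none; ordinary in-bounds pairs such as "bc" are found either way.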
Konstantinos Margaritis
c837925087
Fix/Suppress remaining Cppcheck warnings ( #291 )
...
Fix/suppress the following cppcheck warnings:
* arithOperationsOnVoidPointer
* uninitMember
* const*
* shadowVariable
* assignmentIntegerToAddress
* containerOutOfBounds
* pointer-related warnings in Ragel source
* missingOverride
* memleak
* knownConditionTrueFalse
* noExplicitConstructor
* invalidPrintfArgType_sint
* useStlAlgorithm
* cstyleCast
* clarifyCondition
* VSX-related cstyleCast
* unsignedLessThanZero
Furthermore, we added a suppression list to be used, which also includes the following:
* missingIncludeSystem
* missingInclude
* unmatchedSuppression
2024-05-27 12:23:02 +03:00
Konstantinos Margaritis
cebc6541c1
Part 5 of C-style cast cppcheck ( #289 )
...
Fixes some cstyleCasts part 5
closes some: #252
2024-05-24 23:24:58 +03:00
Yoan Picchi
938c026256
Speed up truffle with 256b TBL instructions
...
256b wide SVE vectors allow some simplification of truffle.
Up to 40% speedup on Graviton3, going from 12500 MB/s to 17000 MB/s
on the microbenchmark.
SVE2 also offers this capability for 128b vectors, with a speedup of around
25% compared to normal SVE.
Add unit tests and a benchmark for this wide variant.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-05-22 16:13:53 +00:00
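As a rough illustration of why a wider table lookup can simplify truffle (entry 938c026256 above), here is a scalar sketch. It is not taken from the patch. It assumes the usual truffle layout of two 16-byte tables selected by the top bit of the character, and it assumes that a 32-byte lookup lets both tables be merged and indexed with 5 bits in one step. All names (TruffleMasks, buildMasks, buildWide, matchNarrow, matchWide) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Two 16-entry tables: one for characters with the top bit clear, one for set.
struct TruffleMasks {
    std::array<std::uint8_t, 16> highClear{};
    std::array<std::uint8_t, 16> highSet{};
};

// For each character in the class: the low nibble selects the table entry,
// bits 4..6 select the bit within that entry.
TruffleMasks buildMasks(const std::string &cls) {
    TruffleMasks m;
    for (unsigned char c : cls) {
        auto &table = (c & 0x80) ? m.highSet : m.highClear;
        table[c & 0x0f] = static_cast<std::uint8_t>(
            table[c & 0x0f] | (1u << ((c >> 4) & 7)));
    }
    return m;
}

// Narrow variant: choose one of two tables, then look up.
bool matchNarrow(const TruffleMasks &m, std::uint8_t c) {
    const auto &table = (c & 0x80) ? m.highSet : m.highClear;
    return (table[c & 0x0f] & (1u << ((c >> 4) & 7))) != 0;
}

// Wide variant (assumed layout): both tables concatenated into 32 entries.
std::array<std::uint8_t, 32> buildWide(const TruffleMasks &m) {
    std::array<std::uint8_t, 32> wide{};
    for (std::size_t i = 0; i < 16; ++i) {
        wide[i] = m.highClear[i];
        wide[16 + i] = m.highSet[i];
    }
    return wide;
}

// One lookup: the top bit of c becomes bit 4 of the table index.
bool matchWide(const std::array<std::uint8_t, 32> &wide, std::uint8_t c) {
    const std::size_t idx = (c & 0x0f) | ((c & 0x80) >> 3);
    return (wide[idx] & (1u << ((c >> 4) & 7))) != 0;
}
```

Under these assumptions both variants accept exactly the same characters; the wide one replaces the table selection with a single indexed lookup, which is the kind of simplification a 32-byte TBL makes possible.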
gtsoul-tech
94eff4aa60
cstylecasts and suppressions
2024-05-22 10:16:56 +03:00
gtsoul-tech
e111684bc2
fix cStyleCasts
2024-05-20 14:54:35 +03:00
Konstantinos Margaritis
e819cb1100
Fix C-style casts
2024-05-16 12:03:42 +03:00
Konstantinos Margaritis
22166ed948
Fix remaining marked as done const* cppcheck warnings
2024-05-15 10:52:31 +03:00
gtsoul-tech
94b17ecaf2
noExplicitConstructor
2024-05-10 10:07:47 +03:00
Konstantinos Margaritis
0d2f9ccbaa
Fix 'unqualified call to std::move' errors in clang 15+
2023-10-03 20:24:39 +03:00
Danila Kutenin
49eb18ee4f
Optimize vectorscan for aarch64 by using shrn instruction
...
This optimization is based on the thread
https://twitter.com/Danlark1/status/1539344279268691970 and uses the
"shift right and narrow by 4" instruction: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/SHRN--SHRN2--Shift-Right-Narrow--immediate--
To achieve that, I needed to slightly redesign movemask into comparemask
and add an extra step to mask iteration. Our benchmarks
showed a 10-15% improvement on average for long matches.
2022-06-26 22:55:45 +00:00
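The comparemask idea from the entry above (49eb18ee4f) can be sketched in portable scalar code. This emulates what a shift-right-narrow by 4 does to a 16-byte comparison result viewed as eight 16-bit lanes; it is not the NEON code from the patch, and the names compareMask and matchPositions are hypothetical. The result is a 64-bit mask with 4 bits per input byte, so iterating over matches needs one extra step: dividing the bit index by 4.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// cmp holds a byte-wise comparison result: each byte is 0x00 or 0xFF.
// Emulates SHRN #4 over eight little-endian 16-bit lanes, packing the
// narrowed bytes into a 64-bit value. Nibble k of the result reflects byte k.
std::uint64_t compareMask(const std::array<std::uint8_t, 16> &cmp) {
    std::uint64_t mask = 0;
    for (std::size_t lane = 0; lane < 8; ++lane) {
        const std::uint16_t wide = static_cast<std::uint16_t>(
            cmp[2 * lane] | (cmp[2 * lane + 1] << 8));
        const std::uint8_t narrow = static_cast<std::uint8_t>(wide >> 4);
        mask |= static_cast<std::uint64_t>(narrow) << (8 * lane);
    }
    return mask;
}

// Portable count-trailing-zeros; v must be non-zero.
static std::size_t countTrailingZeros(std::uint64_t v) {
    std::size_t n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Iterate over matches: keep one bit per nibble, then map bit index to
// byte index with a shift right by 2 (the extra step versus a movemask).
std::vector<std::size_t> matchPositions(std::uint64_t mask) {
    std::vector<std::size_t> out;
    mask &= 0x8888888888888888ULL;
    while (mask != 0) {
        out.push_back(countTrailingZeros(mask) >> 2);
        mask &= mask - 1;  // clear the lowest set bit
    }
    return out;
}
```

For example, a comparison result with bytes 1, 5 and 15 set produces a mask whose nibbles 1, 5 and 15 are 0xF, and matchPositions recovers those three byte offsets in increasing order.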
Danila Kutenin
9af996b936
Fix all ASAN issues in vectorscan
2022-02-18 17:14:51 +00:00
Konstantinos Margaritis
81fba99f3a
fix SVE2 build after the changes
2021-11-25 18:48:24 +02:00
Konstantinos Margaritis
210295a702
remove vermicelli.h and replace it with vermicelli.hpp
2021-11-02 22:30:53 +02:00
Konstantinos Margaritis
bc1a1127cf
add new include file
2021-11-01 16:28:50 +00:00
Konstantinos Margaritis
713aaef799
move casemask helper functions to separate header
2021-11-01 16:05:43 +00:00
Konstantinos Margaritis
4e044d4142
Add missing copyright info from tampered files
2021-10-12 11:51:35 +03:00
George Wort
df926ef62f
Implement new Vermicelli16 acceleration functions using SVE2.
...
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.
Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
George Wort
c7086cb7f1
Add SVE2 support for dvermicelli
...
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e35b88f2c8
use STL make_unique, remove wrapper header, breaks C++17 compilation
2021-10-12 11:51:34 +03:00
George Wort
9fb79ac3ec
Add SVE2 support for vermicelli
...
Change-Id: Ia025de53521fbaefe5fb1e4425aaf75c7d80a14e
2021-10-12 11:51:34 +03:00
George Wort
7162446358
Remove possibly undefined behaviour from Noodle.
...
Change-Id: I9a7997cea6a48927cb02b00c5dba5009bbf83850
2021-10-12 11:51:34 +03:00
George Wort
b48ea2c1a6
Remove first check from scanDouble Noodle.
...
Change-Id: I00eabb3cb06ef6a2060df52c26fa8591907a2711
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
41ff0962c4
minor fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2753dbb3b0
rename supervector class header, use dup_*() functions names instead of set1_*(), minor fixes
2021-10-12 11:51:34 +03:00
George Wort
d1009e8830
Fix error in initial noodle double final call.
...
Change-Id: Ie044988f183b47e0b2f1eed3b4bd23de75c3117d
2021-10-12 11:51:34 +03:00
George Wort
d6df8116a5
Add SVE2 support for noodle
...
Change-Id: Iacb7d1f164bdd0ba50e2e13d26fe548cf9b45a6a
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e215157a21
move definitions elsewhere
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
05c7c8e576
move SuperVector versions of noodleEngine scan functions to _simd.hpp file
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
c6406bebde
simplify scanSingleMain() and scanDoubleMain()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f77837130d
delete separate implementations
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
ede2b18564
add generic SIMD implementation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
7a9a2dd0dc
convert to C++
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
831091db9e
fix typo
2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
ec5531a6b1
minor optimizations
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
d3ff893871
prefetch works best when addresses are 64-byte aligned
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
58cface115
optimise case handling
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
e3e101b412
simplify and make scanSingle*()/scanDouble*() more uniform
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
2f13ad0674
optimize caseMask handling
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
27bd09454f
use correct function names for AVX512, fix build failure
2021-02-15 13:54:19 +02:00
Konstantinos Margaritis
9fd94e0062
use unaligned loads for short scans
2021-02-11 14:21:57 +02:00
Konstantinos Margaritis
d3e03ed88a
optimize case mask AND out of the loop
2021-02-10 13:29:45 +02:00
Konstantinos Margaritis
5333467249
fix names, use own intrinsic instead of explicit _mm* ones
2020-09-23 11:51:21 +03:00
Konstantinos Margaritis
8ed5f4ac75
fix include paths for masked_move
2020-09-18 12:55:57 +03:00
Wang Xiang W
f658c4e149
Noodle: avoid an extra convert instruction
...
fixes github issue #221
2020-05-25 13:46:42 +00:00
Hong, Yang A
23e5f06594
add new Literal API for pure literal expressions:
...
Add the compile-time API functions hs_compile_lit() and hs_compile_lit_multi()
to handle pure literal pattern sets. A corresponding option, --literal-on,
is added for the hyperscan testing suites. Extended parameters and some
flags are not supported by this API.
2019-08-13 14:51:38 +08:00
Hong, Yang A
f68723a606
literal matching: separate path for pure literal patterns
2019-01-21 09:59:22 +08:00
Justin Viiret
af519f3190
hwlm_build: default for HWLMProto::make_small
...
Silences Coverity warning.
2017-09-18 13:29:34 +10:00
Alex Coyte
41783fe912
more comments on hwlm/fdr's start parameter
2017-08-21 11:23:41 +10:00