* Add regression test for double shufti
It tests for false positive at vector edges.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* Fix double shufti reporting false positives
Double shufti used to offset one vector, resulting in losing one character
at the end of every vector. This was replaced by a magic value indicating a
match. This meant that if the first char of a pattern fell on the last char of
a vector, double shufti would assume the second character is present and
report a match.
This patch fixes it by keeping the previous vector and feeding its data to the
new one when we shift it, preventing any loss of data.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* vshl() will call the correct implementation
* implement missing vshr_512_imm(), simplifies caller x86 code
* Fix x86 case, use alignr instead
* it's the reverse, the avx512 alignr is incorrect, need to fix
* Make shufti's OR reduce size agnostic
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* Fix test's array size
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* Fix AVX2/AVX512 alignr implementations and unit tests
* Fix Power VSX alignr
---------
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
Co-authored-by: Konstantinos Margaritis <konstantinos@vectorcamp.gr>
Multiple AVX512VBMI-related fixes:
src/nfa/mcsheng_compile.cpp: No need for an assert here, impl_id can be set to 0
src/nfa/nfa_api_queue.h: Make sure this compiles on both C++ and C
src/nfagraph/ng_fuzzy.cpp: Fix compilation error when DEBUG_OUTPUT=on
src/runtime.c: Fix crash when data == NULL
unit/internal/sheng.cpp: Unit test has to enable AVX512VBMI manually as autodetection does not get trigger, this causes test to fail
src/fdr/teddy_fat.cpp: AVX512 loads need to be 64-bit aligned, caused a crash on clang-18
256b wide SVE vectors allow some simplification of truffle.
Up to 40% speedup on graviton3. Going from 12500 MB/s to 17000 MB/s
onhe microbenchmark.
SVE2 also offer this capability for 128b vector with a speedup around
25% compared to normal SVE
Add unit tests and benchmark for this wide variant
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>