Konstantinos Margaritis
fad39b6058
optimize and simplify Shufti and Truffle to work with a single block method instead
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
9e6c1c30cf
remove asserts, as they are not needed
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
9ab18cf419
fix for new pshufb
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e7161fdfec
initial SSE/AVX2 implementation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
08357a096c
remove Windows/ICC support
2021-10-12 11:51:34 +03:00
apostolos
b3a20afbbc
limex_shuffle added and its unit tests
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
de30471edd
remove duplicate functions from previous merge
2021-10-12 11:51:34 +03:00
George Wort
a879715953
Move SVE functions into their own files.
Change-Id: I995ba4b7d2b558ee403693ee45d747d414d3b177
2021-10-12 11:51:34 +03:00
George Wort
6c6aee9682
Implement new DoubleVermicelli16 acceleration functions using SVE2
Change-Id: Id4a8ffca840caab930a6e78cc0dfd0fe7d320b4e
2021-10-12 11:51:34 +03:00
George Wort
00fff3f53c
Use SVE for double shufti.
Change-Id: I09e0d57bb8a2f05b613f6225dea79ae823136268
2021-10-12 11:51:34 +03:00
George Wort
c95a4c3dd1
Use SVE for single shufti.
Change-Id: Ic76940c5bb9b81a1c45d39e9ca396a158c50a7dc
2021-10-12 11:51:34 +03:00
George Wort
df926ef62f
Implement new Vermicelli16 acceleration functions using SVE2.
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.
Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
George Wort
c7086cb7f1
Add SVE2 support for dvermicelli
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
a38324a5a3
add arm rshift128/rshift128
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
603bc14cdd
fix failing corner case, add pshufb_maskz()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e35b88f2c8
use STL make_unique, remove wrapper header that breaks C++17 compilation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6f44a1aa26
remove low4bits from the arguments, fix cases that mostly affect loading large (64) vectors and falling out of bounds
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
b67cd7dfd0
use rshift128() instead of vector-wide right shift
2021-10-12 11:51:34 +03:00
George Wort
4bc28272da
Fix CROSS_COMPILE_AARCH64 for SVE issues.
Change-Id: I7b9ba3ccb754d96eee22ca01714c783dae1e4956
2021-10-12 11:51:34 +03:00
George Wort
9fb79ac3ec
Add SVE2 support for vermicelli
Change-Id: Ia025de53521fbaefe5fb1e4425aaf75c7d80a14e
2021-10-12 11:51:34 +03:00
apostolos
6f88ecac44
Supervector test fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
d04b899c29
fix truffle SIMD for S>16 as well
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
d453a612dc
fix last failing Shufti/Truffle tests
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0ed10082b1
fix rtruffle, was failing Lbr and a few ReverseTruffle tests
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
845e533b66
move firstMatch, lastMatch to own header in util
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
41ff0962c4
minor fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2753dbb3b0
rename supervector class header, use dup_*() functions names instead of set1_*(), minor fixes
2021-10-12 11:51:34 +03:00
apostolos
1ce5e17ce9
Truffle simd vectorized
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
23b075cbd4
refactor shufti algorithm to use SuperVector class, WIP
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
556206f138
replace push_back by emplace_back where possible
2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
d3ff893871
prefetch works best when addresses are 64-byte aligned
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
27bd09454f
use correct function names for AVX512, fix build failure
2021-02-15 13:54:19 +02:00
Wang Xiang W
5f930b267c
Limex: exception handling with AVX512
2021-01-25 14:13:13 +02:00
Wang Xiang W
9ea1e4be3d
limex: add fast NFA check
2021-01-25 14:13:13 +02:00
Chang, Harry
5ad3d64b4b
Discard HAVE_AVX512VBMI checks at Sheng/McSheng compile time.
2021-01-25 14:13:13 +02:00
Zhu,Wenjun
d96f1ab505
MCSHENG64: extend to 64-state based on mcsheng
2021-01-25 14:13:13 +02:00
Chang, Harry
a388a0f193
Fix sheng64 dump compile issue in clang.
2021-01-25 14:13:13 +02:00
Chang, Harry
c41d33c53f
Fix sheng64 compile issue in clang and in DEBUG_OUTPUT mode on SKX.
2021-01-25 14:13:13 +02:00
Chang, Harry
ed4b0f713a
SHENG64: 64-state 1-byte shuffle based DFA.
2021-01-25 14:13:13 +02:00
Chang, Harry
6a42b37fca
SHENG32: Compile priority sheng > mcsheng > sheng32.
2021-01-25 14:13:13 +02:00
Chang, Harry
cc747013c4
SHENG32: 32-state 1-byte shuffle based DFA.
2021-01-25 14:13:13 +02:00
Hong, Yang A
d71515be04
DFA: use sherman economically
2021-01-25 14:13:13 +02:00
Konstantinos Margaritis
b62247a36e
borrow cache prefetching tricks from the Marvell port, which seem to improve performance by 5-28%
2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
5333467249
fix names, use own intrinsic instead of explicit _mm* ones
2020-09-23 11:51:21 +03:00
Hong, Yang A
88a18dcf98
add AVX512 support for vermicelli model
2020-05-25 13:47:53 +00:00
Pavel Shlyak
3ca3602755
A tiny cleanup
2019-12-02 16:40:38 +00:00
Hong, Yang A
b5a8644b1f
mcclellan: fix dump issue in wide-state case.
2019-01-21 09:59:29 +08:00
Hong, Yang A
805a550a0a
mcclellan: wide state fixes for sanitisers and accept state construction
2019-01-21 09:58:18 +08:00
Hong, Yang A
c06d5e1c14
DFA state compression: 16-bit wide and sherman co-exist
2019-01-21 09:56:37 +08:00
Wang, Xiang W
8a0e4f8249
Use std::distance explicitly to avoid ambiguity with boost
2019-01-11 16:05:55 +08:00