300 Commits

Author SHA1 Message Date
Konstantinos Margaritis
b9fbfb1204 remove duplicate functions from previous merge 2021-10-12 11:51:34 +03:00
George Wort
3bdd48fd61 Move SVE functions into their own files.
Change-Id: I995ba4b7d2b558ee403693ee45d747d414d3b177
2021-10-12 11:51:34 +03:00
George Wort
e1f0f6baf7 Implement new DoubleVermicelli16 acceleration functions using SVE2
Change-Id: Id4a8ffca840caab930a6e78cc0dfd0fe7d320b4e
2021-10-12 11:51:34 +03:00
George Wort
60b2112505 Use SVE for double shufti.
Change-Id: I09e0d57bb8a2f05b613f6225dea79ae823136268
2021-10-12 11:51:34 +03:00
George Wort
87ee8d4d7f Use SVE for single shufti.
Change-Id: Ic76940c5bb9b81a1c45d39e9ca396a158c50a7dc
2021-10-12 11:51:34 +03:00
George Wort
b54710d208 Implement new Vermicelli16 acceleration functions using SVE2.
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.

Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
George Wort
b6a7ee7e84 Add SVE2 support for dvermicelli
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3296d538ea add arm rshift128/rshift128 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0033cec725 fix failing corner case, add pshufb_maskz() 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5adbfc94b8 use STL make_unique, remove wrapper header, breaks C++17 compilation 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0ec5dc37ca remove low4bits from the arguments, fix cases that mostly affect loading large (64) vectors and falling out of bounds 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
03e7d788b6 use rshift128() instead of vector-wide right shift 2021-10-12 11:51:34 +03:00
George Wort
7e5138b78f Fix CROSS_COMPILE_AARCH64 for SVE issues.
Change-Id: I7b9ba3ccb754d96eee22ca01714c783dae1e4956
2021-10-12 11:51:34 +03:00
George Wort
acfa11a34f Add SVE2 support for vermicelli
Change-Id: Ia025de53521fbaefe5fb1e4425aaf75c7d80a14e
2021-10-12 11:51:34 +03:00
apostolos
b1dfc6abc4 Supervector test fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f16abb1789 fix truffle SIMD for S>16 as well 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2a7e6b71bc fix last failing Shufti/Truffle tests 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
db72de41ba fix rtruffle, was failing Lbr and a few ReverseTruffle tests 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
ff02749a73 move firstMatch, lastMatch to own header in util 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
31aca74801 minor fixes 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
c2a5de03e0 rename supervector class header, use dup_*() functions names instead of set1_*(), minor fixes 2021-10-12 11:51:34 +03:00
apostolos
bab390d442 Truffle simd vectorized 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f689179a82 refactor shufti algorithm to use SuperVector class, WIP 2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3f35a2be37 replace push_back by emplace_back where possible 2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
acb542a5be prefetch works best when addresses are 64-byte aligned 2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
1c1bca4f98 use correct function names for AVX512, fix build failure 2021-02-15 13:54:19 +02:00
Wang Xiang W
723b469cf7 Limex: exception handling with AVX512 2021-01-25 14:13:13 +02:00
Wang Xiang W
a307e11283 limex: add fast NFA check 2021-01-25 14:13:13 +02:00
Chang, Harry
191cfef6cd Discard HAVE_AVX512VBMI checks at Sheng/McSheng compile time. 2021-01-25 14:13:13 +02:00
Zhu,Wenjun
1c8c7ea806 MCSHENG64: extend to 64-state based on mcsheng 2021-01-25 14:13:13 +02:00
Chang, Harry
63c7345ab2 Fix sheng64 dump compile issue in clang. 2021-01-25 14:13:13 +02:00
Chang, Harry
6c56aaf7a9 Fix sheng64 compile issue in clang and in DEBUG_OUTPUT mode on SKX. 2021-01-25 14:13:13 +02:00
Chang, Harry
cfd3b0bf4e SHENG64: 64-state 1-byte shuffle based DFA. 2021-01-25 14:13:13 +02:00
Chang, Harry
7bd488922a SHENG32: Compile priority sheng > mcsheng > sheng32. 2021-01-25 14:13:13 +02:00
Chang, Harry
2cde84c96d SHENG32: 32-state 1-byte shuffle based DFA. 2021-01-25 14:13:13 +02:00
Hong, Yang A
6f8bfa1854 DFA: use sherman economically 2021-01-25 14:13:13 +02:00
Konstantinos Margaritis
e830470028 borrow cache prefetching tricks from the Marvell port, seem to improve performance by 5-28% 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
8ef26f19fc fix names, use own intrinsic instead of explicit _mm* ones 2020-09-23 11:51:21 +03:00
Hong, Yang A
88a18dcf98 add AVX512 support for vermicelli model 2020-05-25 13:47:53 +00:00
Pavel Shlyak
3ca3602755 A tiny cleanup 2019-12-02 16:40:38 +00:00
Hong, Yang A
b5a8644b1f mcclellan: fix dump issue in wide-state case. 2019-01-21 09:59:29 +08:00
Hong, Yang A
805a550a0a mcclellan: wide state fixes for sanitisers and accept state construction 2019-01-21 09:58:18 +08:00
Hong, Yang A
c06d5e1c14 DFA state compression: 16-bit wide and sherman co-exist 2019-01-21 09:56:37 +08:00
Wang, Xiang W
8a0e4f8249 Use std::distance explicitly to avoid ambiguity with boost 2019-01-11 16:05:55 +08:00
Justin Viiret
16076ed4a3 mcsheng: debug format string fixes 2018-06-27 13:39:30 +08:00
Justin Viiret
25adf3f512 sheng: fix reportCurrent eod flag
eod here should be 0, not 1. The reportCurrent NFA API function for
Sheng is unused at the moment, so this wasn't causing any problems
earlier.
2018-06-27 13:39:24 +08:00
Justin Viiret
e65479dae5 mcclellancompile: MAX_SHERMAN_LIST_LEN can be 9 2018-06-27 13:39:10 +08:00
Justin Viiret
ce7cfbde82 misc: docs, typo fixes, small cleanups 2018-06-27 13:39:05 +08:00
Hong, Yang A
ae918116ab find_better_daddy: position change 2017-09-18 13:31:09 +10:00
Justin Viiret
ea2e85ac87 ng_squash: switch to using unordered_map
Also some cleaning up, small performance improvements.
2017-09-18 13:29:34 +10:00