Konstantinos Margaritis
b9fbfb1204
remove duplicate functions from previous merge
2021-10-12 11:51:34 +03:00
George Wort
3bdd48fd61
Move SVE functions into their own files.
Change-Id: I995ba4b7d2b558ee403693ee45d747d414d3b177
2021-10-12 11:51:34 +03:00
George Wort
e1f0f6baf7
Implement new DoubleVermicelli16 acceleration functions using SVE2
Change-Id: Id4a8ffca840caab930a6e78cc0dfd0fe7d320b4e
2021-10-12 11:51:34 +03:00
George Wort
60b2112505
Use SVE for double shufti.
Change-Id: I09e0d57bb8a2f05b613f6225dea79ae823136268
2021-10-12 11:51:34 +03:00
George Wort
87ee8d4d7f
Use SVE for single shufti.
Change-Id: Ic76940c5bb9b81a1c45d39e9ca396a158c50a7dc
2021-10-12 11:51:34 +03:00
George Wort
b54710d208
Implement new Vermicelli16 acceleration functions using SVE2.
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.
Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
George Wort
b6a7ee7e84
Add SVE2 support for dvermicelli
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3296d538ea
add arm rshift128/rshift128
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0033cec725
fix failing corner case, add pshufb_maskz()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5adbfc94b8
use STL make_unique, remove wrapper header that breaks C++17 compilation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0ec5dc37ca
remove low4bits from the arguments, fix cases that mostly affect loading large (64) vectors and falling out of bounds
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
03e7d788b6
use rshift128() instead of vector-wide right shift
2021-10-12 11:51:34 +03:00
George Wort
7e5138b78f
Fix CROSS_COMPILE_AARCH64 for SVE issues.
Change-Id: I7b9ba3ccb754d96eee22ca01714c783dae1e4956
2021-10-12 11:51:34 +03:00
George Wort
acfa11a34f
Add SVE2 support for vermicelli
Change-Id: Ia025de53521fbaefe5fb1e4425aaf75c7d80a14e
2021-10-12 11:51:34 +03:00
apostolos
b1dfc6abc4
Supervector test fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f16abb1789
fix truffle SIMD for S>16 as well
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2a7e6b71bc
fix last failing Shufti/Truffle tests
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
db72de41ba
fix rtruffle, was failing Lbr and a few ReverseTruffle tests
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
ff02749a73
move firstMatch, lastMatch to own header in util
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
31aca74801
minor fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
c2a5de03e0
rename supervector class header, use dup_*() function names instead of set1_*(), minor fixes
2021-10-12 11:51:34 +03:00
apostolos
bab390d442
Truffle simd vectorized
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f689179a82
refactor shufti algorithm to use SuperVector class, WIP
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3f35a2be37
replace push_back by emplace_back where possible
2021-10-12 11:51:33 +03:00
Konstantinos Margaritis
acb542a5be
prefetch works best when addresses are 64-byte aligned
2021-10-12 11:50:32 +03:00
Konstantinos Margaritis
1c1bca4f98
use correct function names for AVX512, fix build failure
2021-02-15 13:54:19 +02:00
Wang Xiang W
723b469cf7
Limex: exception handling with AVX512
2021-01-25 14:13:13 +02:00
Wang Xiang W
a307e11283
limex: add fast NFA check
2021-01-25 14:13:13 +02:00
Chang, Harry
191cfef6cd
Discard HAVE_AVX512VBMI checks at Sheng/McSheng compile time.
2021-01-25 14:13:13 +02:00
Zhu,Wenjun
1c8c7ea806
MCSHENG64: extend to 64-state based on mcsheng
2021-01-25 14:13:13 +02:00
Chang, Harry
63c7345ab2
Fix sheng64 dump compile issue in clang.
2021-01-25 14:13:13 +02:00
Chang, Harry
6c56aaf7a9
Fix sheng64 compile issue in clang and in DEBUG_OUTPUT mode on SKX.
2021-01-25 14:13:13 +02:00
Chang, Harry
cfd3b0bf4e
SHENG64: 64-state 1-byte shuffle based DFA.
2021-01-25 14:13:13 +02:00
Chang, Harry
7bd488922a
SHENG32: Compile priority sheng > mcsheng > sheng32.
2021-01-25 14:13:13 +02:00
Chang, Harry
2cde84c96d
SHENG32: 32-state 1-byte shuffle based DFA.
2021-01-25 14:13:13 +02:00
Hong, Yang A
6f8bfa1854
DFA: use sherman economically
2021-01-25 14:13:13 +02:00
Konstantinos Margaritis
e830470028
borrow cache prefetching tricks from the Marvell port; they seem to improve performance by 5-28%
2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
8ef26f19fc
fix names, use own intrinsic instead of explicit _mm* ones
2020-09-23 11:51:21 +03:00
Hong, Yang A
88a18dcf98
add AVX512 support for vermicelli model
2020-05-25 13:47:53 +00:00
Pavel Shlyak
3ca3602755
A tiny cleanup
2019-12-02 16:40:38 +00:00
Hong, Yang A
b5a8644b1f
mcclellan: fix dump issue in wide-state case.
2019-01-21 09:59:29 +08:00
Hong, Yang A
805a550a0a
mcclellan: wide state fixes for sanitisers and accept state construction
2019-01-21 09:58:18 +08:00
Hong, Yang A
c06d5e1c14
DFA state compression: 16-bit wide and sherman co-exist
2019-01-21 09:56:37 +08:00
Wang, Xiang W
8a0e4f8249
Use std::distance explicitly to avoid ambiguity with boost
2019-01-11 16:05:55 +08:00
Justin Viiret
16076ed4a3
mcsheng: debug format string fixes
2018-06-27 13:39:30 +08:00
Justin Viiret
25adf3f512
sheng: fix reportCurrent eod flag
eod here should be 0, not 1. The reportCurrent NFA API function for
Sheng is unused at the moment, so this wasn't causing any problems
earlier.
2018-06-27 13:39:24 +08:00
Justin Viiret
e65479dae5
mcclellancompile: MAX_SHERMAN_LIST_LEN can be 9
2018-06-27 13:39:10 +08:00
Justin Viiret
ce7cfbde82
misc: docs, typo fixes, small cleanups
2018-06-27 13:39:05 +08:00
Hong, Yang A
ae918116ab
find_better_daddy: position change
2017-09-18 13:31:09 +10:00
Justin Viiret
ea2e85ac87
ng_squash: switch to using unordered_map
Also some cleaning up, small performance improvements.
2017-09-18 13:29:34 +10:00