1471 Commits

Author SHA1 Message Date
Konstantinos Margaritis
9fd94e0062 use unaligned loads for short scans 2021-02-11 14:21:57 +02:00
Konstantinos Margaritis
d3e03ed88a optimize case mask AND out of the loop 2021-02-10 13:29:45 +02:00
Konstantinos Margaritis
be66cdb51d fixes in shifting primitives 2021-02-08 19:38:20 +02:00
Konstantinos Margaritis
f541f75400 bugfix compress128/expand128, add unit tests 2021-02-08 19:20:37 +02:00
Konstantinos Margaritis
d9874898c7 make const 2021-02-08 19:19:52 +02:00
Wang Xiang W
6a8a7a6c01 Bump version number for release 2021-01-25 14:13:13 +02:00
Chang, Harry
52f658ac55 Fix Klocwork scan issues. 2021-01-25 14:13:13 +02:00
Wang Xiang W
5f930b267c Limex: exception handling with AVX512 2021-01-25 14:13:13 +02:00
Chang, Harry
001b7824d2 Logical Combination: use hs_misc_free instead of free. fixes github issue #284 2021-01-25 14:13:13 +02:00
Wang Xiang W
beaca7c7db Adjust sensitive terms 2021-01-25 14:13:13 +02:00
Wang Xiang W
9ea1e4be3d limex: add fast NFA check 2021-01-25 14:13:13 +02:00
Chang, Harry
5ad3d64b4b Discard HAVE_AVX512VBMI checks at Sheng/McSheng compile time. 2021-01-25 14:13:13 +02:00
Chang, Harry
b19a41528a Add cpu feature / target info "AVX512VBMI". 2021-01-25 14:13:13 +02:00
Zhu,Wenjun
d96f1ab505 MCSHENG64: extend to 64-state based on mcsheng 2021-01-25 14:13:13 +02:00
Hong, Yang A
dea7c4dc2e lookaround: add 64x8 and 64x16 shufti models; add mask64 model; expand entry quantity 2021-01-25 14:13:13 +02:00
Chang, Harry
56cb107005 AVX512VBMI Fat Teddy. 2021-01-25 14:13:13 +02:00
Chang, Harry
f5657ef7b7 Fix find_vertices_in_cycles(): don't check self-loop in SCC. 2021-01-25 14:13:13 +02:00
Chang, Harry
a388a0f193 Fix sheng64 dump compile issue in clang. 2021-01-25 14:13:13 +02:00
Chang, Harry
c41d33c53f Fix sheng64 compile issue in clang and in DEBUG_OUTPUT mode on SKX. 2021-01-25 14:13:13 +02:00
Chang, Harry
ed4b0f713a SHENG64: 64-state 1-byte shuffle based DFA. 2021-01-25 14:13:13 +02:00
Chang, Harry
6a42b37fca SHENG32: Compile priority sheng > mcsheng > sheng32. 2021-01-25 14:13:13 +02:00
Chang, Harry
cc747013c4 SHENG32: 32-state 1-byte shuffle based DFA. 2021-01-25 14:13:13 +02:00
Hong, Yang A
d71515be04 DFA: use sherman economically 2021-01-25 14:13:13 +02:00
Konstantinos Margaritis
87413fbff0 optimize get_conf_stride_1() 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
e2f253d8ab remove loads from movemask128, variable_byte_shift, add palignr_imm(), minor fixes 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
a039089888 fix non-const char * write-strings compile error 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
4686ac47b6 replace andn() by explicit bitops and group loads/stores, gives ~1% gain 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
b62247a36e borrow cache prefetching tricks from the Marvell port, seem to improve performance by 5-28% 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
5b85589274 add some useful intrinsics 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
1c581e45e9 add expand128() implementation for NEON 2021-01-25 12:13:35 +02:00
Konstantinos Margaritis
752a42419b fix IA32 build, as we need minimum SSSE3 support for compilation to succeed 2020-12-30 19:57:44 +02:00
Konstantinos Margaritis
61b963a717 fix x86 compilation 2020-12-08 11:42:30 +02:00
Konstantinos Margaritis
e088c6ae2b remove forgotten printf 2020-12-07 23:12:41 +02:00
Konstantinos Margaritis
773dc6fa69 optimize *shiftbyte_m128() functions to use palign instead of variable_byte_shift_m128() 2020-12-07 23:12:26 +02:00
Konstantinos Margaritis
39945b7775 clear zones array 2020-12-03 19:30:50 +02:00
Konstantinos Margaritis
c38722a68b add ARM platform 2020-12-03 19:27:58 +02:00
Konstantinos Margaritis
38477b08bc fix movq and load_m128_from_u64a and resp. test for NEON 2020-12-03 19:27:38 +02:00
Konstantinos Margaritis
259c2572c1 define debug vector print functions to NULL in non-debug mode 2020-12-03 19:27:05 +02:00
Konstantinos Margaritis
17ab42d891 small optimization that was for some reason failing in ARM, should be faster anyway 2020-11-24 17:59:42 +02:00
Konstantinos Margaritis
d76365240b helper functions to print a m128 vector in debug mode 2020-11-24 17:57:16 +02:00
Konstantinos Margaritis
1c26f044a7 when building in debug mode, vgetq_lane_*() and vextq_*() need immediate operands, and we have to use switch()'ed versions 2020-11-24 17:56:40 +02:00
Konstantinos Margaritis
c4f1372814 remove debug from functions 2020-11-05 20:33:17 +02:00
Konstantinos Margaritis
501f60e930 add some debug info 2020-11-05 19:20:37 +02:00
Konstantinos Margaritis
33904180d8 add compress128 function and implementation 2020-11-05 19:20:06 +02:00
Konstantinos Margaritis
7b8cf97546 add extra instructions (currently arm-only), fix order of elements in set4x32/set2x64 2020-11-05 19:18:53 +02:00
Konstantinos Margaritis
547f79b920 small optimization in storecompress*() 2020-10-30 10:49:50 +02:00
Konstantinos Margaritis
548242981d fix ARM implementations 2020-10-30 10:38:41 +02:00
Konstantinos Margaritis
149ea938c4 don't redefine function on x86 2020-10-16 13:09:08 +03:00
Konstantinos Margaritis
c4db63665a scalar implementations of diffrich256 and diffrich384 2020-10-16 13:02:40 +03:00
Konstantinos Margaritis
4bce012570 Revert "move x86 popcount.h implementations to util/arch/x86/popcount.h". This reverts commit 6581aae90e55520353c03edb716de80ecc03521a. 2020-10-16 12:32:44 +03:00