Konstantinos Margaritis
98a950f405
add missing header
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e2fc2c3dfe
remove confusing OPTIMISE flag
2021-10-12 11:51:34 +03:00
apostolos
ee2ed6a8c8
nits
2021-10-12 11:51:34 +03:00
apostolos
e0fefb3489
code size reduction by using function arrays and add bandwidth to output
2021-10-12 11:51:34 +03:00
apostolos
bb9bcb3760
micro-benchmarks for shufti, truffle and noodle added
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
cf4b95fff2
remove Windows/ICC support
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
752d6cf997
fix lshift128 test
2021-10-12 11:51:34 +03:00
apostolos
b26a88efe5
alignr methods for avx2 and avx512 added
2021-10-12 11:51:34 +03:00
apostolos
150ae10ea4
limex_shuffle and its unit tests added
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
b9fbfb1204
remove duplicate functions from previous merge
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
acacafe1af
add missing compile flags
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
44496d7508
add accidentally removed lines
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
cd5c251f67
* add -fno-new-ttp-matching to fix build failures on newer gcc compilers with C++17
* add explicit -mssse3, -mavx2 to the compiler flags in the respective build profiles
2021-10-12 11:51:34 +03:00
George Wort
3bdd48fd61
Move SVE functions into their own files.
Change-Id: I995ba4b7d2b558ee403693ee45d747d414d3b177
2021-10-12 11:51:34 +03:00
George Wort
e1f0f6baf7
Implement new DoubleVermicelli16 acceleration functions using SVE2
Change-Id: Id4a8ffca840caab930a6e78cc0dfd0fe7d320b4e
2021-10-12 11:51:34 +03:00
George Wort
91f5f10831
Use SVE shufti for counting miracles.
Change-Id: Idd4aaf5bbc05fc90e9138c6fed385bc6ffa7b0b8
2021-10-12 11:51:34 +03:00
George Wort
60b2112505
Use SVE for double shufti.
Change-Id: I09e0d57bb8a2f05b613f6225dea79ae823136268
2021-10-12 11:51:34 +03:00
George Wort
87ee8d4d7f
Use SVE for single shufti.
Change-Id: Ic76940c5bb9b81a1c45d39e9ca396a158c50a7dc
2021-10-12 11:51:34 +03:00
George Wort
d1e763c13b
Use SVE2 for counting miracles.
Change-Id: I048dc182e5f4e726b847b3285ffafef4f538e550
2021-10-12 11:51:34 +03:00
George Wort
ceb230c7db
Replace USE_ARM_SVE with HAVE_SVE.
Change-Id: I469efaac197cba93201f2ca6eca78ca61be3054d
2021-10-12 11:51:34 +03:00
George Wort
7ba060bbf8
Add Licence to state_compress and bitutils.
Change-Id: I958daf82e5aef5bd306424dcfa7812382b266d65
2021-10-12 11:51:34 +03:00
George Wort
b54710d208
Implement new Vermicelli16 acceleration functions using SVE2.
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.
Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
George Wort
b6a7ee7e84
Add SVE2 support for dvermicelli
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3296d538ea
add arm lshift128/rshift128
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0033cec725
fix failing corner case, add pshufb_maskz()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5adbfc94b8
use STL make_unique, remove wrapper header (it breaks C++17 compilation)
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
a9413d1397
change C/C++ standard used to C17/C++17
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0ec5dc37ca
remove low4bits from the arguments, fix cases that mostly affect loading large (64-byte) vectors and reading out of bounds
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
dca605d187
fix loadu_maskz, add {l,r}shift128_var(), tab fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
d6fd17ec82
convert to for loops
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2012c503b6
minor fixes, add 2 constructors from half-size vectors
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
1fe06faffe
fix lastMatch<64>
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3b8f70af70
provide {l,r}shift128_var() to fix an immediate-value build failure in loadu_maskz
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
744125bd53
fix arm loadu_maskz()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
8b612c3923
add arm lshift128/rshift128
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
03e7d788b6
use rshift128() instead of vector-wide right shift
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5fd1ed58e6
add {l,r}shift128()+tests, rename printv_u64() to print64()
2021-10-12 11:51:34 +03:00
George Wort
ace6cd15f2
Use SVE2 Bitperm's bdep instruction in bitutils and state_compress
Specifically for pdep64, expand32, and expand64 in bitutils,
as well as all of the loadcompressed functions used in
state_compress.
Change-Id: I92851bd12481dbee6a7e344df0890c4901b56d01
2021-10-12 11:51:34 +03:00
George Wort
7e5138b78f
Fix CROSS_COMPILE_AARCH64 issues with SVE.
Change-Id: I7b9ba3ccb754d96eee22ca01714c783dae1e4956
2021-10-12 11:51:34 +03:00
George Wort
acfa11a34f
Add SVE2 support for vermicelli
Change-Id: Ia025de53521fbaefe5fb1e4425aaf75c7d80a14e
2021-10-12 11:51:34 +03:00
George Wort
b2332218a4
Remove possibly undefined behaviour from Noodle.
Change-Id: I9a7997cea6a48927cb02b00c5dba5009bbf83850
2021-10-12 11:51:34 +03:00
George Wort
ddffd031ed
Remove first check from scanDouble Noodle.
Change-Id: I00eabb3cb06ef6a2060df52c26fa8591907a2711
2021-10-12 11:51:34 +03:00
apostolos
ce9ffe9bce
Equal mask test fixed with random numbers
2021-10-12 11:51:34 +03:00
apostolos
b1dfc6abc4
SuperVector test fixes
2021-10-12 11:51:34 +03:00
apostolos
a369e3aa53
SuperVector AVX512 implementations
2021-10-12 11:51:34 +03:00
apostolos
3f72b681cc
SuperVector unit tests for AVX2 and AVX512 added
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5c601e2505
really fix lshift for avx2
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2ed6ca72b5
disable OPTIMISE by default
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f16abb1789
fix truffle SIMD for S>16 as well
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f50ba1096b
add AVX2 specializations
2021-10-12 11:51:34 +03:00