apostolos
a86d6c290d
nit
2021-10-12 11:51:34 +03:00
apostolos
ee8fa17351
fix benchmarks outputs
2021-10-12 11:51:34 +03:00
apostolos
53b9034546
bandwidth output fixes
2021-10-12 11:51:34 +03:00
apostolos
0e141ce700
size output for case with match fixed
2021-10-12 11:51:34 +03:00
apostolos
5d4adf267d
nits
2021-10-12 11:51:34 +03:00
apostolos
2e6c75c895
size output fixed
2021-10-12 11:51:34 +03:00
apostolos
9901477bcf
nits
2021-10-12 11:51:34 +03:00
apostolos
2b9636ccc0
benchmarks output fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
91f58fb1ca
add missing header
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
be1551aa94
remove confusing OPTIMISE flag
2021-10-12 11:51:34 +03:00
apostolos
4027319d6c
nits
2021-10-12 11:51:34 +03:00
apostolos
1009391d9f
reduce code size by using function arrays and add bandwidth to output
2021-10-12 11:51:34 +03:00
apostolos
904a94fbe5
micro-benchmarks for shufti, truffle and noodle added
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
08357a096c
remove Windows/ICC support
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
8cff876962
fix lshift128 test
2021-10-12 11:51:34 +03:00
apostolos
67fa6d2738
alignr methods for avx2 and avx512 added
2021-10-12 11:51:34 +03:00
apostolos
b3a20afbbc
limex_shuffle and its unit tests added
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
de30471edd
remove duplicate functions from previous merge
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e5050c9373
add missing compile flags
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
7f5e859019
add accidentally removed lines
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
deae90f947
* add -fno-new-ttp-matching to fix build failures on newer gcc compilers with C++17
* add explicit -mssse3, -mavx2 in compiler flags in respective build profiles
2021-10-12 11:51:34 +03:00
George Wort
a879715953
Move SVE functions into their own files.
Change-Id: I995ba4b7d2b558ee403693ee45d747d414d3b177
2021-10-12 11:51:34 +03:00
George Wort
6c6aee9682
Implement new DoubleVermicelli16 acceleration functions using SVE2
Change-Id: Id4a8ffca840caab930a6e78cc0dfd0fe7d320b4e
2021-10-12 11:51:34 +03:00
George Wort
25183089fd
Use SVE shufti for counting miracles.
Change-Id: Idd4aaf5bbc05fc90e9138c6fed385bc6ffa7b0b8
2021-10-12 11:51:34 +03:00
George Wort
00fff3f53c
Use SVE for double shufti.
Change-Id: I09e0d57bb8a2f05b613f6225dea79ae823136268
2021-10-12 11:51:34 +03:00
George Wort
c95a4c3dd1
Use SVE for single shufti.
Change-Id: Ic76940c5bb9b81a1c45d39e9ca396a158c50a7dc
2021-10-12 11:51:34 +03:00
George Wort
56ef2d5f72
Use SVE2 for counting miracles.
Change-Id: I048dc182e5f4e726b847b3285ffafef4f538e550
2021-10-12 11:51:34 +03:00
George Wort
ab5d4d9279
Replace USE_ARM_SVE with HAVE_SVE.
Change-Id: I469efaac197cba93201f2ca6eca78ca61be3054d
2021-10-12 11:51:34 +03:00
George Wort
8242f46ed7
Add Licence to state_compress and bitutils.
Change-Id: I958daf82e5aef5bd306424dcfa7812382b266d65
2021-10-12 11:51:34 +03:00
George Wort
df926ef62f
Implement new Vermicelli16 acceleration functions using SVE2.
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.
Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
George Wort
c7086cb7f1
Add SVE2 support for dvermicelli
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
a38324a5a3
add arm lshift128/rshift128
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
603bc14cdd
fix failing corner case, add pshufb_maskz()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
e35b88f2c8
use STL make_unique and remove the wrapper header, which breaks C++17 compilation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f5f37f3f40
change C/C++ standard used to C17/C++17
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6f44a1aa26
remove low4bits from the arguments, fix cases that mostly affect loading large (64) vectors and falling out of bounds
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f2d9784979
fix loadu_maskz, add {l,r}shift128_var(), tab fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
a2e6143ea1
convert to for loops
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
f8ce0bb922
minor fixes, add 2 constructors from half size vectors
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
cabd13d18a
fix lastMatch<64>
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
ebb1b84ae3
provide an {l,r}shift128_var() to fix immediate value build failure in loadu_maskz
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
825460856f
fix arm loadu_maskz()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
86accf41a3
add arm lshift128/rshift128
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
b67cd7dfd0
use rshift128() instead of vector-wide right shift
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
6c51f7f591
add {l,r}shift128()+tests, rename printv_u64() to print64()
2021-10-12 11:51:34 +03:00
George Wort
051ceed0f9
Use SVE2 Bitperm's bdep instruction in bitutils and state_compress
Specifically for pdep64, expand32, and expand64 in bitutils,
as well as all of the loadcompressed functions used in
state_compress.
Change-Id: I92851bd12481dbee6a7e344df0890c4901b56d01
2021-10-12 11:51:34 +03:00
George Wort
4bc28272da
Fix CROSS_COMPILE_AARCH64 for SVE issues.
Change-Id: I7b9ba3ccb754d96eee22ca01714c783dae1e4956
2021-10-12 11:51:34 +03:00
George Wort
9fb79ac3ec
Add SVE2 support for vermicelli
Change-Id: Ia025de53521fbaefe5fb1e4425aaf75c7d80a14e
2021-10-12 11:51:34 +03:00
George Wort
7162446358
Remove possibly undefined behaviour from Noodle.
Change-Id: I9a7997cea6a48927cb02b00c5dba5009bbf83850
2021-10-12 11:51:34 +03:00
George Wort
b48ea2c1a6
Remove first check from scanDouble Noodle.
Change-Id: I00eabb3cb06ef6a2060df52c26fa8591907a2711
2021-10-12 11:51:34 +03:00