Apostolos Tapsas
3655175b6d
SuperVector operators fixes and simd_utils low/high64 functions implementations added
2021-10-18 12:26:38 +00:00
Apostolos Tapsas
f0e6b8459c
SuperVector vsh* implementations
2021-10-15 14:07:17 +00:00
apostolos
6308c3b475
match file for ARCH_PPC64EL added
2021-10-14 16:26:59 +03:00
apostolos
fd905a0c9e
truffle and shufti implementations for ARCH_PPC64EL
2021-10-14 16:01:21 +03:00
apostolos
6aac8241b1
blockSingleMask implementations for ARCH_PPC64 added
2021-10-14 15:56:13 +03:00
apostolos
66748881ee
SuperVector vsh* added
2021-10-14 15:08:23 +03:00
Apostolos Tapsas
3423ea5b2b
WIP: Power VSX support almost completed
2021-10-14 13:53:55 +03:00
Vectorcamp
28f8f30866
compile fixes for vsc port
2021-10-14 13:53:55 +03:00
apostolos
732fc5e791
update powerpc simd util file functions
2021-10-14 13:53:55 +03:00
apostolos
59a3ab9443
implementations for powerpc64el architecture
2021-10-14 13:53:55 +03:00
Konstantinos Margaritis
14be68587b
add initial ppc64el support
(cherry picked from commit 63e26a4b28 )
(cherry picked from commit c214ba253327114c16d0724f75c998ab00d44919)
2021-10-14 13:53:55 +03:00
Konstantinos Margaritis
2b3d0a355b
Add missing copyright info from tampered files
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
4a4a851c6d
fix multiple/undefined symbols when using fat runtimes
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
ae81088193
add arm truffle block function
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
45f395245b
add simd_onebit_masks as static in arm simd_utils.h as well
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
a654204122
simplify truffle and provide arch-specific block functions
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
92e0b9a351
simplify shufti and provide arch-specific block functions
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
a1acc456cc
rearrange method declarations
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
f2e45ccc06
remove simd_utils.c
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
2f55e5b54f
add x86 vsh* implementations
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
3248393d1a
use movemask
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
a85b1c75d1
add header define to avoid double inclusion
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
f6f7d7a039
optimize and simplify Shufti and Truffle to work with a single block method instead
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
1503d9a946
remove asserts, as they are not needed
2021-10-12 11:51:35 +03:00
Konstantinos Margaritis
5563f0c3b6
firstMatch/lastMatch are now arch-dependent: emulating movemask on non-Intel is very costly, and the alternative is almost twice as fast on Arm
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
690e3c24e6
fix for new pshufb
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
1af82e395f
Changes/Additions to SuperVector class
* added ==,!=,>=,>,<=,< operators
* reworked shift operators to be more uniform and orthogonal, like Arm ISA
* Added Unroller class to allow handling of multiple cases but avoid code duplication
* pshufb method can now emulate Intel or not (avoids one instruction).
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
a3f083a9ff
initial SSE/AVX2 implementation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
cf4b95fff2
remove Windows/ICC support
2021-10-12 11:51:34 +03:00
apostolos
b26a88efe5
alignr methods for avx2 and avx512 added
2021-10-12 11:51:34 +03:00
apostolos
150ae10ea4
limex_shuffle added and its unit tests
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
b9fbfb1204
remove duplicate functions from previous merge
2021-10-12 11:51:34 +03:00
George Wort
3bdd48fd61
Move SVE functions into their own files.
Change-Id: I995ba4b7d2b558ee403693ee45d747d414d3b177
2021-10-12 11:51:34 +03:00
George Wort
e1f0f6baf7
Implement new DoubleVermicelli16 acceleration functions using SVE2
Change-Id: Id4a8ffca840caab930a6e78cc0dfd0fe7d320b4e
2021-10-12 11:51:34 +03:00
George Wort
91f5f10831
Use SVE shufti for counting miracles.
Change-Id: Idd4aaf5bbc05fc90e9138c6fed385bc6ffa7b0b8
2021-10-12 11:51:34 +03:00
George Wort
60b2112505
Use SVE for double shufti.
Change-Id: I09e0d57bb8a2f05b613f6225dea79ae823136268
2021-10-12 11:51:34 +03:00
George Wort
87ee8d4d7f
Use SVE for single shufti.
Change-Id: Ic76940c5bb9b81a1c45d39e9ca396a158c50a7dc
2021-10-12 11:51:34 +03:00
George Wort
d1e763c13b
Use SVE2 for counting miracles.
Change-Id: I048dc182e5f4e726b847b3285ffafef4f538e550
2021-10-12 11:51:34 +03:00
George Wort
ceb230c7db
Replace USE_ARM_SVE with HAVE_SVE.
Change-Id: I469efaac197cba93201f2ca6eca78ca61be3054d
2021-10-12 11:51:34 +03:00
George Wort
7ba060bbf8
Add Licence to state_compress and bitutils.
Change-Id: I958daf82e5aef5bd306424dcfa7812382b266d65
2021-10-12 11:51:34 +03:00
George Wort
b54710d208
Implement new Vermicelli16 acceleration functions using SVE2.
The scheme utilises the MATCH and NMATCH instructions to
scan for 16 characters at the same rate as vermicelli
scans for one.
Change-Id: Ie2cef904c56651e6108593c668e9b65bc001a886
2021-10-12 11:51:34 +03:00
George Wort
b6a7ee7e84
Add SVE2 support for dvermicelli
Change-Id: I056ef15e162ab6fb1f78964321ce893f4096367e
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3296d538ea
add arm lshift128/rshift128
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0033cec725
fix failing corner case, add pshufb_maskz()
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
5adbfc94b8
use STL make_unique and remove the wrapper header, which breaks C++17 compilation
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
0ec5dc37ca
remove low4bits from the arguments, fix cases that mostly affect loading large (64) vectors and falling out of bounds
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
dca605d187
fix loadu_maskz, add {l,r}shift128_var(), tab fixes
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
2012c503b6
minor fixes, add 2 constructors from half size vectors
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
1fe06faffe
fix lastMatch<64>
2021-10-12 11:51:34 +03:00
Konstantinos Margaritis
3b8f70af70
provide an {l,r}shift128_var() to fix immediate value build failure in loadu_maskz
2021-10-12 11:51:34 +03:00