gtsoul-tech
|
3ced2f7ebf
|
shiftTooManyBitsSigned
|
2024-04-24 11:13:28 +03:00 |
|
Danila Kutenin
|
1e09891b2b
|
Fix avx512 movemask call
|
2022-07-20 09:03:50 +01:00 |
|
Danila Kutenin
|
eb7b0bb50c
|
Optimize vectorscan for aarch64 by using shrn instruction
This optimization is based on the thread
https://twitter.com/Danlark1/status/1539344279268691970 and uses the
"shift right and narrow by 4" instruction: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/SHRN--SHRN2--Shift-Right-Narrow--immediate--
To achieve that, I had to rework movemask slightly into comparemask
and add an extra step to mask iteration. Our benchmarks
showed a 10-15% improvement on average for long matches.
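
The idea described in this commit message can be illustrated with a portable scalar sketch. It is not the vectorscan code: the names `comparemask16` and `first_match` are hypothetical, and the loop only emulates what a single SHRN by 4 would do to a 16-byte comparison result (each byte either 0x00 or 0xFF). Each input byte ends up as one 4-bit nibble of a 64-bit mask, so the index of a match is the bit position divided by 4, which is the extra step in mask iteration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scalar emulation of SHRN #4 applied to a 16-byte
// comparison result viewed as eight little-endian 16-bit lanes.
// Assumes every input byte is 0x00 (no match) or 0xFF (match).
std::uint64_t comparemask16(const std::uint8_t cmp[16]) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Build the 16-bit lane from two adjacent bytes.
        std::uint16_t lane =
            static_cast<std::uint16_t>(cmp[2 * i] | (cmp[2 * i + 1] << 8));
        // Shift right by 4 and keep the low 8 bits (the "narrow" step).
        std::uint8_t narrowed = static_cast<std::uint8_t>(lane >> 4);
        mask |= static_cast<std::uint64_t>(narrowed) << (8 * i);
    }
    return mask;
}

// Hypothetical helper: index of the first matching byte, or -1 if none.
// Each byte occupies 4 bits of the mask, hence the division by 4.
int first_match(std::uint64_t mask) {
    if (mask == 0) {
        return -1;
    }
    int bit = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++bit;
    }
    return bit / 4;
}
```

With a comparison result in which only byte 5 is 0xFF, `comparemask16` sets bits 20 to 23 and `first_match` returns 5. A classic movemask would instead produce one bit per byte, so the division by 4 is not needed there.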
|
2022-06-26 22:55:45 +00:00 |
|
apostolos
|
6440d18b48
|
SuperVector opandnot test enriched
|
2021-11-10 15:12:25 +02:00 |
|
Apostolos Tapsas
|
4f53ec6b08
|
Shuffle simd and SuperVector implementations as well as their tests really fixed
|
2021-10-25 09:19:30 +03:00 |
|
Apostolos Tapsas
|
789f723814
|
SuperVector shuffle implementation and test function optimized
|
2021-10-22 11:55:39 +00:00 |
|
Konstantinos Margaritis
|
2f55e5b54f
|
add x86 vsh* implementations
|
2021-10-12 11:51:35 +03:00 |
|
Konstantinos Margaritis
|
1af82e395f
|
Changes/Additions to SuperVector class
* added ==,!=,>=,>,<=,< operators
* reworked shift operators to be more uniform and orthogonal, like Arm ISA
* Added Unroller class to allow handling of multiple cases but avoid code duplication
* pshufb method can now emulate Intel or not (avoids one instruction).
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
a3f083a9ff
|
initial SSE/AVX2 implementation
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
bb9bcb3760
|
micro-benchmarks for shufti, truffle and noodle added
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
752d6cf997
|
fix lshift128 test
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
b26a88efe5
|
alignr methods for avx2 and avx512 added
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
150ae10ea4
|
limex_shuffle added and its unit tests
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
d6fd17ec82
|
convert to for loops
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
5fd1ed58e6
|
add {l,r}shift128()+tests, rename printv_u64() to print64()
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
ce9ffe9bce
|
Equal mask test fixed with random numbers
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
b1dfc6abc4
|
Supervector test fixes
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
a369e3aa53
|
SuperVector AVX512 implementations
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
3f72b681cc
|
SuperVector unit tests for AVX2 and AVX512 added
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
1f496a1411
|
tiny change in vector initialization
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
f59be47288
|
harmonise syntax of x86 SuperVector impl.cpp like arm, fix alignr, define printv_* functions when on debug mode only
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
c2a5de03e0
|
rename supervector class header, use dup_*() functions names instead of set1_*(), minor fixes
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
bab390d442
|
Truffle simd vectorized
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
736286c2f3
|
syntax fixes
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
24b984483b
|
fix unit tests, and resp. ARM SuperVector methods based on those unit tests, add print functions for SuperVector
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
0adc21bee6
|
Supervector Unit Tests
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
1e7765c485
|
SuperVector unit tests
|
2021-10-12 11:51:34 +03:00 |
|
apostolos
|
8bbcfe698a
|
unit tests for supervector
|
2021-10-12 11:51:34 +03:00 |
|