Konstantinos Margaritis
|
e6cfd11948
|
prefix assume_aligned to avoid clash with std::assume_aligned in c++20
|
2022-11-01 10:29:22 +00:00 |
|
Konstantinos Margaritis
|
dc6b8ae92d
|
optimize comparemask implementation, clean up code, use union types instead of casts
|
2022-09-07 02:02:11 +03:00 |
|
Danila Kutenin
|
8a49e20bcd
|
Fix formatting of a couple files
|
2022-06-26 22:59:58 +00:00 |
|
Danila Kutenin
|
49eb18ee4f
|
Optimize vectorscan for aarch64 by using shrn instruction
This optimization is based on the thread
https://twitter.com/Danlark1/status/1539344279268691970 and uses the
"shift right and narrow by 4" instruction https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/SHRN--SHRN2--Shift-Right-Narrow--immediate--
To achieve that, I had to slightly redesign movemask into comparemask
and add an extra step to mask iteration. Our benchmarks
showed a 10-15% improvement on average for long matches.
|
2022-06-26 22:55:45 +00:00 |
|
Konstantinos Margaritis
|
4aa32275f1
|
use same definition of the union for all types
|
2021-12-02 18:00:02 +02:00 |
|
Konstantinos Margaritis
|
0221dc1771
|
fix miscompilations with clang++, as it is more strict
|
2021-12-01 23:22:15 +02:00 |
|
Apostolos Tapsas
|
0287724413
|
WIP: tracking last bugs in failing tests for release build
|
2021-11-16 15:24:22 +00:00 |
|
apostolos
|
e09d8674b4
|
resolving conflicts after merging
|
2021-11-13 18:58:22 +02:00 |
|
Konstantinos Margaritis
|
7b65b298c1
|
add arm vector types in union, avoid -flax-vector-conversions, fix casts
|
2021-11-01 16:52:17 +02:00 |
|
Vectorcamp
|
2231f7c024
|
compile fixes for vsc port
|
2021-10-14 13:53:55 +03:00 |
|
Konstantinos Margaritis
|
577e03e0c7
|
rearrange method declarations
|
2021-10-12 11:51:35 +03:00 |
|
Konstantinos Margaritis
|
67e0674df8
|
Changes/Additions to SuperVector class
* added ==,!=,>=,>,<=,< operators
* reworked shift operators to be more uniform and orthogonal, like Arm ISA
* Added Unroller class to allow handling of multiple cases but avoid code duplication
* pshufb method can now emulate Intel or not (avoids one instruction).
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
e7161fdfec
|
initial SSE/AVX2 implementation
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
603bc14cdd
|
fix failing corner case, add pshufb_maskz()
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
f8ce0bb922
|
minor fixes, add 2 constructors from half size vectors
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
ebb1b84ae3
|
provide an {l,r}shift128_var() to fix immediate value build failure in loadu_maskz
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
6c51f7f591
|
add {l,r}shift128()+tests, rename printv_u64() to print64()
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
c45e72775f
|
convert print helper functions to class methods
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
28b2949396
|
harmonise syntax of x86 SuperVector impl.cpp with arm, fix alignr, define printv_* functions only in debug mode
|
2021-10-12 11:51:34 +03:00 |
|
Konstantinos Margaritis
|
2753dbb3b0
|
rename supervector class header, use dup_*() function names instead of set1_*(), minor fixes
|
2021-10-12 11:51:34 +03:00 |
|