Commit Graph

144 Commits

Author SHA1 Message Date
gtsoul-tech
9df8527e91 variableScope 2024-04-29 13:13:07 +03:00
Konstantinos Margaritis
a4d1779945 Merge pull request #225 from VectorCamp/feature/cleanup-compiler-warnings
According to https://buildbot-ci.vectorcamp.gr/#/changes/93, most builds succeeded with no compiler warnings. The build failures were only on x86 and Arm for SIMDe builds: x86 because of a bug in SIMDe's emulation of its own x86 intrinsics in non-native mode, and Arm due to clang; it is unclear whether this is actually a bug in SIMDe or in clang itself. All the remaining compiler warnings were suppressed because they could not be fixed within the scope of this project.

This PR will close #170; code quality improvements, however, will continue with the integration of #222 or a similar static code analyzer into CI, and with continuous refactoring.
2024-01-20 22:41:00 +02:00
Yoan Picchi
6652d4a837 Make the match component of SVE truffle constant time
There is no significant speedup for 128b vectors, but we expect some speedup
for wider vectors compared to the previous linear-time implementation of the
match.hpp component.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-01-18 11:53:45 +00:00
Konstantinos Margaritis
fdc067861e check the correct define 2024-01-18 00:41:56 +02:00
Yoan Picchi
c67076ce22 Add truffle SVE implementation
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-01-09 16:50:03 +00:00
Konstantinos Margaritis
50675d0af6 add fallback pdep64 for x86 if no HAVE_BMI2 2023-12-20 08:25:30 +02:00
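
For readers unfamiliar with the operation, PDEP (parallel bit deposit) takes the low-order bits of a source value and scatters them into the positions of the set bits of a mask. The following is a minimal portable sketch of what a software fallback can look like when the BMI2 instruction is unavailable. The function name `pdep64_fallback` is hypothetical and this is not the code from the commit above.

```cpp
#include <cstdint>

// Hypothetical portable emulation of the BMI2 PDEP operation.
// Bit i of x (counting from the least significant bit) is deposited
// at the position of the i-th set bit of mask.
static inline uint64_t pdep64_fallback(uint64_t x, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); // isolate the lowest set bit of mask
        if (x & bit) {
            result |= lowest;                 // deposit the current source bit there
        }
        mask &= mask - 1;                     // clear the lowest set bit of mask
    }
    return result;
}
```

The loop runs once per set bit in the mask, so it is much slower than the single hardware instruction and only makes sense as a correctness fallback.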
Konstantinos Margaritis
192bf38d56 add missing pdep64 for x86 bitutils 2023-12-20 00:12:15 +02:00
Konstantinos Margaritis
38231b2a5e add missing pdep64 for arm and ppc64le 2023-12-19 23:15:27 +02:00
Konstantinos Margaritis
5cb3a69edc make diffrich384 available on all arches 2023-11-28 12:06:46 +00:00
Konstantinos Margaritis
64d106e582 fix compilation for SIMDe 2023-11-27 20:52:52 +00:00
Konstantinos Margaritis
1fb601f3a9 fix SIMDe emulation builds on Arm, add native translation from x86 for comparison 2023-11-27 12:21:58 +00:00
Konstantinos Margaritis
b0d9c7f879 existing scalar implementations were incorrect (but never tested), ported from arm/ppc64le 2023-11-23 16:09:10 +00:00
Konstantinos Margaritis
9cf061b89b add missing intrinsics for SIMDe backend 2023-11-23 16:08:26 +00:00
Konstantinos Margaritis
99807c17a6 enable SIMDe backend 2023-11-21 17:13:33 +00:00
Konstantinos Margaritis
50a664b5c3 add SIMDe ports of simd_utils and supervector 2023-11-21 17:12:04 +00:00
Konstantinos Margaritis
1ca4dc8b39 Ubuntu 20.04 gcc does not define HWCAP2_SVE2 #180 2023-10-10 18:30:12 +08:00
Konstantinos Margaritis
3f9c05d57f fix cmake refactor for arm builds 2023-10-09 10:03:53 +00:00
Konstantinos Margaritis
abcc974d1d add missing file 2023-10-07 12:10:42 +03:00
Konstantinos Margaritis
4ae1aebc1b use the conditional in the right way 2023-10-04 20:35:58 +03:00
Konstantinos Margaritis
bfe1aa52f1 add conditional for __clang__ 2023-10-04 20:28:35 +03:00
Konstantinos Margaritis
b5d87d3877 clang 15 (but not 16) fails on ppc64le with -Wdeprecate-lax-vec-conv-all 2023-10-04 20:09:45 +03:00
Konstantinos Margaritis
89a85a8e90 HWCAP is only available on Linux 2023-09-08 10:08:44 +03:00
Konstantinos Margaritis
394d09fe45 initial attempt for fat binary on Aarch64 2023-08-23 09:42:00 +00:00
Konstantinos Margaritis
1e3b031dee prefix assume_aligned to avoid clash with std::assume_aligned in c++20 2022-11-01 10:29:22 +00:00
Konstantinos Margaritis
8a6add2fb6 [VSX] movemask needs to be explicitly aligned on clang for vec_ste 2022-09-16 12:50:33 +03:00
Konstantinos Margaritis
4b41c5fe25 [NEON] simplify/optimize shift/align primitives 2022-09-12 13:09:51 +00:00
Konstantinos Margaritis
a0e53c7d85 use correct intrinsic for lshiftbyte_m128 2022-09-07 16:00:10 +03:00
Konstantinos Margaritis
37b2cae189 provide non-immediate versions of lshiftbyte/rshiftbyte on x86 2022-09-07 15:07:20 +03:00
Konstantinos Margaritis
ce90e58af1 re-add simd_onebit_masks for x86, needs more work 2022-09-07 13:42:25 +03:00
Konstantinos Margaritis
0052df5f5b [NEON] optimize mask1bit128, get rid of simd_onebit_masks 2022-09-07 10:20:01 +00:00
Konstantinos Margaritis
76a31d1bc0 remove simd_onebit_masks from arm/x86 headers, as they moved to common 2022-09-07 12:41:32 +03:00
Konstantinos Margaritis
c097f169ad [VSX] add algorithm for alignr w/o use of immediates 2022-09-07 00:01:54 +03:00
Konstantinos Margaritis
bdc3947746 [VSX] correct lshiftbyte_m128/rshiftbyte_m128, variable_byte_shift 2022-09-06 23:59:51 +03:00
Konstantinos Margaritis
59ace0ebf8 [VSX] huge optimization of movemask128 2022-09-06 20:08:44 +03:00
Konstantinos Margaritis
ef9116b52e [VSX] optimize and correct lshift_m128/rshift_m128 2022-09-06 18:48:19 +03:00
Konstantinos Margaritis
6dce55c3fe [VSX] optimized mask1bit128(), moved simd_onebit_masks to common 2022-09-06 18:10:55 +03:00
Danila Kutenin
2dd7b9a4f9 Fix ppc64el debug 2022-06-26 23:05:17 +00:00
Danila Kutenin
45fe139224 Minor fix 2022-06-26 23:02:02 +00:00
Danila Kutenin
4b83ea1c78 Fix formatting of a couple files 2022-06-26 22:59:58 +00:00
Danila Kutenin
eb7b0bb50c Optimize vectorscan for aarch64 by using shrn instruction
This optimization is based on the thread
https://twitter.com/Danlark1/status/1539344279268691970 and uses
shift right and narrow by 4 instruction https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/SHRN--SHRN2--Shift-Right-Narrow--immediate--

To achieve that, I needed to slightly redesign movemask into comparemask
and add an extra step to mask iteration. Our benchmarks
showed a 10-15% improvement on average for long matches.
2022-06-26 22:55:45 +00:00
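
The idea in the commit message above can be illustrated with a scalar emulation. A byte-wise vector compare yields 16 bytes that are each 0x00 or 0xFF. Viewing them as eight 16-bit lanes, shifting each lane right by 4 and narrowing to 8 bits produces a 64-bit mask with one nibble per input byte. The sketch below is portable C++ rather than NEON intrinsics, assumes little-endian lane layout, and uses the hypothetical names `comparemask_shrn4` and `match_positions`; it is not the Vectorscan implementation.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Scalar emulation of "shift right and narrow by 4" applied to a
// 16-byte compare result (each byte 0x00 or 0xFF).
// Nibble k of the returned mask is 0xF if byte k matched, else 0x0.
uint64_t comparemask_shrn4(const std::array<uint8_t, 16>& cmp) {
    uint64_t mask = 0;
    for (int lane = 0; lane < 8; ++lane) {
        // Combine two adjacent bytes into one 16-bit lane (little-endian).
        uint16_t wide = static_cast<uint16_t>(cmp[2 * lane] | (cmp[2 * lane + 1] << 8));
        // Shift right by 4 and keep the low 8 bits (the "narrow" step).
        uint8_t narrow = static_cast<uint8_t>(wide >> 4);
        mask |= static_cast<uint64_t>(narrow) << (8 * lane);
    }
    return mask;
}

// Mask iteration needs an extra step compared with a 1-bit-per-byte
// movemask: each match occupies 4 bits, so positions are read per nibble.
std::vector<int> match_positions(uint64_t mask) {
    std::vector<int> positions;
    for (int i = 0; i < 16; ++i) {
        if ((mask >> (4 * i)) & 0xF) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

On real hardware the narrowing is a single instruction, and the per-nibble loop would typically be replaced by a count-trailing-zeros scan with the bit index divided by 4.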
Daniel Kutenin
2360314f9d Optimized and correct version of movemask128 for ARM
Closes #99

https://gcc.godbolt.org/z/cTjKqzcvn

The previous version was not correct because movemask assumed input bytes of 0xFF. We can fully match the semantics and do it faster with USRA instructions.

Re-submission to a develop branch
2022-04-18 13:37:53 +01:00
Konstantinos Margaritis
242a460115 minor fixes 2021-12-07 08:49:59 +00:00
Konstantinos Margaritis
b6ddf2b41c fix clang-release-arm compilation 2021-12-07 08:43:52 +00:00
Konstantinos Margaritis
f4ccc40c58 fix wrong castings for NEON 2021-12-06 21:35:51 +00:00
Konstantinos Margaritis
ef2bc5cfbc fix compilation with clang and some incomplete/wrong implementations for arm this time 2021-12-06 18:22:58 +00:00
Konstantinos Margaritis
d86e6bed69 fix build with clang, in particular VSX uses long long instead of int64_t, gcc allows this, clang does not 2021-12-02 18:01:00 +02:00
Konstantinos Margaritis
896d28845c bump base requirements to SSE4.2 2021-12-01 23:20:02 +02:00
Konstantinos Margaritis
959fea25f7 use __builtin_constant_p() instead for arm as well 2021-11-25 06:20:53 +00:00
Apostolos Tapsas
e655d76a01 *fix palignr implementation for VSX Release mode
*add unit test for palignr
*enable unit test building for Release mode
2021-11-24 15:03:49 +00:00
Apostolos Tapsas
bc2dcc317d found and fixed a very hard-to-track bug in the palignr intrinsic function that manifested only in Release builds, not Debug builds, in a particular number of tests 2021-11-24 11:18:18 +00:00