gtsoul-tech
9df8527e91
variableScope
2024-04-29 13:13:07 +03:00
Konstantinos Margaritis
a4d1779945
Merge pull request #225 from VectorCamp/feature/cleanup-compiler-warnings
...
According to https://buildbot-ci.vectorcamp.gr/#/changes/93
most builds succeeded with no compiler warnings. The only build failures were on x86 and Arm for SIMDe builds: on x86 because of a bug in SIMDe's emulation of its own x86 intrinsics in non-native mode, and on Arm because of clang (it is unclear whether this is a bug in SIMDe or in clang itself). All the remaining compiler warnings were suppressed because they could not be fixed within the scope of this project.
This PR will close #170. Code quality improvements will continue, however, with the integration of #222 or a similar static code analyzer into CI and with continuous refactoring.
2024-01-20 22:41:00 +02:00
Yoan Picchi
6652d4a837
Make the match component of SVE truffle constant time
...
There is no significant speedup for 128b vectors, but we expect some speedup
for wider vectors compared to the previous linear-time implementation of the
match.hpp component.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-01-18 11:53:45 +00:00
Konstantinos Margaritis
fdc067861e
check the correct define
2024-01-18 00:41:56 +02:00
Yoan Picchi
c67076ce22
Add truffle SVE implementation
...
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
2024-01-09 16:50:03 +00:00
Konstantinos Margaritis
50675d0af6
add fallback pdep64 for x86 if no HAVE_BMI2
2023-12-20 08:25:30 +02:00
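As an illustration of what a fallback for `pdep64` has to compute when BMI2 is not available, here is a minimal scalar sketch of parallel bit deposit. It is not the code from this commit: the name `pdep64_fallback` and the self-check helper are hypothetical, and the loop is simply the textbook definition (the low bits of `x` are scattered, in order, into the positions of the set bits of `mask`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-in for the BMI2 PDEP instruction.
// Bit i of x is deposited at the position of the i-th lowest set bit of mask.
inline std::uint64_t pdep64_fallback(std::uint64_t x, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1); // isolate lowest set bit of mask
        if (x & bit) {
            result |= lowest;                      // deposit the current source bit
        }
        mask &= mask - 1;                          // clear that mask bit and move on
    }
    return result;
}

// Small usage example with hand-checked values.
inline void pdep64_selfcheck() {
    // mask has bits 1, 3, 4 set; x = 0b101 sends bit 0 to position 1 and bit 2 to position 4.
    assert(pdep64_fallback(0x5, 0x1A) == 0x12);
    // Depositing all ones reproduces the mask itself.
    assert(pdep64_fallback(~0ULL, 0xF0F0) == 0xF0F0);
}
```

The loop runs once per set bit of `mask`, so it is far slower than the single hardware instruction; it only shows the semantics a fallback must preserve.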
Konstantinos Margaritis
192bf38d56
add missing pdep64 for x86 bitutils
2023-12-20 00:12:15 +02:00
Konstantinos Margaritis
38231b2a5e
add missing pdep64 for arm and ppc64le
2023-12-19 23:15:27 +02:00
Konstantinos Margaritis
5cb3a69edc
make diffrich384 available on all arches
2023-11-28 12:06:46 +00:00
Konstantinos Margaritis
64d106e582
fix compilation for SIMDe
2023-11-27 20:52:52 +00:00
Konstantinos Margaritis
1fb601f3a9
fix SIMDe emulation builds on Arm, add native translation from x86 for comparison
2023-11-27 12:21:58 +00:00
Konstantinos Margaritis
b0d9c7f879
existing scalar implementations were incorrect (but never tested); ported from arm/ppc64le
2023-11-23 16:09:10 +00:00
Konstantinos Margaritis
9cf061b89b
add missing intrinsics for SIMDe backend
2023-11-23 16:08:26 +00:00
Konstantinos Margaritis
99807c17a6
enable SIMDe backend
2023-11-21 17:13:33 +00:00
Konstantinos Margaritis
50a664b5c3
add SIMDe ports of simd_utils and supervector
2023-11-21 17:12:04 +00:00
Konstantinos Margaritis
1ca4dc8b39
Ubuntu 20.04 gcc does not define HWCAP2_SVE2 #180
2023-10-10 18:30:12 +08:00
Konstantinos Margaritis
3f9c05d57f
fix cmake refactor for arm builds
2023-10-09 10:03:53 +00:00
Konstantinos Margaritis
abcc974d1d
add missing file
2023-10-07 12:10:42 +03:00
Konstantinos Margaritis
4ae1aebc1b
use the conditional in the right way
2023-10-04 20:35:58 +03:00
Konstantinos Margaritis
bfe1aa52f1
add conditional for __clang__
2023-10-04 20:28:35 +03:00
Konstantinos Margaritis
b5d87d3877
clang 15 (but not 16) fails on ppc64le with -Wdeprecate-lax-vec-conv-all
2023-10-04 20:09:45 +03:00
Konstantinos Margaritis
89a85a8e90
HWCAP is only available on Linux
2023-09-08 10:08:44 +03:00
Konstantinos Margaritis
394d09fe45
initial attempt for fat binary on Aarch64
2023-08-23 09:42:00 +00:00
Konstantinos Margaritis
1e3b031dee
prefix assume_aligned to avoid clash with std::assume_aligned in c++20
2022-11-01 10:29:22 +00:00
Konstantinos Margaritis
8a6add2fb6
[VSX] movemask needs to be explicitly aligned on clang for vec_ste
2022-09-16 12:50:33 +03:00
Konstantinos Margaritis
4b41c5fe25
[NEON] simplify/optimize shift/align primitives
2022-09-12 13:09:51 +00:00
Konstantinos Margaritis
a0e53c7d85
use correct intrinsic for lshiftbyte_m128
2022-09-07 16:00:10 +03:00
Konstantinos Margaritis
37b2cae189
provide non-immediate versions of lshiftbyte/rshiftbyte on x86
2022-09-07 15:07:20 +03:00
Konstantinos Margaritis
ce90e58af1
readd simd_onebit_masks for x86, needs more work
2022-09-07 13:42:25 +03:00
Konstantinos Margaritis
0052df5f5b
[NEON] optimize mask1bit128, get rid of simd_onebit_masks
2022-09-07 10:20:01 +00:00
Konstantinos Margaritis
76a31d1bc0
remove simd_onebit_masks from arm/x86 headers, as they moved to common
2022-09-07 12:41:32 +03:00
Konstantinos Margaritis
c097f169ad
[VSX] add algorithm for alignr w/o use of immediates
2022-09-07 00:01:54 +03:00
Konstantinos Margaritis
bdc3947746
[VSX] correct lshiftbyte_m128/rshiftbyte_m128, variable_byte_shift
2022-09-06 23:59:51 +03:00
Konstantinos Margaritis
59ace0ebf8
[VSX] huge optimization of movemask128
2022-09-06 20:08:44 +03:00
Konstantinos Margaritis
ef9116b52e
[VSX] optimize and correct lshift_m128/rshift_m128
2022-09-06 18:48:19 +03:00
Konstantinos Margaritis
6dce55c3fe
[VSX] optimized mask1bit128(), moved simd_onebit_masks to common
2022-09-06 18:10:55 +03:00
Danila Kutenin
2dd7b9a4f9
Fix ppc64el debug
2022-06-26 23:05:17 +00:00
Danila Kutenin
45fe139224
Minor fix
2022-06-26 23:02:02 +00:00
Danila Kutenin
4b83ea1c78
Fix formatting of a couple files
2022-06-26 22:59:58 +00:00
Danila Kutenin
eb7b0bb50c
Optimize vectorscan for aarch64 by using shrn instruction
...
This optimization is based on the thread
https://twitter.com/Danlark1/status/1539344279268691970 and uses the
"shift right and narrow by 4" instruction https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/SHRN--SHRN2--Shift-Right-Narrow--immediate--
To achieve that, I needed to redesign movemask slightly into comparemask
and add an extra step to mask iteration. Our benchmarks
showed a 10-15% improvement on average for long matches.
2022-06-26 22:55:45 +00:00
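The comparemask idea described above can be sketched in portable scalar C++, emulating what a shift-right-and-narrow by 4 does to a 16-byte comparison result. This is an assumption-laden illustration, not the NEON code from the commit: the function names are hypothetical, and the input is assumed to hold only 0x00 or 0xFF per byte, as a vector compare would produce.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Emulates SHRN #4 on eight little-endian 16-bit lanes: each lane is shifted
// right by 4 and truncated to 8 bits, so every input byte contributes one
// nibble to the 64-bit result (nibble k is 0xF when byte k matched).
inline std::uint64_t comparemask_shrn4(const std::array<std::uint8_t, 16>& cmp) {
    std::uint64_t mask = 0;
    for (int lane = 0; lane < 8; ++lane) {
        std::uint16_t wide = static_cast<std::uint16_t>(
            cmp[2 * lane] | (cmp[2 * lane + 1] << 8));
        std::uint8_t narrow = static_cast<std::uint8_t>(wide >> 4);
        mask |= static_cast<std::uint64_t>(narrow) << (8 * lane);
    }
    return mask;
}

// The extra iteration step: keep one bit per nibble, then walk the set bits
// and divide the bit position by 4 to recover the byte index.
inline std::vector<int> matched_indices(std::uint64_t mask) {
    mask &= 0x8888888888888888ULL;
    std::vector<int> indices;
    while (mask != 0) {
        int bit = 0;
        while (((mask >> bit) & 1) == 0) ++bit; // portable count-trailing-zeros
        indices.push_back(bit / 4);
        mask &= mask - 1;                       // clear the bit just reported
    }
    return indices;
}

// Usage example: bytes 1, 5 and 15 matched.
inline void comparemask_selfcheck() {
    std::array<std::uint8_t, 16> cmp{};
    cmp[1] = cmp[5] = cmp[15] = 0xFF;
    assert(comparemask_shrn4(cmp) == 0xF000000000F000F0ULL);
    assert((matched_indices(comparemask_shrn4(cmp)) == std::vector<int>{1, 5, 15}));
}
```

The trade-off is that the mask is four times wider than a classic one-bit-per-byte movemask, which is why iteration needs the additional per-nibble step.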
Daniel Kutenin
2360314f9d
Optimized and correct version of movemask128 for ARM
...
Closes #99
https://gcc.godbolt.org/z/cTjKqzcvn
The previous version was not correct because it assumed every input byte was either 0x00 or 0xFF. We can fully match the movemask semantics and do it faster with USRA instructions.
Re-submission to a develop branch
2022-04-18 13:37:53 +01:00
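For reference, the movemask semantics that the commit above refers to can be written as a scalar model: bit i of the result is the most significant bit of byte i, whatever the other seven bits are. The sketch below is only that model, with a hypothetical name; it does not reproduce the USRA-based NEON implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar reference model of a 128-bit byte movemask:
// collect the top bit of each of the 16 bytes into a 16-bit mask.
inline std::uint16_t movemask128_ref(const std::array<std::uint8_t, 16>& v) {
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        mask = static_cast<std::uint16_t>(mask | ((v[i] >> 7) << i));
    }
    return mask;
}

// Usage example: bytes need not be 0xFF to count, only their top bit matters.
inline void movemask128_selfcheck() {
    std::array<std::uint8_t, 16> v{};
    v[0] = 0x80;  // top bit set, other bits clear: counted
    v[3] = 0xFF;  // counted
    v[4] = 0x7F;  // top bit clear: not counted
    v[15] = 0x81; // counted
    assert(movemask128_ref(v) == 0x8009);
}
```

An implementation that is only right for 0x00/0xFF inputs would disagree with this model on bytes such as 0x80 or 0x7F, which is the class of input the fix addresses.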
Konstantinos Margaritis
242a460115
minor fixes
2021-12-07 08:49:59 +00:00
Konstantinos Margaritis
b6ddf2b41c
fix clang-release-arm compilation
2021-12-07 08:43:52 +00:00
Konstantinos Margaritis
f4ccc40c58
fix wrong castings for NEON
2021-12-06 21:35:51 +00:00
Konstantinos Margaritis
ef2bc5cfbc
fix compilation with clang and some incomplete/wrong implementations for arm this time
2021-12-06 18:22:58 +00:00
Konstantinos Margaritis
d86e6bed69
fix build with clang; in particular, VSX uses long long instead of int64_t, which gcc allows but clang does not
2021-12-02 18:01:00 +02:00
Konstantinos Margaritis
896d28845c
bump base requirements to SSE4.2
2021-12-01 23:20:02 +02:00
Konstantinos Margaritis
959fea25f7
use __builtin_constant_p() instead for arm as well
2021-11-25 06:20:53 +00:00
Apostolos Tapsas
e655d76a01
*fix palignr implementation for VSX Release mode
...
*add unit test for palignr
*enable unit test building for Release mode
2021-11-24 15:03:49 +00:00
Apostolos Tapsas
bc2dcc317d
found and fixed a very hard-to-track bug in the palignr intrinsic function, which manifested only in Release builds (not Debug builds) and only in a particular number of tests
2021-11-24 11:18:18 +00:00