| Author | Commit | Message | Date |
|---|---|---|---|
| Konstantinos Margaritis | 27bd09454f | use correct function names for AVX512, fix build failure | 2021-02-15 13:54:19 +02:00 |
| Chang, Harry | 56cb107005 | AVX512VBMI Fat Teddy. | 2021-01-25 14:13:13 +02:00 |
| Konstantinos Margaritis | 5333467249 | fix names, use own intrinsic instead of explicit _mm* ones | 2020-09-23 11:51:21 +03:00 |
| Chang, Harry | e665e959a0 | Revert to AVX2 Fat Teddy instead of AVX512 reinforced Fat Teddy. | 2020-05-25 13:47:53 +00:00 |
| Chang, Harry | 72d21a9acf | Refactored building reinforcement table at compile time and updated comments. | 2017-08-21 11:14:59 +10:00 |
| Chang, Harry | 2b1d3383aa | replace "_avx2" with "_fat". | 2017-08-21 11:14:59 +10:00 |
| Chang, Harry | 8da2d13baa | AVX512 Reinforced FAT teddy. | 2017-08-21 11:14:59 +10:00 |
| Chang, Harry | d2b5523dd8 | fix typo "ones_u32a" => "ones_u32" | 2017-08-21 11:12:36 +10:00 |
| Chang, Harry | 68e08d8e18 | AVX512 reinforced teddy. | 2017-08-21 11:12:36 +10:00 |
| Chang, Harry | dbd3f66e87 | Reinforced Teddy with 1-byte approach, based on "shift-or" and AVX2. | 2017-08-21 11:10:11 +10:00 |
| Justin Viiret | b126cbf556 | fdr/teddy: simplify computing of confirm base | 2017-08-21 10:39:00 +10:00 |
| Justin Viiret | 4f32a167d5 | teddy: align major structures to cachelines | 2017-08-21 10:38:59 +10:00 |
| Matthew Barr | a295c96198 | rename vpshufb to pshufb_m256 | 2017-05-30 13:59:23 +10:00 |
| Matthew Barr | 8201183138 | Check compiler architecture flags in one place | 2017-04-26 15:18:26 +10:00 |
| Alex Coyte | 8af4850d85 | remove 'fast teddy' models | 2017-04-26 14:43:43 +10:00 |
| Wang, Xiang W | df7bc22ae0 | fdr: remove confirm split and pull-back | 2017-04-26 14:43:09 +10:00 |
| Justin Viiret | 3d9a60d023 | teddy: apply poison mask after prep_conf_ work. This simplifies the code, and removes all the all-ones p_mask uses, which we were otherwise trusting the optimizer to remove. | 2016-08-10 15:05:23 +10:00 |
| Justin Viiret | 9346a9090e | fdr: remove groups from struct FDR_Runtime_Args | 2016-08-10 14:55:52 +10:00 |
| Justin Viiret | 42f23c2c91 | teddy: no need to write control out at the end | 2016-08-10 14:55:51 +10:00 |
| Justin Viiret | b6a77b7329 | teddy: remove extra control ptr | 2016-08-10 14:55:51 +10:00 |
| Matthew Barr | e3d416a6ea | Apply some consistency to the names we give shifts | 2016-07-08 11:07:50 +10:00 |
| Matthew Barr | 1b3e795fc9 | teddy: we only need the upper lane. Just use an extract, no need to shuffle first. | 2016-07-08 11:07:50 +10:00 |
| Matthew Barr | 4d6934fc77 | Move limex specific shuffle utils and ssse3 funcs | 2016-07-08 11:07:50 +10:00 |
| Mohammad Abdul Awal | ed772380c0 | teddy: remove python codegen, refactor code. Major cleanup of the Teddy runtime code. Removes python code generation, splits AVX2 models into their own file, improves readability. | 2016-05-18 16:28:11 +10:00 |