23 Commits

Author SHA1 Message Date
Chang, Harry
56cb107005 AVX512VBMI Fat Teddy. 2021-01-25 14:13:13 +02:00
Konstantinos Margaritis
5333467249 fix names, use own intrinsic instead of explicit _mm* ones 2020-09-23 11:51:21 +03:00
Chang, Harry
e665e959a0 Revert to AVX2 Fat Teddy instead of AVX512 reinforced Fat Teddy. 2020-05-25 13:47:53 +00:00
Chang, Harry
72d21a9acf Refactored building reinforcement table at compile time and updated comments. 2017-08-21 11:14:59 +10:00
Chang, Harry
2b1d3383aa replace "_avx2" with "_fat". 2017-08-21 11:14:59 +10:00
Chang, Harry
8da2d13baa AVX512 Reinforced FAT teddy. 2017-08-21 11:14:59 +10:00
Chang, Harry
d2b5523dd8 fix typo "ones_u32a" => "ones_u32" 2017-08-21 11:12:36 +10:00
Chang, Harry
68e08d8e18 AVX512 reinforced teddy. 2017-08-21 11:12:36 +10:00
Chang, Harry
dbd3f66e87 Reinforced Teddy with 1-byte approach, based on "shift-or" and AVX2. 2017-08-21 11:10:11 +10:00
Justin Viiret
b126cbf556 fdr/teddy: simplify computing of confirm base 2017-08-21 10:39:00 +10:00
Justin Viiret
4f32a167d5 teddy: align major structures to cachelines 2017-08-21 10:38:59 +10:00
Matthew Barr
a295c96198 rename vpshufb to pshufb_m256 2017-05-30 13:59:23 +10:00
Matthew Barr
8201183138 Check compiler architecture flags in one place 2017-04-26 15:18:26 +10:00
Alex Coyte
8af4850d85 remove 'fast teddy' models 2017-04-26 14:43:43 +10:00
Wang, Xiang W
df7bc22ae0 fdr: remove confirm split and pull-back 2017-04-26 14:43:09 +10:00
Justin Viiret
3d9a60d023 teddy: apply poison mask after prep_conf_ work
This simplifies the code, and removes all the all-ones p_mask uses,
which we were otherwise trusting the optimizer to remove.
2016-08-10 15:05:23 +10:00
Justin Viiret
9346a9090e fdr: remove groups from struct FDR_Runtime_Args 2016-08-10 14:55:52 +10:00
Justin Viiret
42f23c2c91 teddy: no need to write control out at the end 2016-08-10 14:55:51 +10:00
Justin Viiret
b6a77b7329 teddy: remove extra control ptr 2016-08-10 14:55:51 +10:00
Matthew Barr
e3d416a6ea Apply some consistency to the names we give shifts 2016-07-08 11:07:50 +10:00
Matthew Barr
1b3e795fc9 teddy: we only need the upper lane
Just use an extract, no need to shuffle first.
2016-07-08 11:07:50 +10:00
Matthew Barr
4d6934fc77 Move limex specific shuffle utils and ssse3 funcs 2016-07-08 11:07:50 +10:00
Mohammad Abdul Awal
ed772380c0 teddy: remove python codegen, refactor code
Major cleanup of the Teddy runtime code. Removes python code generation,
splits AVX2 models into their own file, improves readability.
2016-05-18 16:28:11 +10:00