Changes between Version 2 and Version 3 of Fast_GEMM_implementation_On_Cypress
- Timestamp: Oct 11, 2010 8:57:49 AM
Fast_GEMM_implementation_On_Cypress
v2  v3
15  15    with one GPU chip to our knowledge.
16  16    Furthermore, the performance of our matrix multiply kernel in DDP is 31 Gflop/s.
17      - It is more than 200 times faster than the performance
18      - results on single core of a recent CPU (with mpack version 0.6.5).
    17  + This performance in DDP is more than 200 times faster than the performance
    18  + in DDP on single core of a recent CPU (with mpack version 0.6.5).
19  19    We describe our GEMM kernels with main focus on the SGEMM implementation
20  20    since all GEMM kernels share common programming and optimization techniques.