Changes between Version 2 and Version 3 of Fast_GEMM_implementation_On_Cypress


Timestamp:
Oct 11, 2010 8:57:49 AM
Author:
nakasato
Comment:

--

  • Fast_GEMM_implementation_On_Cypress

    v2 v3
    15 15   with one GPU chip to our knowledge.
    16 16   Furthermore, the performance of our matrix multiply kernel in DDP is 31 Gflop/s.
    17    - It is more than 200 times faster than the performance
    18    - results on single core of a recent CPU (with mpack version 0.6.5).
       17 + This performance in DDP is more than 200 times faster than the performance
       18 + in DDP on single core of a recent CPU (with mpack version 0.6.5).
    19 19   We describe our GEMM kernels with main focus on the SGEMM implementation
    20 20   since all GEMM kernels share common programming and optimization techniques.