Changes between Version 12 and Version 13 of Fastest_GEMM_implementation_On_Cypress


Timestamp:
Oct 11, 2010 8:35:57 AM
Author:
nakasato
Comment:

--

  • Fastest_GEMM_implementation_On_Cypress

    v12 v13  
    1 = A fast GEMM implementation on a Cypress GPU = 
    2 by N.Nakasato (University of Aizu), submitted September 7, 2010. 
    3  
    4 This paper will be presented at the 1st International Workshop on 
    5 Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10) 
    6 held as part of SC10, New Orleans, November 13-19, 2010 
    7  
    8 == abstract == 
    9 We present benchmark results of optimized dense matrix multiplication 
    10 kernels for a Cypress GPU. We write general matrix multiply (GEMM) kernels 
    11 for single (SP), double (DP) and double-double (DDP) precision.  
    12 Our SGEMM and DGEMM kernels show 73% and 87% of  
    13 the theoretical performance of the GPU, respectively. 
    14 To our knowledge, our SGEMM and DGEMM kernels are currently 
    15 the fastest on a single GPU chip. 
    16 Furthermore, the performance of our matrix multiply kernel in DDP is 31 Gflop/s. 
    17 It is more than 200 times faster than the performance 
    18 results on a single core of a recent CPU (with mpack version 0.6.5). 
    19 We describe our GEMM kernels with a main focus on the SGEMM implementation 
    20 since all GEMM kernels share common programming and optimization techniques. 
    21 While conventional wisdom in GPU programming recommends 
    22 heavy use of shared memory on GPUs, 
    23 we show that the texture cache is very effective on the Cypress architecture. 
    24  
    25 == preliminary results == 
    26  * [wiki:"GEMM_Performance_Cypress"] 
    27  * [wiki:"MatrixMultiply"] 
    28  
    29 == preprint == 
    30 Posted later. 
    31  
    32 == Sample program for DGEMM == 
    33  
    34  
     1 See [wiki:"Fast_GEMM_implementation_On_Cypress"] since we slightly changed the title ;-).