Changes between Version 18 and Version 19 of MatrixMultiply


Timestamp:
Jun 7, 2010 11:56:43 PM
Author:
nakasato
Comment:

--

  • MatrixMultiply

    v18 v19  
    11= Matrix Multiply on GPU = 
    2 We have implemented single/double precision matrix multiply program for RV770/Cypress. In our implementation, we use two input streams. One is transposed input matrix A and other is input matrix B in normal format. Output matrix C is also not transposed. We adopted 8x8 block for single precision and 4x4 for double precision. Here is benchmark result for each case. Note only kernel execution time is measured. 
     2We have implemented a single/double precision matrix multiply program for RV770/Cypress. Our implementation uses two input streams: one is the transposed input matrix A (i.e. column major) and the other is the input matrix B in normal format (i.e. row major). The output matrix C is also row major. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results for each case. Note that only kernel execution time is measured. 
    33 
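The data layout described above can be illustrated with a minimal CPU-side sketch in Python. This is not the GPU kernel itself; it only assumes that each work item accumulates one `block` x `block` tile of C, and the names `matmul_at_b`, `transpose`, `a_t`, and `block` are hypothetical. With A stored transposed, step `k` of the inner product reads one contiguous slice from row `k` of the transposed A and one contiguous slice from row `k` of B, then applies a rank-1 update to the tile.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul_at_b(a_t, b, block=8):
    """Compute C = A * B from a_t (A transposed, K x M) and b (K x N).

    Illustrative sketch only: matrices are lists of rows, and C is
    returned row major as an M x N list of rows.
    """
    k_dim = len(a_t)
    m = len(a_t[0])
    n = len(b[0])
    assert len(b) == k_dim, "inner dimensions must match"
    assert m % block == 0 and n % block == 0, "sizes must be multiples of block"

    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            # Accumulator tile; on a GPU this would live in registers.
            acc = [[0.0] * block for _ in range(block)]
            for k in range(k_dim):
                # Both reads are contiguous slices of row k.
                a_col = a_t[k][i0:i0 + block]   # A[i0:i0+block, k]
                b_row = b[k][j0:j0 + block]     # B[k, j0:j0+block]
                # Rank-1 update of the tile: acc += a_col (outer) b_row.
                for i in range(block):
                    a_ik = a_col[i]
                    row = acc[i]
                    for j in range(block):
                        row[j] += a_ik * b_row[j]
            # Write the finished tile back to row-major C.
            for i in range(block):
                c[i0 + i][j0:j0 + block] = acc[i]
    return c
```

A call such as `matmul_at_b(transpose(a), b, block=8)` corresponds to the single precision blocking, and `block=4` to the double precision one. A larger tile reuses each loaded value more often but needs more accumulator registers.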
    44== Performance Summary == 
    5 || board  || Pmax || Nmax || prec  || GPR || MADD peak || 
     5|| board  || Pmax || Nmax || prec  || reg. usage || MADD peak || 
    66|| HD4850 || 736    || 3328 || SP || 25 || 1040 || 
    77|| HD5870 || 2140   || 7424 || SP || 25 || 2720 ||