Changes between Version 16 and Version 17 of MatrixMultiply


Timestamp:
Jun 7, 2010 10:37:33 PM
Author:
nakasato
Comment:

--

  • MatrixMultiply

    v16  v17
      2    2  We have implemented a single/double-precision matrix multiply program for RV770/Cypress. Our implementation uses two input streams: one is the transposed input matrix A, and the other is the input matrix B in normal (non-transposed) format. The output matrix C is also not transposed. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results for each case. Note that only kernel execution time is measured.
      3    3
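The scheme described above can be illustrated on the CPU with a minimal sketch. This is not the authors' GPU kernel. The function names (`transpose`, `matmul_transposed_a`) and the `block` parameter are hypothetical, and the loop structure is only one assumed reading of "8x8 block": each output tile of C is accumulated from rank-1 updates. With A stored transposed, row k of A^T and row k of B are both contiguous, which is presumably the reason for the transposed input stream.

```python
def transpose(a):
    """Return the transpose of a row-major matrix given as a list of lists."""
    return [list(col) for col in zip(*a)]


def matmul_transposed_a(a_t, b, block=8):
    """Compute C = A * B, where a_t holds A transposed and b holds B as-is.

    block is the edge length of one output tile (8 for single precision and
    4 for double precision in the text; the tiling itself is an assumption).
    """
    k_dim = len(a_t)      # inner dimension: rows of A^T and rows of B
    m = len(a_t[0])       # rows of A and of C
    n = len(b[0])         # columns of B and of C
    if len(b) != k_dim:
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        i1 = min(i0 + block, m)
        for j0 in range(0, n, block):
            j1 = min(j0 + block, n)
            # Accumulators for one output tile (registers in a GPU kernel).
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for k in range(k_dim):
                a_row = a_t[k][i0:i1]   # contiguous because A is transposed
                b_row = b[k][j0:j1]     # contiguous because B is not
                # Rank-1 update: acc += outer(a_row, b_row)
                for i, a_val in enumerate(a_row):
                    acc_i = acc[i]
                    for j, b_val in enumerate(b_row):
                        acc_i[j] += a_val * b_val
            # Write the finished tile to C, which is stored non-transposed.
            for i in range(i1 - i0):
                c[i0 + i][j0:j1] = acc[i]
    return c
```

For example, with A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]], calling `matmul_transposed_a(transpose(A), B, block=2)` returns [[58.0, 64.0], [139.0, 154.0]], and the result does not depend on the block size.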
      4       Performance Summary (Pmax & MADD in GFLOPS)
           4  == Performance Summary ==
      5    5  || board  || Pmax || Nmax || prec || GPR || MADD peak ||
      6    6  || HD4850 || 736  || 3328 || SP   || 25  || 1040      ||

      8    8  || HD4850 || 160  || 1920 || DP   || 17  || 208       ||
      9    9  || HD5870 || 431  || 2432 || DP   || 17  || 544       ||
          10  Pmax & MADD in GFLOPS
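To put the table in perspective, the fraction of peak reached by each kernel can be computed from the rows visible in this diff. The sketch below rests on assumptions: Pmax is read as the best measured rate, Nmax as the matrix size at which it was measured, and the rate is taken to be counted as 2·N³ floating-point operations per multiply of N x N matrices. The page does not state any of these, and the helper names are hypothetical.

```python
# (board, precision, Pmax in GFLOPS, Nmax, MADD peak in GFLOPS),
# copied from the rows shown in the table; nothing is added.
ROWS = [
    ("HD4850", "SP", 736, 3328, 1040),
    ("HD4850", "DP", 160, 1920, 208),
    ("HD5870", "DP", 431, 2432, 544),
]


def efficiency(pmax_gflops, peak_gflops):
    """Fraction of the MADD peak that the kernel reaches."""
    return pmax_gflops / peak_gflops


def kernel_time_seconds(n, gflops):
    """Kernel time implied by a rate, ASSUMING 2*n**3 operations are counted."""
    return 2 * n**3 / (gflops * 1e9)


def summarize(rows=ROWS):
    """Return one formatted line per table row."""
    lines = []
    for board, prec, pmax, nmax, peak in rows:
        eff = efficiency(pmax, peak)
        t = kernel_time_seconds(nmax, pmax)
        lines.append(f"{board} {prec}: {eff:.1%} of peak, ~{t:.3f} s at N={nmax}")
    return lines
```

Under these assumptions the three rows come out at roughly 71%, 77%, and 79% of the MADD peak, respectively; printing `"\n".join(summarize())` shows the implied kernel times as well.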
     10   11
     11   12  == Source code ==