Changes between Version 27 and Version 28 of MatrixMultiply


Timestamp: Jun 23, 2010 5:21:17 PM (14 years ago)
Author: nakasato
Comment: --

  • MatrixMultiply

    v27 v28  
    22We have implemented a single/double precision matrix multiply program for RV770/Cypress. In our implementation, we use two input streams to compute C=AB. One is the transposed input matrix A (i.e. column major) and the other is the input matrix B in normal format (i.e. row major). The output matrix C is also row major. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results for each case. Note that only kernel execution time is measured. 
    33 
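The data layout described above can be sketched on the CPU as follows. This is only an illustration in Python, not the actual GPU kernel: the function name `matmul_transposed_a`, the list-of-lists storage, and the default block size are assumptions made for the example. With A stored transposed, row k of both inputs holds exactly the values needed for one rank-1 update of a block of C, so each step reads two contiguous strips.

```python
def matmul_transposed_a(at, b, n, block=2):
    """Compute C = A*B for n x n matrices.

    at : A stored transposed, so at[k][i] == A[i][k]
    b  : B stored row major, so b[k][j] == B[k][j]
    The result C is returned row major. `block` must divide n.
    """
    assert n % block == 0, "block size must divide n in this sketch"
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            # Accumulator for one block x block tile of C
            # (on a GPU this tile would live in registers).
            acc = [[0.0] * block for _ in range(block)]
            for k in range(n):
                # Both strips are contiguous reads from row k.
                a_strip = at[k][i0:i0 + block]
                b_strip = b[k][j0:j0 + block]
                # Rank-1 update of the tile: acc += a_strip (outer) b_strip.
                for i in range(block):
                    for j in range(block):
                        acc[i][j] += a_strip[i] * b_strip[j]
            # Write the finished tile back to C.
            for i in range(block):
                c[i0 + i][j0:j0 + block] = acc[i]
    return c
```

A larger block reuses each loaded value more times (each element of `a_strip` is used `block` times per step), which is one plausible reason for choosing 8x8 in single precision and a smaller block when each element needs more registers.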
    4 Add double-double precision performance. We used 2x2 block in this case. On Cypress architecture GPU, we take advantage of FMA_64 instruction. 
     4Add double-double (DD) precision performance. We used a 2x2 block in this case. On Cypress architecture GPUs, we take advantage of the FMA_64 instruction. For the MAD peak in DD, we assume one DD operation takes 20 DP operations (ops) without FMA and 15 ops with FMA. Precisely, DD add takes ~20 ops and DD mul without FMA takes ~20 ops. DD mul with FMA takes ~8 ops.  
    55 
    66== Performance Summary == 
    7 || board  || Pmax || Nmax || prec  || reg. usage || MAD peak || 
    8 || HD4850 || 736    || 3328 || SP || 25 || 1040 || 
    9 || HD5870 || 2140   || 7424 || SP || 25 || 2720 || 
    10 || HD4850 || 177    || 1408 || DP || 19 || 208 || 
    11 || HD5870 || 475    || 2048 || DP || 19 || 544 || 
    12 || HD4850 || 7.5    || 768  || DDP || 21 ||  || 
    13 || HD5870 || 20     || 1024 || DDP || 21 ||  || 
    14 || HD5870 FMA || 31     || 1024 || DDP || 18 ||   || 
     7|| board  || Pmax || Nmax || prec  || reg. usage || MAD peak || note || 
     8|| HD4850 || 736    || 3328 || SP || 25 || 1040 |||| 
     9|| HD5870 || 2140   || 7424 || SP || 25 || 2720 |||| 
     10|| HD4850 || 177    || 1408 || DP || 19 || 208 |||| 
     11|| HD5870 || 475    || 2048 || DP || 19 || 544 |||| 
     12|| HD4850 || 7.5    || 768  || DDP || 21 || ~10.4 |||| 
     13|| HD5870 || 20     || 1024 || DDP || 21 || ~27.2 |||| 
     14|| HD5870 || 31     || 1024 || DDP || 18 || ~36.2 || FMA || 
     15 
     16 
    1517 
    1618Pmax & MAD peak in GFLOPS
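The DDP entries in the MAD peak column appear to follow from dividing the DP peak of each board by the assumed number of DP ops per DD operation (20 without FMA, 15 with FMA). A minimal sketch of that arithmetic, using only numbers from the table; the function names `dd_mad_peak` and `efficiency` are made up for this example:

```python
def dd_mad_peak(dp_peak_gflops, dp_ops_per_dd_op):
    """Estimated DD peak: DP peak divided by the DP ops one DD op costs."""
    return dp_peak_gflops / dp_ops_per_dd_op


def efficiency(pmax_gflops, peak_gflops):
    """Fraction of the estimated peak reached by the measured Pmax."""
    return pmax_gflops / peak_gflops


# (board, measured DDP Pmax, DP MAD peak, assumed DP ops per DD op)
ROWS = [
    ("HD4850", 7.5, 208.0, 20),      # no FMA
    ("HD5870", 20.0, 544.0, 20),     # no FMA
    ("HD5870 FMA", 31.0, 544.0, 15), # with FMA_64
]

if __name__ == "__main__":
    for board, pmax, dp_peak, ops in ROWS:
        peak = dd_mad_peak(dp_peak, ops)
        print(f"{board:11s} DD peak ~{peak:.1f} GFLOPS, "
              f"efficiency {efficiency(pmax, peak):.0%}")
```

Under these assumptions the estimates are 208/20 = 10.4 and 544/20 = 27.2, matching the table; 544/15 is about 36.27, which the table lists as ~36.2.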