Changes between Version 24 and Version 25 of MatrixMultiply


Timestamp:
Jun 23, 2010 4:47:21 PM (14 years ago)
Author:
nakasato
Comment:

--

  • MatrixMultiply

    v24 v25  
    11= Matrix Multiply on GPU = 
    22We have implemented a single/double precision matrix multiply program for RV770/Cypress. In our implementation, we use two input streams to compute C=AB. One is the transposed input matrix A (i.e. column major) and the other is the input matrix B in normal format (i.e. row major). The output matrix C is also row major. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results for each case. Note that only kernel execution time is measured. 
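
The data layout described above can be illustrated with a minimal CPU-side sketch in Python. It is not the GPU kernel itself: the function name `matmul_transposed_a` and the list-of-rows representation are assumptions made for illustration. The point it shows is that, with A stored transposed and B stored row major, each step `k` reads one contiguous slice from each input and adds their outer product to a block of C.

```python
def matmul_transposed_a(at, b, block=8):
    """Compute C = A B from at = A^T (K x M) and b = B (K x N).

    Both inputs are lists of rows; the result C (M x N) is row major.
    Hypothetical helper for illustration, not the original kernel.
    """
    k_dim = len(at)
    m = len(at[0])
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        i1 = min(i0 + block, m)
        for j0 in range(0, n, block):
            j1 = min(j0 + block, n)
            # Accumulator for one block of C (block x block, smaller at edges).
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for k in range(k_dim):
                # Row k of A^T is column k of A; both slices are contiguous.
                a_slice = at[k][i0:i1]
                b_slice = b[k][j0:j1]
                # Rank-1 (outer product) update of the block.
                for i, a_val in enumerate(a_slice):
                    row = acc[i]
                    for j, b_val in enumerate(b_slice):
                        row[j] += a_val * b_val
            for i in range(i1 - i0):
                c[i0 + i][j0:j1] = acc[i]
    return c


# A is 3x2, so its transpose is passed as a 2x3 list of rows.
a_t = [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
b_mat = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
c_mat = matmul_transposed_a(a_t, b_mat, block=2)
```

A block size of 2 is used in the usage lines only so that the edge handling is exercised on a small example; the text above uses 8x8 and 4x4 blocks.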
     3 
     4Added double-double (DD) performance. We used a 2x2 block in this case. On Cypress architecture GPUs, we take advantage of the FMA_64 instruction. 
    35 
    46== Performance Summary == 
     
    810|| HD4850 || 177    || 1408 || DP || 19 || 208 || 
    911|| HD5870 || 475    || 2048 || DP || 19 || 544 || 
     12|| HD4850 || 7.5    || 768  || DD || 21 ||  || 
     13|| HD5870 || 20     || 1024 || DD || 21 ||  || 
     14|| HD5870 FMA || 31     || 1024 || DD || 18 ||   || 
     15 
    1016Pmax & MAD in GFLOPS 
    1117 
     
    1824== Double precision == 
    1925[[Image(DMM.png)]] 
     26 
     27== Double-Double == 
     28[[Image(DDMM.png)]] 
     29 
    2030 
    2131= Useful forum discussions =