Changes between Version 26 and Version 27 of MatrixMultiply


Ignore:
Timestamp:
Jun 23, 2010 5:15:38 PM (14 years ago)
Author:
nakasato
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • MatrixMultiply

    v26 v27  
    22We have implemented single/double precision matrix multiply program for RV770/Cypress. In our implementation, we use two input streams for computing C=AB. One is transposed input matrix A (i.e. column major) and other is input matrix B in normal format (i.e. row major). Output matrix C is also row major. We adopted 8x8 block for single precision and 4x4 for double precision. Here is benchmark result for each case. Note only kernel execution time is measured. 
    33 
    4 Add double-double performance. We used 2x2 block in this case. On Cypress architecture GPU, we take advantage of FMA_64 instruction. 
     4Add double-double precision performance. We used 2x2 block in this case. On Cypress architecture GPU, we take advantage of FMA_64 instruction. 
    55 
    66== Peformance Summary ==