Changes between Version 26 and Version 27 of MatrixMultiply
- Timestamp: Jun 23, 2010 5:15:38 PM
MatrixMultiply
We have implemented single- and double-precision matrix multiply programs for RV770/Cypress. Our implementation uses two input streams to compute C=AB: one is the transposed input matrix A (i.e. column major) and the other is the input matrix B in normal format (i.e. row major). The output matrix C is also row major. We adopted an 8x8 block for single precision and a 4x4 block for double precision. The benchmark results for each case follow. Note that only kernel execution time is measured.

Added double-double precision performance. We used a 2x2 block in this case. On Cypress architecture GPUs, we take advantage of the FMA_64 instruction.

== Performance Summary ==
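As background for this summary, the data layout described above (transposed A, row-major B and C, square blocks of C) can be sketched on the CPU. This is only an illustrative reference in plain C, not the GPU kernel that was benchmarked: the function name `matmul_at_b`, the `BLOCK` macro, and the restriction to square matrices whose size is a multiple of the block size are all assumptions made for the example. It shows why storing A transposed is convenient: for a fixed k, both the slice of A and the slice of B needed by one block are contiguous in memory.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 8 /* 8x8 block, the size quoted for single precision */

/* Hypothetical reference routine: C = A * B for n x n matrices.
 * Assumption: n is a multiple of BLOCK.
 *   at : A stored transposed, at[k * n + i] == A[i][k]
 *   b  : B in row-major order, b[k * n + j] == B[k][j]
 *   c  : C in row-major order, c[i * n + j] == C[i][j]
 */
void matmul_at_b(size_t n, const float *at, const float *b, float *c)
{
    assert(n % BLOCK == 0);

    for (size_t i0 = 0; i0 < n; i0 += BLOCK) {
        for (size_t j0 = 0; j0 < n; j0 += BLOCK) {
            /* Accumulators for one BLOCK x BLOCK tile of C. */
            float acc[BLOCK][BLOCK] = {{0.0f}};

            for (size_t k = 0; k < n; ++k) {
                /* Contiguous slices: A[i0..i0+BLOCK-1][k] and B[k][j0..j0+BLOCK-1]. */
                const float *acol = &at[k * n + i0];
                const float *brow = &b[k * n + j0];

                /* Rank-1 update of the tile. */
                for (size_t i = 0; i < BLOCK; ++i)
                    for (size_t j = 0; j < BLOCK; ++j)
                        acc[i][j] += acol[i] * brow[j];
            }

            /* Write the finished tile to row-major C. */
            for (size_t i = 0; i < BLOCK; ++i)
                for (size_t j = 0; j < BLOCK; ++j)
                    c[(i0 + i) * n + (j0 + j)] = acc[i][j];
        }
    }
}
```

Changing `BLOCK` to 4 or 2 gives the tile shapes quoted for double and double-double precision, although the element type would also have to change, and the double-double arithmetic itself is not shown here.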