Changes between Version 10 and Version 11 of MatrixMultiply
Timestamp: Jun 7, 2010 10:14:15 PM
MatrixMultiply
= Matrix Multiply on GPU =
We have implemented a single/double precision matrix multiply program for RV770/Cypress. Our implementation uses two input streams: one is the transposed input matrix A, and the other is the input matrix B in normal (non-transposed) format. The output matrix C is also not transposed. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results for each case. Note that only kernel execution time is measured.

== Single precision ==
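As a rough CPU-side illustration of the data layout described above, the following sketch computes C = A * B from a transposed A and a non-transposed B, accumulating one 8x8 tile of C at a time. It is not the GPU kernel itself. The function names `matmul_at_b` and `transpose`, the list-of-rows representation, and the requirement that the matrix dimensions be multiples of the block size are all assumptions made for this example.

```python
BLOCK = 8  # 8x8 output tile, the single precision block size quoted above


def matmul_at_b(a_t, b, block=BLOCK):
    """Compute C = A * B given a_t = transpose(A).

    a_t is K x M (row k of a_t holds column k of A), b is K x N,
    and the result is M x N. Matrices are lists of rows.
    Assumption for this sketch: M and N are multiples of `block`.
    """
    k_dim = len(a_t)
    m = len(a_t[0])
    n = len(b[0])
    assert len(b) == k_dim, "A^T and B must have the same number of rows"
    assert m % block == 0 and n % block == 0, "sketch assumes whole tiles"

    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            # Accumulators for one block x block tile of C.
            acc = [[0.0] * block for _ in range(block)]
            for k in range(k_dim):
                # With A stored transposed, both operands for step k are
                # contiguous slices of a single row.
                a_seg = a_t[k][i0:i0 + block]
                b_seg = b[k][j0:j0 + block]
                # Rank-1 update of the tile: acc += outer(a_seg, b_seg).
                for i in range(block):
                    a_val = a_seg[i]
                    row = acc[i]
                    for j in range(block):
                        row[j] += a_val * b_seg[j]
            # Write the finished tile back into C (not transposed).
            for i in range(block):
                c[i0 + i][j0:j0 + block] = acc[i]
    return c


def transpose(mat):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*mat)]
```

Calling `matmul_at_b(transpose(a), b)` gives the same result as a plain triple-loop product of `a` and `b`. Passing `block=4` mimics the 4x4 tile size quoted for double precision, although Python floats are double precision in both cases.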