Changes between Version 16 and Version 17 of MatrixMultiply
- Timestamp: Jun 7, 2010 10:37:33 PM
MatrixMultiply
We have implemented a single/double precision matrix multiply program for RV770/Cypress. Our implementation uses two input streams: one is the transposed input matrix A, and the other is the input matrix B in normal (untransposed) format. The output matrix C is also not transposed. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results for each case. Note that only kernel execution time is measured.

== Performance Summary ==
|| board || Pmax || Nmax || prec || GPR || MADD peak ||
|| HD4850 || 736 || 3328 || SP || 25 || 1040 ||
…
|| HD4850 || 160 || 1920 || DP || 17 || 208 ||
|| HD5870 || 431 || 2432 || DP || 17 || 544 ||
Pmax & MADD in GFLOPS

== Source code ==
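As an illustration of the scheme described in the introduction (this is not the GPU kernel itself), the following CPU-side Python sketch computes C = A·B from a transposed A and an untransposed B, one output block at a time. The function names `matmul_blocked_at`, `gflops`, and `efficiency` are hypothetical and chosen only for this example. The sketch assumes that each output block is accumulated as a sum of rank-1 updates over the inner dimension, and that the reported GFLOPS figures follow the usual 2·N³ operation count for an N×N multiply; neither assumption is confirmed by the page.

```python
def matmul_blocked_at(a_t, b, block=8):
    """Return C = A @ B, given a_t = transpose(A) and b = B as lists of rows.

    The output is produced in block x block tiles (8 mirrors the single
    precision block size quoted above; 4 would mirror double precision).
    """
    k_dim = len(a_t)       # rows of A^T: the inner (summation) dimension
    m = len(a_t[0])        # columns of A^T: rows of A and of C
    n = len(b[0])          # columns of B and of C
    if len(b) != k_dim:
        raise ValueError("inner dimensions of A and B do not match")

    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        i1 = min(i0 + block, m)
        for j0 in range(0, n, block):
            j1 = min(j0 + block, n)
            # Accumulators for one output tile (assumption: on a GPU these
            # would be held in registers for the whole loop over k).
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for k in range(k_dim):
                a_row = a_t[k]   # row k of A^T is column k of A
                b_row = b[k]     # row k of B
                # Rank-1 update: both operands are read along a row.
                for i in range(i0, i1):
                    a_ik = a_row[i]
                    acc_i = acc[i - i0]
                    for j in range(j0, j1):
                        acc_i[j - j0] += a_ik * b_row[j]
            for i in range(i0, i1):
                c[i][j0:j1] = acc[i - i0]
    return c


def gflops(n, seconds):
    """GFLOPS for an n x n multiply, assuming 2*n**3 floating point operations."""
    return 2.0 * n ** 3 / seconds / 1e9


def efficiency(pmax, madd_peak):
    """Fraction of the MADD peak reached by the measured Pmax."""
    return pmax / madd_peak
```

Storing A transposed means that, for a fixed k, the needed elements of both A and B lie along a single row, which is one plausible reason for the input layout described above. Applied to the rows shown in the table, `efficiency` gives roughly 71% for the HD4850 in single precision (736/1040), 77% for the HD4850 in double precision (160/208), and 79% for the HD5870 in double precision (431/544).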