Changes between Version 24 and Version 25 of MatrixMultiply
- Timestamp: Jun 23, 2010 4:47:21 PM (14 years ago)
MatrixMultiply
= Matrix Multiply on GPU =
We have implemented a single/double precision matrix multiply program for RV770/Cypress. Our implementation uses two input streams to compute C=AB: one is the transposed input matrix A (i.e. column major) and the other is the input matrix B in normal format (i.e. row major). The output matrix C is also row major. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results for each case. Note that only kernel execution time is measured.

We have also added double-double (DD) performance results, using a 2x2 block in this case. On Cypress architecture GPUs, we take advantage of the FMA_64 instruction.

== Performance Summary ==
… …
|| HD4850 || 177 || 1408 || DP || 19 || 208 ||
|| HD5870 || 475 || 2048 || DP || 19 || 544 ||
|| HD4850 || 7.5 || 768 || DD || 21 || ||
|| HD5870 || 20 || 1024 || DD || 21 || ||
|| HD5870 FMA || 31 || 1024 || DD || 18 || ||

Pmax and MAD are in GFLOPS.

… …
== Double precision ==
[[Image(DMM.png)]]

== Double-Double ==
[[Image(DDMM.png)]]

= Useful forum discussions =