Version 10 (modified by nakasato, 14 years ago)
Matrix Multiply on GPU
We have implemented single- and double-precision matrix multiply programs for the RV770 and Cypress GPUs. Our implementation uses two input streams: one is the input matrix A in transposed form, and the other is the input matrix B in normal (non-transposed) format. The output matrix C is also not transposed. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results:
Single precision
Double precision
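The data layout described above (A transposed, B and C in normal format, a fixed-size block of C per unit of work) can be illustrated with a small CPU reference in Python. This is only a sketch under assumptions, not the actual IL kernel: the function names (`transpose`, `naive_matmul`, `matmul_transposed_a`) are hypothetical, and the reading that each block of C is accumulated as a sum of outer products over k is our assumption about why a transposed A is convenient (for a given k, the needed elements of A and of B are both contiguous in a row).

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def naive_matmul(a, b):
    """Reference C = A * B with A (M x K) and B (K x N), both non-transposed."""
    k_dim = len(b)
    n = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n)]
            for row in a]


def matmul_transposed_a(a_t, b, block=8):
    """Compute C = A * B where a_t stores A transposed (a_t[k][i] == A[i][k]).

    Assumption: M and N are multiples of `block`. Use block=8 to mirror the
    single-precision block size in the text and block=4 for double precision.
    """
    k_dim = len(a_t)
    m = len(a_t[0])
    n = len(b[0])
    if len(b) != k_dim:
        raise ValueError("inner dimensions of A and B do not match")
    if m % block or n % block:
        raise ValueError("matrix sizes must be multiples of the block size")

    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            # One block x block tile of C, kept in a local accumulator.
            acc = [[0.0] * block for _ in range(block)]
            for k in range(k_dim):
                # With A transposed, both slices are contiguous row segments.
                a_seg = a_t[k][i0:i0 + block]
                b_seg = b[k][j0:j0 + block]
                # Rank-1 (outer product) update of the tile.
                for bi in range(block):
                    a_val = a_seg[bi]
                    acc_row = acc[bi]
                    for bj in range(block):
                        acc_row[bj] += a_val * b_seg[bj]
            # Write the finished tile back to the non-transposed output C.
            for bi in range(block):
                c[i0 + bi][j0:j0 + block] = acc[bi]
    return c
```

Calling `matmul_transposed_a(transpose(a), b, block=8)` should give the same result as `naive_matmul(a, b)` whenever the dimensions of C are multiples of the block size. The actual kernels run on the GPU and read A and B as input streams, so this sketch only mirrors the indexing, not the performance characteristics.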
Useful forum discussions
Discussion on a highly optimized MM kernel
http://forum.beyond3d.com/showthread.php?t=54842
Discussion on MM kernels in OpenCL
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=127963
IL code generator in C++
CAL++ http://sourceforge.net/projects/calpp/
Meta-programming works in reality.
Attachments (6)
- MM1.png (12.7 KB) - added by nakasato 15 years ago.
- DMM.png (5.0 KB) - added by nakasato 14 years ago.
- SMM.png (5.3 KB) - added by nakasato 14 years ago.
- DDMM.png (5.2 KB) - added by nakasato 14 years ago.
- dis.txt (19.7 KB) - added by nakasato 14 years ago.
- kernel_single.il (2.8 KB) - added by nakasato 14 years ago.