
Version 10 (modified by nakasato, 14 years ago)


Matrix Multiply on GPU

We have implemented single- and double-precision matrix multiply programs for RV770/Cypress. Our implementation uses two input streams: one is the input matrix A in transposed form, and the other is the input matrix B in normal (non-transposed) format. The output matrix C is also not transposed. We adopted an 8x8 block for single precision and a 4x4 block for double precision. Here are the benchmark results:

Single precision

Double precision
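
The data layout described above (A supplied transposed, B and C in normal format, C computed in small square blocks) can be illustrated with a minimal CPU sketch. This is not the IL kernel used on RV770/Cypress; the function names, the list-of-lists representation, and the loop order are assumptions made for illustration only. It shows one reason a transposed A is convenient: at each step k, the values needed from A and from B are both contiguous rows, and a block of C is updated as an outer product of the two.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul_at_b_blocked(a_t, b, block=8):
    """Compute C = A * B, given a_t = transpose(A), one block of C at a time.

    a_t is K x M (A transposed), b is K x N, and the result C is M x N.
    `block` is a hypothetical parameter mirroring the 8x8 (single precision)
    and 4x4 (double precision) block sizes mentioned in the text.
    """
    k_dim = len(a_t)
    m_dim = len(a_t[0])
    n_dim = len(b[0])
    if len(b) != k_dim:
        raise ValueError("inner dimensions of A and B do not match")

    c = [[0.0] * n_dim for _ in range(m_dim)]
    for i0 in range(0, m_dim, block):
        i1 = min(i0 + block, m_dim)
        for j0 in range(0, n_dim, block):
            j1 = min(j0 + block, n_dim)
            # Accumulators for one block of C (at most block x block values).
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for k in range(k_dim):
                row_a = a_t[k]  # row k of A^T is column k of A
                row_b = b[k]    # row k of B
                # Outer-product update of the block from two contiguous rows.
                for i in range(i0, i1):
                    a_ik = row_a[i]
                    for j in range(j0, j1):
                        acc[i - i0][j - j0] += a_ik * row_b[j]
            # Write the finished block back into C (not transposed).
            for i in range(i0, i1):
                c[i][j0:j1] = acc[i - i0]
    return c
```

For example, with A = [[1, 2], [3, 4], [5, 6]] and B = [[7, 8], [9, 10]], calling matmul_at_b_blocked(transpose(A), B, block=2) gives the same 3x2 result as an ordinary matrix product. How the actual kernel maps blocks onto GPU threads and registers is not stated on this page, so the sketch makes no attempt to model that.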

Useful forum discussions

Discussion on a highly optimized MM kernel

http://forum.beyond3d.com/showthread.php?t=54842

Discussion on MM kernels in OpenCL

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=127963

IL code generator in C++

CAL++ http://sourceforge.net/projects/calpp/

Meta-programming works in practice.
