Version 12 (modified by nakasato, 14 years ago)
Our implementation of MM on GPU
See MatrixMultiply.
Introduction
We tested ACML-GPU version 1.1 on a Cypress GPU running at 850 MHz. The operating system was Ubuntu 10.04 LTS. Each plot below (DP and SP) shows two lines: one obtained with the X58 chipset and the other with the X38 chipset. With the X58 chipset, the PCI transfer speed of Cypress is rather limited, roughly 600 MB/sec from GPU to CPU. With the X38 chipset, it reaches a reasonable > 6 GB/sec for large data. However, the transfer speed does not seem to be critical for the GEMM benchmark.
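For readers who want to reproduce numbers comparable to the plots, the sketch below shows the conventional way to turn a GEMM timing into GFLOPS, counting 2·M·N·K floating-point operations per multiply. This page does not state how its figures were computed, so treat the formula as the usual convention rather than the exact method used here. The function names are hypothetical, and `flops_per_byte` assumes square matrices with A and B sent to the GPU and C sent back once.

```python
def gemm_flops(m, n, k):
    """Operation count for C = A * B with A (m x k) and B (k x n).

    Each of the m*n output elements needs k multiplies and k adds,
    so the conventional count is 2*m*n*k.
    """
    return 2 * m * n * k


def gemm_gflops(m, n, k, seconds):
    """Convert a measured wall-clock time into GFLOPS."""
    return gemm_flops(m, n, k) / seconds / 1e9


def flops_per_byte(n, elem_size):
    """Arithmetic work per byte moved for an n x n GEMM.

    Assumption: A and B are transferred to the GPU and C is
    transferred back, each exactly once (3 * n * n elements).
    elem_size is 4 for SP and 8 for DP.
    """
    bytes_moved = 3 * n * n * elem_size
    return gemm_flops(n, n, n) / bytes_moved
```

Under these assumptions the work per byte grows linearly with the matrix size (n/12 for DP, n/6 for SP), so the relative cost of the PCI transfer shrinks as the matrices get larger.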
Update (20100808): The previous DGEMM figure showed wrong results for NN. The correct performance is slightly lower than in the previous data.
Results
Attachments (3)
- SGEMM.png (6.8 KB) - added by nakasato 14 years ago.
- DGEMM.png (7.6 KB) - added by nakasato 14 years ago.
- DGEMM_ab.png (12.1 KB) - added by nakasato 14 years ago.