Changes between Version 18 and Version 19 of GEMM_Performance_Cypress
Timestamp: Aug 12, 2010 6:22:07 AM
GEMM_Performance_Cypress
v18  v19
9    9    Update(20100808): The previous figure for DGEMM showed incorrect results. The correct performance of our DGEMM is slightly lower than the previous data indicated. "NN" is my initial not matrix transpose options. This result is for A^T B. We are currently working on the other options and will work on scalar constants.
10   10
     11   Comment(20100812): With the latest Catalyst 10.7, the slow GPU-to-CPU memory transfer speed is gone. We now get roughly 6 GB/sec in both directions. However, this only represents the practical maximum transfer speed for the combination of Cypress and the X58 chipset. The actual transfer speed to and from main memory is rather slow: ~2.5 GB/sec at most with pinned memory allocation. This is the problem!
     12
11   13   = Results =
12   14   [[Image(DGEMM.png)]]
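
To make the concern in the 20100812 comment concrete, here is a rough back-of-the-envelope sketch in Python. It is not the authors' benchmark code. It assumes a square N x N DGEMM in which three double-precision matrices cross the bus (A and B to the GPU, C back), and it compares the two bandwidth figures quoted above (~6 GB/sec and ~2.5 GB/sec). The matrix size and the kernel rate used in the demo are hypothetical placeholders, not measurements from this page.

```python
def transfer_seconds(n, bandwidth_gb_per_s, bytes_per_element=8, matrices_moved=3):
    """Seconds needed to move `matrices_moved` n x n matrices at the given bandwidth.

    Assumption: 8-byte doubles and 1 GB = 1e9 bytes.
    """
    total_bytes = matrices_moved * bytes_per_element * n * n
    return total_bytes / (bandwidth_gb_per_s * 1e9)


def gemm_flops(n):
    """Floating-point operation count of an n x n x n matrix multiply (2 * n^3)."""
    return 2.0 * n ** 3


def effective_gflops(n, kernel_gflops, bandwidth_gb_per_s):
    """GFLOPS seen by the host when transfers are not overlapped with compute.

    `kernel_gflops` is the on-GPU kernel rate; it is an input, not a value
    taken from this page.
    """
    compute_s = gemm_flops(n) / (kernel_gflops * 1e9)
    total_s = compute_s + transfer_seconds(n, bandwidth_gb_per_s)
    return gemm_flops(n) / total_s / 1e9


if __name__ == "__main__":
    n = 4096                 # hypothetical matrix size
    kernel_gflops = 400.0    # hypothetical kernel rate, NOT a measured figure
    for bw in (6.0, 2.5):    # the two bandwidths quoted in the comment
        t = transfer_seconds(n, bw)
        eff = effective_gflops(n, kernel_gflops, bw)
        print(f"{bw:4.1f} GB/sec: transfer {t * 1e3:7.1f} ms, "
              f"effective {eff:6.1f} GFLOPS")
```

Under these assumptions the transfer time scales inversely with bandwidth, so dropping from 6 GB/sec to 2.5 GB/sec makes every transfer 2.4 times longer regardless of matrix size. How much that hurts the end-to-end rate depends on the kernel speed and on whether transfers can be overlapped with computation, which this sketch does not model.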