Changes between Version 45 and Version 46 of MatrixMultiply


Timestamp: Aug 21, 2010 9:39:26 AM
Author: nakasato
Comment: --

Update: we have added double-double (DD) precision performance results. For DD, we used a 2x2 block. On the Cypress-architecture GPU, we take advantage of the FMA_64 instruction. To estimate the MAD peak in DD, we assume that one DD operation costs 20 double-precision (DP) operations (ops) without FMA and 15 ops with FMA. More precisely, DD add, and DD mul without FMA, each take ~20 ops, while DD mul with FMA takes only ~8 ops. Even without the FMA_64 instruction, the MULADD instruction can be used to reduce the op count of DD mul. On RV770, this gives 13% better performance, as indicated by the row with MAD.
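The op counts above can be checked against a textbook double-double sketch. The Python below is our own illustration, not the GPU kernel: it uses the standard TwoSum, Dekker TwoProd and FMA-based TwoProd building blocks (as popularized by the QD library), with `math.fma` (Python 3.13+) or an exact `Fraction` fallback standing in for FMA_64. The names `DD`, `dd_add`, `dd_mul_fma`, `dd_mul_nofma` and `dd_peak` are hypothetical. Counted this way, DD add takes 20 flops, DD mul with FMA 7, and DD mul without FMA 24, roughly matching the ~20 and ~8 figures above; the kernel's exact counts may differ.

```python
import math
from fractions import Fraction
from typing import NamedTuple


class DD(NamedTuple):
    """Double-double value hi + lo, with |lo| at most half an ulp of hi."""
    hi: float
    lo: float


def fma(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding (stand-in for a hardware FMA such as FMA_64)."""
    if hasattr(math, "fma"):  # Python 3.13+
        return math.fma(a, b, c)
    # Fallback: exact rational arithmetic, then one correctly rounded conversion.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def two_sum(a: float, b: float) -> DD:
    """Knuth's TwoSum: s + e == a + b exactly. 6 flops."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return DD(s, e)


def quick_two_sum(a: float, b: float) -> DD:
    """Fast TwoSum, valid when |a| >= |b|. 3 flops."""
    s = a + b
    return DD(s, b - (s - a))


def two_prod_fma(a: float, b: float) -> DD:
    """p + e == a * b exactly; the error term costs one FMA. 2 flops."""
    p = a * b
    return DD(p, fma(a, b, -p))


def split(a: float) -> DD:
    """Veltkamp split of a into two half-width pieces. 4 flops."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    return DD(hi, a - hi)


def two_prod_dekker(a: float, b: float) -> DD:
    """Dekker's TwoProd without FMA. 17 flops (8 of them in the two splits)."""
    ah, al = split(a)
    bh, bl = split(b)
    p = a * b
    # Every product of split pieces is exact, so only the additions round.
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return DD(p, e)


def dd_add(a: DD, b: DD) -> DD:
    """IEEE-style DD addition. 6 + 6 + 1 + 3 + 1 + 3 = 20 flops."""
    s = two_sum(a.hi, b.hi)
    t = two_sum(a.lo, b.lo)
    s = quick_two_sum(s.hi, s.lo + t.hi)
    return quick_two_sum(s.hi, s.lo + t.lo)


def dd_mul_fma(a: DD, b: DD) -> DD:
    """DD multiplication using FMA. 2 + 1 + 1 + 3 = 7 flops."""
    p = two_prod_fma(a.hi, b.hi)
    lo = fma(a.hi, b.lo, p.lo)  # cross terms folded into FMAs
    lo = fma(a.lo, b.hi, lo)
    return quick_two_sum(p.hi, lo)


def dd_mul_nofma(a: DD, b: DD) -> DD:
    """DD multiplication without FMA. 17 + 4 + 3 = 24 flops."""
    p = two_prod_dekker(a.hi, b.hi)
    lo = p.lo + (a.hi * b.lo + a.lo * b.hi)
    return quick_two_sum(p.hi, lo)


def dd_peak(dp_peak: float, has_fma: bool) -> float:
    """DD 'MAD peak' under this page's assumption: 1 DD op = 20 DP ops, or 15 with FMA."""
    return dp_peak / (15 if has_fma else 20)


# Usage: 1 + 2**-60 does not fit in one double but is exact as a DD.
x = DD(1.0, 2.0**-60)
assert dd_add(x, x) == DD(2.0, 2.0**-59)
# (1 + 2**-60)**2 = 1 + 2**-59 + 2**-120; the 2**-120 term is below DD precision.
assert dd_mul_fma(x, x) == dd_mul_nofma(x, x) == DD(1.0, 2.0**-59)
```

In the no-FMA product, the multiplies inside Dekker's error term act on half-width pieces and are therefore exact, so a non-fused multiply-add such as MULADD could plausibly replace those mul/add pairs without changing the result. This is our guess at where the savings come from, not a description of the actual kernel.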

See [wiki:"GEMM_Performance_Cypress"] for details of our GEMM implementation.

== Performance Summary ==