Changes between Version 2 and Version 3 of Astronomical_Many_Body_Simulations_On_RV770


Ignore:
Timestamp:
May 21, 2009 12:31:54 PM (16 years ago)
Author:
nakasato
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Astronomical_Many_Body_Simulations_On_RV770

    v2 v3  
    11= Fast Simulations of Gravitational Many-body Problem on RV770 GPU = 
    22== abstract == 
    3 The gravitational many-body problem is a problem concerning the movement of bodies, which are interacting through gravity. However, solving the gravitational many-body problem with a CPU takes a lot of time due to O(N**2) computational complexity. In this paper, we show how to speed-up the gravitational many-body problem by using GPU. After extensive optimizations, the peak performance obtained so far is about 1 Tflops.  
     3The gravitational many-body problem is a problem concerning the movement of bodies, which are interacting through gravity. However, solving the gravitational many-body problem with a CPU takes a lot of time due to O(N^2) computational complexity.  
     4In this paper, we report our technique to speed-up the exact force-calculation on RV770 GPU from AMD/ATi. 
     5As far as we know, our implementation on RV770 GPU running at 750 MHz shows fastest 
     6performance of ~ 1 Tflops thanks to efficient cache architecture of RV770 GPU.  
     7This is accomplished a loop-unrolling technique that is highly effective RV770 GPU. 
     8 
     9 
     10 
     11 
     12 
    413 
    514== preprint == 
     
    817 
    918= Oct-tree on GPU = 
     19== abstract == 
     20The kd-tree is a fundamental tool in computer science. 
     21Among others, an application of the kd-tree search 
     22(oct-tree method) to fast evaluation of particle interactions 
     23and neighbor search is highly important 
     24since computational complexity of these problems 
     25are reduced from O(N^2^) with a brute force method 
     26to O(N log N) with the tree method where N is a number of particles. 
     27In this paper, we present a parallel implementation of 
     28the tree method running on a graphic processor unit (GPU). 
     29We successfully run a simulation of structure formation 
     30in the universe very efficiently. 
     31On our system, which costs roughly $900, 
     32the run with N ~ 2.87 x 10^6^ particles 
     33took 5.79 hours and executed 1.2 x 10^13^ force evaluations in total. 
     34We obtained the sustained computing speed of 21.8 Gflops 
     35and the cost per Gflops of $41.6/Gflops that is two and half times better than  
     36the previous record in 2006. 
     37 
     38== preprint ==  
    1039to be posted soon