3 | | The gravitational many-body problem is a problem concerning the movement of bodies, which are interacting through gravity. However, solving the gravitational many-body problem with a CPU takes a lot of time due to O(N**2) computational complexity. In this paper, we show how to speed-up the gravitational many-body problem by using GPU. After extensive optimizations, the peak performance obtained so far is about 1 Tflops. |
| 3 | The gravitational many-body problem is a problem concerning the movement of bodies, which are interacting through gravity. However, solving the gravitational many-body problem with a CPU takes a lot of time due to O(N^2) computational complexity. |
| 4 | In this paper, we report our technique to speed-up the exact force-calculation on RV770 GPU from AMD/ATi. |
| 5 | As far as we know, our implementation on RV770 GPU running at 750 MHz shows fastest |
| 6 | performance of ~ 1 Tflops thanks to efficient cache architecture of RV770 GPU. |
| 7 | This is accomplished a loop-unrolling technique that is highly effective RV770 GPU. |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | |
| 19 | == abstract == |
| 20 | The kd-tree is a fundamental tool in computer science. |
| 21 | Among others, an application of the kd-tree search |
| 22 | (oct-tree method) to fast evaluation of particle interactions |
| 23 | and neighbor search is highly important |
| 24 | since computational complexity of these problems |
| 25 | are reduced from O(N^2^) with a brute force method |
| 26 | to O(N log N) with the tree method where N is a number of particles. |
| 27 | In this paper, we present a parallel implementation of |
| 28 | the tree method running on a graphic processor unit (GPU). |
| 29 | We successfully run a simulation of structure formation |
| 30 | in the universe very efficiently. |
| 31 | On our system, which costs roughly $900, |
| 32 | the run with N ~ 2.87 x 10^6^ particles |
| 33 | took 5.79 hours and executed 1.2 x 10^13^ force evaluations in total. |
| 34 | We obtained the sustained computing speed of 21.8 Gflops |
| 35 | and the cost per Gflops of $41.6/Gflops that is two and half times better than |
| 36 | the previous record in 2006. |
| 37 | |
| 38 | == preprint == |