= Fast Simulations of Gravitational Many-body Problem on RV770 GPU =

== abstract ==
The gravitational many-body problem concerns the motion of bodies that interact through gravity. Solving it on a CPU takes a long time because the computational complexity is O(N^2^). In this paper, we report our technique to speed up the exact force calculation on the RV770 GPU from AMD/ATi. As far as we know, our implementation on an RV770 GPU running at 750 MHz shows the fastest performance, ~ 1 Tflops, thanks to the efficient cache architecture of the RV770 GPU. This is accomplished by a simple loop-unrolling technique that is highly effective on the RV770 GPU.

== Result ==
[[Image(bare.png)]]

== preprint ==
http://jp.arxiv.org/abs/0904.3659

= Oct-tree Method on GPU: $42/Gflops Cosmological Simulation =

== abstract ==
The kd-tree is a fundamental tool in computer science. Among its applications, the use of kd-tree search (the oct-tree method) for fast evaluation of particle interactions and neighbor search is highly important, since the computational complexity of these problems is reduced from O(N^2^) with a brute-force method to O(N log N) with the tree method, where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We successfully ran a simulation of structure formation in the universe very efficiently. On our system, which costs roughly $900, the run with N ~ 2.87 x 10^6^ particles took 5.79 hours and executed 1.2 x 10^13^ force evaluations in total. We obtained a sustained computing speed of 21.8 Gflops and a cost per Gflops of $41.6/Gflops, which is two and a half times better than the previous record in 2006.

== Result ==
[[Image(time.png)]]

== preprint ==
to be posted

= Demo programs =
to be posted
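
Until the demo programs are posted, the O(N^2^) direct force calculation mentioned in the first abstract can be illustrated with a minimal CPU sketch. This is not the authors' GPU kernel. The function names, the softening parameter `eps`, the choice G = 1, and the unroll factor of 2 over the outer loop are all assumptions made for illustration. The second function unrolls the outer loop so that each j-particle is fetched once and reused for two i-particles, which is one common form of loop unrolling for this problem.

```python
import math

def _term(pi, pj, mj, eps2, G):
    # Acceleration on a particle at pi due to mass mj at pj (softened).
    dx, dy, dz = pj[0] - pi[0], pj[1] - pi[1], pj[2] - pi[2]
    r2 = dx * dx + dy * dy + dz * dz + eps2
    s = G * mj / (r2 * math.sqrt(r2))
    return dx * s, dy * s, dz * s

def accel_direct(pos, mass, eps=0.0, G=1.0):
    # Plain O(N^2) double loop over all pairs (i, j), skipping j == i.
    n = len(pos)
    eps2 = eps * eps
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                t = _term(pos[i], pos[j], mass[j], eps2, G)
                acc[i][0] += t[0]; acc[i][1] += t[1]; acc[i][2] += t[2]
    return acc

def accel_unrolled(pos, mass, eps=0.0, G=1.0):
    # Same sum, with the outer loop unrolled by 2 (assumed factor):
    # each j-particle is read once and applied to two i-particles.
    n = len(pos)
    eps2 = eps * eps
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i0 in range(0, n - 1, 2):
        i1 = i0 + 1
        for j in range(n):
            pj, mj = pos[j], mass[j]  # fetched once, used twice
            if j != i0:
                t = _term(pos[i0], pj, mj, eps2, G)
                acc[i0][0] += t[0]; acc[i0][1] += t[1]; acc[i0][2] += t[2]
            if j != i1:
                t = _term(pos[i1], pj, mj, eps2, G)
                acc[i1][0] += t[0]; acc[i1][1] += t[1]; acc[i1][2] += t[2]
    if n % 2 == 1:
        # Remainder: the last particle when N is odd.
        i = n - 1
        for j in range(n - 1):
            t = _term(pos[i], pos[j], mass[j], eps2, G)
            acc[i][0] += t[0]; acc[i][1] += t[1]; acc[i][2] += t[2]
    return acc
```

In pure Python the unrolled version is not expected to be faster. It only shows the loop structure, in which data loaded for one j-particle is reused across several i-particles.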