Fast Simulations of Gravitational Many-body Problem on RV770 GPU

by K.Fujiwara and N.Nakasato (based on Fujiwara's 2008 undergraduate thesis, University of Aizu)

abstract

The gravitational many-body problem concerns the motion of bodies that interact through gravity. Solving it on a CPU takes a long time because the direct force calculation has O(N²) computational complexity. In this paper, we report our technique for speeding up the exact force calculation on the RV770 GPU from AMD/ATi. Our implementation on an RV770 GPU running at 750 MHz achieves ~ 1 Tflops thanks to the efficient cache architecture of the RV770. As far as we know, this is the fastest performance reported to date. The optimized result relies on a loop-unrolling technique that is highly effective on the RV770 GPU.
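To make the O(N²) cost concrete, here is a minimal CPU sketch of the exact (direct-summation) force calculation in Python. It is only an illustration of the algorithm, not the GPU kernel from the paper: the function name, the softening parameter `eps`, and the choice G = 1 are assumptions made for this example, and the loop-unrolling optimization is not shown.

```python
import math

def direct_accelerations(pos, mass, eps=0.0, G=1.0):
    """Acceleration on every body from all others by direct O(N^2) summation.

    pos  : list of (x, y, z) tuples
    mass : list of masses, same length as pos
    eps  : softening length (hypothetical parameter; 0 means pure Newtonian gravity)
    """
    n = len(pos)
    eps2 = eps * eps
    acc = []
    for i in range(n):                 # body receiving the force
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):             # body exerting the force
            if j == i:
                continue               # no self-interaction
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = G * mass[j] / (r2 * math.sqrt(r2))   # G * m_j / r^3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc

# Two unit masses one unit apart: each feels |a| = G * m / r^2 = 1,
# pointing toward the other body.
acc = direct_accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
print(acc)
```

The two nested loops over N bodies are what give the N² scaling; every (i, j) pair is independent, which is why this calculation maps well onto a GPU.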

Result

preprint

http://jp.arxiv.org/abs/0904.3659

Recent test results with RV870

I've created a new page dedicated to Radeon 5870.

Demo Program

This is a demo program of our many-body simulation code. To control the program, see the message printed in the command window. One simulation takes 15-20 seconds to finish, after which the program prints the sustained performance of the simulation. Five initial models with different N are attached; press 1-5 to change the model. The larger N is, the better the performance.
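How the demo counts operations is not stated on this page. As a rough sketch, a sustained figure of this kind is usually the number of pairwise force evaluations per second multiplied by an assumed operation count per evaluation. The function names below are hypothetical, counting N² evaluations per step is an assumption, and so is the default of 38 floating-point operations per interaction (a convention common in N-body benchmarks, not a value taken from the demo).

```python
def direct_interactions(n_bodies, n_steps):
    """Pairwise force evaluations in a direct-summation run (assumed N^2 per step)."""
    return n_bodies * n_bodies * n_steps

def sustained_gflops(n_interactions, seconds, flops_per_interaction=38):
    """Sustained speed in Gflops; 38 flops per interaction is an assumed convention."""
    return n_interactions * flops_per_interaction / seconds / 1e9

# Consistency check against the oct-tree abstract on this page:
# 1.2 x 10^13 force evaluations in 5.79 hours.
print(round(sustained_gflops(1.2e13, 5.79 * 3600), 1))   # about 21.9
```

With the assumed 38 operations per interaction, the check lands close to the 21.8 Gflops quoted in that abstract, which suggests (but does not confirm) that a similar convention was used there.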

Download
- R700demo.zip

System Requirements
- Windows XP/Vista (both 32-bit and 64-bit versions work)
- Catalyst version 9.4 or later
- A supported GPU board (both R600 and R700 GPUs work)

Preliminary Benchmark Results (link to Google Doc)
- Note: the benchmark results compiled here were obtained with the "figure eight N = 10k" model.

Have fun!

R700demo was originally developed by K.Fujiwara. N.Nakasato has modified it to support our latest code with minor enhancements.

Oct-tree Method on GPU: $42/Gflops Cosmological Simulation

by N.Nakasato (submitted April 14 2009)

abstract

The kd-tree is a fundamental tool in computer science. Among its applications, the kd-tree search (oct-tree method) for fast evaluation of particle interactions and neighbor search is highly important, since it reduces the computational complexity of these problems from O(N²) with a brute-force method to O(N log N) with the tree method, where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We successfully ran a simulation of structure formation in the universe very efficiently. On our system, which costs roughly $900, the run with N ~ 2.87 x 10⁶ particles took 5.79 hours and executed 1.2 x 10¹³ force evaluations in total. We obtained a sustained computing speed of 21.8 Gflops and a cost per Gflops of $41.6/Gflops, which is two and a half times better than the previous record in 2006.
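The idea behind the O(N log N) scaling can be sketched with a generic Barnes-Hut-style oct-tree on the CPU. This is not the parallel GPU implementation described in the paper: the class and function names, the opening parameter `theta`, the softening `eps`, and G = 1 are all assumptions for illustration, and the sketch assumes positive masses and distinct particle positions.

```python
class Cell:
    """Cubic oct-tree cell holding its total mass and center of mass."""
    def __init__(self, center, half):
        self.center, self.half = center, half
        self.mass = 0.0
        self.com = [0.0, 0.0, 0.0]
        self.children = None   # dict: octant index -> Cell, once subdivided
        self.body = None       # (position, mass) while the cell holds one particle

def _descend(cell, p, m):
    """Insert (p, m) into the child octant of `cell` that contains p."""
    idx = sum((p[k] >= cell.center[k]) << k for k in range(3))
    if idx not in cell.children:
        h = cell.half / 2.0
        c = [cell.center[k] + (h if p[k] >= cell.center[k] else -h) for k in range(3)]
        cell.children[idx] = Cell(c, h)
    insert(cell.children[idx], p, m)

def insert(cell, p, m):
    """Add one particle to the tree rooted at `cell`."""
    if cell.body is None and cell.children is None:
        cell.body = (p, m)                  # empty cell becomes a leaf
    else:
        if cell.children is None:           # occupied leaf: split it
            cell.children = {}
            old, cell.body = cell.body, None
            _descend(cell, *old)
        _descend(cell, p, m)
    total = cell.mass + m                   # update the cell's monopole moment
    cell.com = [(cell.com[k] * cell.mass + p[k] * m) / total for k in range(3)]
    cell.mass = total

def build_tree(pos, mass):
    """Build an oct-tree over a non-empty list of particles."""
    lo = [min(p[k] for p in pos) for k in range(3)]
    hi = [max(p[k] for p in pos) for k in range(3)]
    half = 0.5 * max(hi[k] - lo[k] for k in range(3)) + 1e-12
    root = Cell([0.5 * (lo[k] + hi[k]) for k in range(3)], half)
    for p, m in zip(pos, mass):
        insert(root, p, m)
    return root

def tree_acceleration(cell, p, theta=0.5, eps=0.01):
    """Acceleration at point p (G = 1) using an opening-angle criterion."""
    if cell.children is None:
        src, m = cell.body                  # leaf: use the exact particle position
    else:
        src, m = cell.com, cell.mass
    d = [src[k] - p[k] for k in range(3)]
    r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
    if cell.children is not None and (2.0 * cell.half) ** 2 >= theta * theta * r2:
        acc = [0.0, 0.0, 0.0]               # cell too large or too close: open it
        for child in cell.children.values():
            a = tree_acceleration(child, p, theta, eps)
            acc = [acc[k] + a[k] for k in range(3)]
        return acc
    if r2 == 0.0:
        return [0.0, 0.0, 0.0]              # the particle itself
    s = m / (r2 + eps * eps) ** 1.5         # distant cell treated as one point mass
    return [s * d[k] for k in range(3)]

pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
root = build_tree(pos, [1.0] * 4)
print(tree_acceleration(root, pos[0]))
```

Setting `theta=0` forces every cell to be opened and reproduces the brute-force sum; larger values replace distant groups of particles with a single point mass, trading accuracy for fewer force evaluations.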

Result

preprint

http://arxiv.org/abs/0909.0541

Last modified on Oct 5, 2009 3:58:51 PM

Attachments (5)

bare.png (24.3 KB) - added by nakasato 16 years ago.
time.png (17.3 KB) - added by nakasato 16 years ago.
screenshot.png (220.6 KB) - added by nakasato 16 years ago.
R700demo.zip (3.4 MB) - added by nakasato 16 years ago.
OpenCL.png (4.0 KB) - added by nakasato 15 years ago.
