7 | | The kd-tree is a fundamental tool in computer science. |
8 | | Among other applications, the application of kd-tree search (by the |
9 | | tree method) to the fast evaluation of particle interactions and |
10 | | neighbor search is highly important, since the computational |
11 | | complexity of these problems is reduced from O(N^2^) for a brute |
12 | | force method to O(N log N) for the tree method, where N is |
13 | | the number of particles. |
14 | | In this paper, we present a parallel |
15 | | implementation of the tree method running on a graphics processing |
16 | | unit (GPU). We present a detailed description of how we have |
17 | | implemented the tree method on a Cypress GPU. |
18 | | An optimization that we found important is localized particle ordering to |
19 | | effectively utilize cache memory. |
20 | | We present a number of test results |
21 | | and performance measurements. |
22 | | Our results show that the execution of the tree traversal |
23 | | in a force calculation on a GPU is practical and efficient. |
| 7 | The kd-tree is a fundamental tool in computer science. Among other applications, the application of kd-tree search (by the tree method) to the fast evaluation of particle interactions and neighbor search is highly important, since the computational complexity of these problems is reduced from O(N^2^) for a brute |
| 8 | force method to O(N log N) for the tree method, where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We present a detailed description of how we have implemented the tree method on a Cypress GPU. An optimization that we found important is localized particle ordering to effectively utilize cache memory. We present a number of test results and performance measurements. |
| 9 | Our results show that the execution of the tree traversal in a force calculation on a GPU is practical and efficient. |