| 7 | | The kd-tree is a fundamental tool in computer science. |
| 8 | | Among other applications, the application of kd-tree search (by the |
| 9 | | tree method) to the fast evaluation of particle interactions and |
| 10 | | neighbor search is highly important, since the computational |
| 11 | | complexity of these problems is reduced from O(N^2^) for a brute |
| 12 | | force method to O(N log N) for the tree method, where N is |
| 13 | | the number of particles. |
| 14 | | In this paper, we present a parallel |
| 15 | | implementation of the tree method running on a graphics processing |
| 16 | | unit (GPU). We present a detailed description of how we have |
| 17 | | implemented the tree method on a Cypress GPU. |
| 18 | | An optimization that we found important is localized particle ordering to |
| 19 | | effectively utilize cache memory. |
| 20 | | We present a number of test results |
| 21 | | and performance measurements. |
| 22 | | Our results show that the execution of the tree traversal |
| 23 | | in a force calculation on a GPU is practical and efficient. |
| | 7 | The kd-tree is a fundamental tool in computer science. Among other applications, the application of kd-tree search (by the tree method) to the fast evaluation of particle interactions and neighbor search is highly important, since the computational complexity of these problems is reduced from O(N^2^) for a brute |
| | 8 | force method to O(N log N) for the tree method, where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We present a detailed description of how we have implemented the tree method on a Cypress GPU. An optimization that we found important is localized particle ordering to effectively utilize cache memory. We present a number of test results and performance measurements. |
| | 9 | Our results show that the execution of the tree traversal in a force calculation on a GPU is practical and efficient. |