wiki:UGT2011

Y.Suzuki

Fast N-Body Calculation Implemented by OpenCL with Vectorization

I compared the performance of N-body simulations on a CPU and a GPU using several optimization techniques. Each program is written in OpenCL, which provides a standard API for GPUs. An important optimization technique in OpenCL is vectorization, which lets us treat multiple variables as a single vector variable. The program that packed 4 variables into one gave the best performance. I further optimized the program using the shuffle function and found that the N-body calculation with the shuffle function was about 1.3 times faster than without it. I also found that the Intel SDK was able to vectorize the kernel program efficiently.
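The direct-summation N-body force loop can be sketched as follows. This is a scalar Python illustration of one plausible "4 variables as one" layout, assuming each body is packed as (x, y, z, mass), which is the shape an OpenCL float4 would hold. The function name, the softening parameter eps2, and G = 1 are assumptions for illustration; the shuffle-based optimization from the thesis is not reproduced here.

```python
import math

def accelerations(bodies, eps2=1e-4, G=1.0):
    """Direct O(N^2) gravitational accelerations.

    Each body is a 4-tuple (x, y, z, m), mirroring a float4 record in an
    OpenCL kernel (hypothetical layout). The softening term eps2 > 0 keeps
    the self-interaction finite (it contributes zero because dx = dy = dz = 0),
    so the inner loop needs no i == j branch.
    """
    acc = []
    for (xi, yi, zi, _mi) in bodies:          # on a GPU: one work-item per body i
        ax = ay = az = 0.0
        for (xj, yj, zj, mj) in bodies:
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = G * mj / (r2 * math.sqrt(r2))  # G * m_j / r^3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc
```

For two unit masses at x = -1 and x = +1, each body is pulled toward the other with magnitude close to 1/2^2 = 0.25.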

file:///home/committee/aac/Thesis2011/s1140123

K.Kamijima

Performance Evaluation of the Octree Method on GPU

The purpose of my research is to evaluate the performance of a graphics processing unit (GPU) on a numerical algorithm for particle simulations. Specifically, I adopt the Octree method, which requires many branch instructions. In general, GPUs are intrinsically poor at handling branch instructions. I implemented the Octree method in single and double precision on a Cypress GPU. On this GPU, the peak single-precision performance is five times that of double precision. Although the Octree method was expected to be much slower in double precision, I found that this was not the case. On the GPU, the performance of the Octree method is limited not by the computing power of the GPU but by the performance penalty of branch instructions.
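To show where the branches come from, here is a minimal Barnes-Hut-style octree sketch. Barnes-Hut is one common form of the Octree method; the thesis does not describe its tree layout or opening criterion, so the names Node, build, accel and the parameter theta are hypothetical. The data-dependent branches are the octant choice, the empty/leaf/internal cases in insert(), and the opening test in accel(); on a GPU, neighboring work-items take different paths through these, which is the penalty the abstract describes. The sketch assumes distinct body positions inside a cube of half-width half_width centered at the origin.

```python
import math

class Node:
    """Octree cell: cube with center c and half-width h (hypothetical sketch)."""
    def __init__(self, cx, cy, cz, h):
        self.c = (cx, cy, cz)
        self.h = h
        self.children = None         # 8 sub-cells once subdivided
        self.body = None             # (x, y, z, m) when this is a leaf with one body
        self.m = 0.0                 # total mass in the cell
        self.com = [0.0, 0.0, 0.0]   # mass-weighted position sum

    def octant(self, p):
        # Branch per axis: bit 0 = x side, bit 1 = y side, bit 2 = z side.
        return (p[0] > self.c[0]) | ((p[1] > self.c[1]) << 1) | ((p[2] > self.c[2]) << 2)

    def _subdivide(self):
        h = self.h / 2
        self.children = [Node(self.c[0] + (h if i & 1 else -h),
                              self.c[1] + (h if i & 2 else -h),
                              self.c[2] + (h if i & 4 else -h), h) for i in range(8)]

    def insert(self, p):
        if self.children is None and self.body is None:
            self.body = p                      # empty leaf: store the body here
        else:
            if self.children is None:          # occupied leaf: split and push the old body down
                self._subdivide()
                old, self.body = self.body, None
                self.children[self.octant(old)].insert(old)
            self.children[self.octant(p)].insert(p)
        self.m += p[3]
        for k in range(3):
            self.com[k] += p[3] * p[k]

def build(bodies, half_width):
    root = Node(0.0, 0.0, 0.0, half_width)
    for b in bodies:
        root.insert(b)
    return root

def accel(node, p, theta, eps2=1e-4):
    """Acceleration on body p (G = 1); theta is the opening-angle parameter."""
    if node.m == 0.0 or node.body is p:
        return (0.0, 0.0, 0.0)
    cx, cy, cz = (s / node.m for s in node.com)
    dx, dy, dz = cx - p[0], cy - p[1], cz - p[2]
    d = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Opening criterion: use the cell's center of mass if it is a leaf or far enough away.
    if node.children is None or (d > 0 and 2 * node.h / d < theta):
        r2 = d * d + eps2
        s = node.m / (r2 * math.sqrt(r2))
        return (s * dx, s * dy, s * dz)
    ax = ay = az = 0.0
    for child in node.children:
        a = accel(child, p, theta, eps2)
        ax += a[0]; ay += a[1]; az += a[2]
    return (ax, ay, az)
```

With theta = 0 every cell is opened and the result matches a direct sum; larger theta trades accuracy for fewer cell visits.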

file:///home/committee/aac/Thesis2011/s1150062

K.Seiwa

GPU Acceleration of Numerical Simulation of Fluid by the Lattice Boltzmann Method

Numerical simulations of fluids are used to analyze the motion of gases and liquids such as air and water. They have advanced dramatically along with improvements in computer performance, and they are regarded as important in designing vehicles that move through a fluid, such as cars, airplanes, and ships. The lattice Boltzmann method (LBM) is one method for numerically simulating a thin fluid; it represents the motion of the flow field by a lattice and particles placed on the lattice points. A more precise analysis requires more lattice points and particles, so the simulation takes longer. To simulate the flow field efficiently, I implemented the LBM on a graphics processing unit (GPU). A GPU is a processor tuned for data-parallel computation with a large number of computing cores. I accelerated the fluid simulation using OpenCL, a framework for parallel programming. As a result, my LBM simulation runs about 5 times faster than on a CPU. We conclude that GPUs are effective for accelerating LBM simulations.
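One LBM time step consists of a local collision followed by streaming to neighboring lattice points. The sketch below assumes the common D2Q9 lattice, the BGK collision operator with relaxation time tau, and periodic boundaries; the thesis does not state which lattice or boundary conditions were used, so these choices and the function names are illustrative. Every cell's collision is independent, which is why the method maps naturally onto one GPU work-item per lattice point.

```python
# D2Q9 lattice velocities e_k and weights w_k (assumed lattice)
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4

def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq

def step(f, nx, ny, tau):
    """One collide-and-stream step on a periodic nx-by-ny grid; f[y][x] holds 9 values."""
    # Collision (BGK): relax each cell toward its local equilibrium.
    post = [[None] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            fi = f[y][x]
            rho = sum(fi)
            ux = sum(fk * e[0] for fk, e in zip(fi, E)) / rho
            uy = sum(fk * e[1] for fk, e in zip(fi, E)) / rho
            feq = equilibrium(rho, ux, uy)
            post[y][x] = [fk - (fk - fe) / tau for fk, fe in zip(fi, feq)]
    # Streaming: move population k one lattice step along e_k (periodic wrap-around).
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for k, (ex, ey) in enumerate(E):
                new[(y + ey) % ny][(x + ex) % nx][k] = post[y][x][k]
    return new
```

Collision conserves mass and momentum cell by cell and streaming only moves values around, so total mass and momentum stay constant; a uniform equilibrium state is left unchanged.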

file:///home/committee/aac/Thesis2011/s1150132

T.Suzuki

OpenCL Implementation of Exact String Matching

Graphics Processing Units (GPUs) have evolved over the past few years from dedicated graphics rendering devices into powerful parallel processors, and they outperform traditional Central Processing Units (CPUs) in many areas of scientific computing. This paper presents experimental results of parallelizing several well-known online string matching algorithms using OpenCL, a standard API for writing parallel programs for CPUs and GPUs. I found that the simplest algorithm, with the help of vectorization, is the fastest on the GPU. My optimized string matching kernel on the GPU is 10 times faster than the standard utility command “grep” when a sufficiently large number of strings are matched simultaneously.
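The "simplest algorithm" is presumably brute-force matching, where every alignment of the pattern against the text is tested independently. The Python sketch below assumes that reading; it compares 4 bytes at a time to imitate a vector load-and-compare (the width of 4 and the function names are assumptions, not taken from the thesis). In an OpenCL kernel the loop over start positions would be replaced by one work-item per position, e.g. i = get_global_id(0).

```python
def match_at(text: bytes, pattern: bytes, i: int, width: int = 4) -> bool:
    """Does pattern occur at text[i:]? Compares `width` bytes per step,
    imitating a vectorized compare, then finishes the tail byte-wise."""
    m = len(pattern)
    j = 0
    while j + width <= m:
        if int.from_bytes(text[i + j:i + j + width], "little") != \
           int.from_bytes(pattern[j:j + width], "little"):
            return False
        j += width
    return text[i + j:i + m] == pattern[j:]

def naive_match(text: bytes, pattern: bytes, width: int = 4) -> list:
    """Brute-force exact matching: each start position is checked independently."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if match_at(text, pattern, i, width)]

def match_many(text: bytes, patterns) -> dict:
    """Match many patterns against the same text (hypothetical multi-pattern driver)."""
    return {p: naive_match(text, p) for p in patterns}
```

For example, naive_match(b"abracadabra", b"abra") finds the occurrences at positions 0 and 7.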

file:///home/committee/aac/Thesis2011/s1160119

K.Nakamura

Acceleration of Matrix Multiplication in Double-Double Precision by OpenCL

In this research, I used the double-double (DD) precision data type for calculations. DD precision is emulated in software rather than implemented in hardware: a DD value is represented by two double precision values. An operation in DD precision requires around a dozen double precision operations, so DD calculations take much time and need to be accelerated. The purpose of my research is to determine how much matrix multiplication in DD precision can be accelerated. For the acceleration, I used General Purpose computing on GPUs (GPGPU), a method of parallel processing using GPUs. With OpenCL, I could also use multi-core CPUs for acceleration. Using OpenCL, matrix multiplication was about 480 times faster on the GPU and about 28 times faster on a multi-core CPU than non-parallel processing. I also tested LU factorization as an application of the accelerated matrix multiplication; it was about 12 times faster on the GPU and about 9 times faster on the multi-core CPU.
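The building blocks of DD arithmetic are error-free transformations: Knuth's TwoSum and Dekker's TwoProd recover the rounding error of a double precision sum or product exactly. The sketch below uses these standard algorithms (the simple "sloppy" DD addition, not necessarily the variant used in the thesis) and a naive triple-loop matrix product; function names are illustrative. Each output entry C[i][j] is independent, which is what an OpenCL kernel would assign to one work-item.

```python
def two_sum(a, b):
    """Knuth TwoSum: s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def quick_two_sum(a, b):
    """Fast TwoSum; requires |a| >= |b|."""
    s = a + b
    return s, b - (s - a)

def split(a):
    """Dekker split of a double into high and low halves (2**27 + 1 splitter)."""
    t = 134217729.0 * a
    hi = t - (t - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker TwoProd: p = fl(a * b) and p + e == a * b exactly (no overflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

def dd_add(x, y):
    """Add DD numbers x = (hi, lo) and y = (hi, lo) (simple 'sloppy' variant)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)

def dd_mul(x, y):
    """Multiply DD numbers, dropping the tiny lo * lo term."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)

def dd_matmul(A, B):
    """C = A * B for matrices of DD entries (lists of rows of (hi, lo) tuples)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):          # on a GPU: one work-item per (i, j)
            acc = (0.0, 0.0)
            for t in range(k):
                acc = dd_add(acc, dd_mul(A[i][t], B[t][j]))
            C[i][j] = acc
    return C
```

Multiplying the 1x2 row [1, 1e-20] by the 2x1 column [1, 1] gives the DD value (1.0, 1e-20), whereas plain double precision would round the sum to 1.0.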

file:///home/committee/aac/Thesis2011/s1160154

T.Watanabe

Fluid Simulations in Curved Pipes using Smoothed Particle Hydrodynamics on GPU

When simulating an incompressible fluid such as water with Smoothed Particle Hydrodynamics (SPH), calculating the interactions between particles takes a lot of time. In particular, wall boundaries require many additional particles. In this paper, we introduce a new approach for setting wall boundaries in curved pipes without particles, and we present results of incompressible fluid flow within curved wall boundaries using SPH. In the proposed approach, we calculate the force between the pipe boundary and each particle from the distance between the particle and the centerline of the pipe. We implemented the simulation on a GPU using OpenCL. For large numbers of particles, the simulation on the GPU is more than 10 times faster than on a single CPU core.
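The particle-free wall idea can be sketched by computing each particle's distance to the pipe centerline and applying a repulsive force only near the wall. The abstract gives neither the pipe geometry nor the force law, so this sketch assumes a bend whose centerline is a circle of radius Rc in the z = 0 plane (part of a torus), a pipe radius r_pipe, and a hypothetical linear spring of stiffness k that acts within distance h of the wall. All names and the force law are illustrative only.

```python
import math

def nearest_on_centerline(p, Rc):
    """Closest point to p on a circular centerline of radius Rc in the z = 0 plane
    (assumed curved-pipe geometry; undefined on the z axis)."""
    rxy = math.hypot(p[0], p[1])
    return (Rc * p[0] / rxy, Rc * p[1] / rxy, 0.0)

def wall_force(p, Rc, r_pipe, h, k):
    """Wall force on a particle at p from its distance to the centerline.

    Hypothetical force law: a linear spring k * (h - gap) that acts only when the
    gap between the particle and the wall is smaller than h, pushing the particle
    back toward the centerline. No wall particles are needed.
    """
    c = nearest_on_centerline(p, Rc)
    v = (c[0] - p[0], c[1] - p[1], c[2] - p[2])   # points from particle toward centerline
    d = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    gap = r_pipe - d                                # distance from particle to the wall
    if gap >= h or d == 0.0:
        return (0.0, 0.0, 0.0)
    mag = k * (h - gap)
    return (mag * v[0] / d, mag * v[1] / d, mag * v[2] / d)
```

Because the force depends only on one distance per particle, it adds a constant amount of work per particle instead of extra particle-particle interactions.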

file:///home/committee/aac/Thesis2011/s1170174

Last modified on Nov 5, 2014 9:37:34 AM