Version 39 (modified by nakasato, 13 years ago)
OpenCL Notes
Disable auto vectorization
Section 6.7.2 (page 187) of the OpenCL Specification Version 1.1 (Revision 33) describes "__attribute__((vec_type_hint(<typen>)))". This hint controls the autovectorizer in the OpenCL C compiler. I tested this feature by dumping the assembly code for a kernel targeting AVX instructions with the Intel SDK (version 1.5).
kernel | lines | performance |
---|---|---|
no hint | 1567 | 23.1 sec |
with the hint | 333 | 43.0 sec |
The hint shrinks the generated kernel code from 1567 to 333 lines, roughly one fifth of the size, but the run time nearly doubles from 23.1 sec to 43.0 sec. Shorter code is not always faster code. Still investigating.
How to use the "ioc" command that ships with the Intel SDK
We need to set the environment variable INTELOCLSDKROOT:
export INTELOCLSDKROOT=/usr/lib64/OpenCL/vendors/intel
To dump the assembly code:
ioc -input=kernel_file.cl -asm
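The steps above can be sketched as one small script. It writes an example kernel carrying the hint to kernel_file.cl and runs the dump command from this section only if ioc is found on the PATH. The kernel name "scale", its body, and the choice of float4 as the hinted type are illustrative assumptions; these notes do not record which kernel or type was used for the measurements.

```shell
#!/bin/sh
# Write a small example kernel that carries the hint.
# Assumptions: the kernel name "scale", its body, and the float4 type
# are illustrative and are not taken from the measured kernel.
cat > kernel_file.cl <<'EOF'
__kernel __attribute__((vec_type_hint(float4)))
void scale(__global float4 *x, const float a)
{
    size_t i = get_global_id(0);
    x[i] = a * x[i];
}
EOF

# Dump the assembly only when the Intel offline compiler is installed.
if command -v ioc >/dev/null 2>&1; then
    ioc -input=kernel_file.cl -asm
else
    echo "ioc not found; set INTELOCLSDKROOT and PATH first" >&2
fi
```

Removing the attribute line from the kernel and running the dump again gives the "no hint" variant for comparison.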
Dump IL/ISA with AMD SDK
Set the following environment variable (APP Programming Guide, August 2011, section 4.2, page 63).
export GPU_DUMP_DEVICE_KERNEL=3
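If the dump is wanted for a single run rather than for the whole shell session, the variable can be set for one command only. The following is a minimal sketch; the helper name dump_run and the binary name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Run one command with the AMD dump variable set only for that command,
# leaving the calling shell's environment untouched.
# "dump_run" is a hypothetical helper name.
dump_run() {
    env GPU_DUMP_DEVICE_KERNEL=3 "$@"
}

# Usage (hypothetical binary name):
#   dump_run ./my_opencl_app
```

Using env here avoids the variable leaking into later commands, which can happen when a plain assignment is placed in front of a shell function call.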
SDK and driver
Latest SDK
AMD http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx
Intel http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/
Nvidia
Apple's SDK comes with Mac OS X only
Latest Driver for AMD
Random notes
icc
http://software.intel.com/en-us/articles/using-intel-compilers-for-linux-with-ubuntu/
packages
Ubuntu 10.04.1 LTS
sudo aptitude install libgsl0-dev gfortran libnetcdf-dev linux-headers-2.6.32-22-generic g++ libblas-dev rake emacs lv zsh ssh xdm ia32-libs linux-image-2.6.32-22-generic subversion git-core openmpi-bin openmpi-dev
10.04 LTS Server
sudo aptitude install xdm ia32-libs subversion zsh libgsl0-dev gfortran libnetcdf-dev g++ freeglut3-dev xserver-xorg rake emacs lv xterm
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=147002
process affinity
From command line: http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html
http://www.open-mpi.org/projects/hwloc/
DOUBLE
http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=92
old info
Standard Compute Layer Library
http://www.browndeertechnology.com/stdcl.html
A wrapper library for the OpenCL API, used in the tutorial below. It seems that libstdcl greatly simplifies the sample program ($OPENCLDIR/samples/opencl/cl/app/NBody) supplied with Stream SDK 2.0beta.
OpenCL Tutorial: N-Body Simulation
http://www.browndeertechnology.com/docs/BDT_OpenCL_Tutorial_NBody.html
A tutorial that modifies the sample NBody program, which is written with the OpenCL API.
local & global
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=123350&enterthread=y
It's a tricky question. On 4xxx, __local mem is really __global mem (ATI thinks it's too much work to optimize the compiler to use the 48xx LDS, although it's possible). On 5xxx, __local is LDS, so it's located in the SIMD core.
Catalyst 10.1 with Ubuntu 9.10 workarounds
Change the kernel boot option
Edit "/etc/default/grub" and run update-grub. I added "nopat" to GRUB_CMDLINE_LINUX_DEFAULT for Catalyst 10.1.
The grub configuration file is /boot/grub/grub.cfg. It is generated automatically by the update-grub command, so do not edit it by hand.
This trick is not necessary for Catalyst 10.2.
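The edit to GRUB_CMDLINE_LINUX_DEFAULT described above can also be scripted. This is a minimal sketch, assuming the variable is written on one line with a double-quoted value; the helper name add_boot_option is hypothetical, and a backup of the file is kept with a .bak suffix.

```shell
#!/bin/sh
# Append a kernel boot option to GRUB_CMDLINE_LINUX_DEFAULT in a grub
# defaults file, unless the option is already present.
# Assumption: the variable sits on a single line with a double-quoted value.
# "add_boot_option" is a hypothetical helper; FILE.bak keeps the old content.
add_boot_option() {
    file=$1
    opt=$2
    # Nothing to do if the option is already on that line.
    if grep "^GRUB_CMDLINE_LINUX_DEFAULT=" "$file" | grep -qw "$opt"; then
        return 0
    fi
    sed -i.bak "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $opt\"/" "$file"
}

# Usage, as root:
#   add_boot_option /etc/default/grub nopat && update-grub
```

Trying the helper on a copy of /etc/default/grub first is a cheap way to check the result before touching the real file.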
X server setting
The default login manager, gdm, is difficult to configure properly, so I installed xdm instead. The configuration files for xdm are in the /etc/X11/xdm directory.
Edit the "Xservers" file as follows:
:0 local /usr/bin/X :0 vt7 -nolisten tcp -ac
The "-ac" option disables access control, which lets remote applications access the local X server. Note that this option is generally regarded as bad for security. Be careful.
Attachments (3)
- NI.png (27.1 KB) - added by nakasato 13 years ago.
- SI.png (41.6 KB) - added by nakasato 13 years ago.
- GPU_REGFILE.png (44.7 KB) - added by nakasato 13 years ago.