Version 41 (modified by nakasato, 13 years ago)
OpenCL Notes
Disable auto vectorization
Section 6.7.2 (page 187) of the OpenCL Specification Version 1.1 (Revision 33) describes the kernel attribute "__attribute__((vec_type_hint(<typen>)))". This hint controls the autovectorizer in the OpenCL C compiler. I tested it with the Intel SDK (version 1.5) by dumping the assembly code generated for a kernel that uses "float8" and targets AVX instructions.
attribute | lines of assembly | comment |
no hint | 1567 | vectorized (inner-loop is further unrolled) |
with the hint | 333 | simple translation of the input kernel |
The hint reduces the generated assembly from 1567 lines to 333 lines. Without the attribute, the generated code contains two versions of the kernel: (1) one without unrolling and (2) one that is over-unrolled. This makes the assembly file large, and the dump does not tell us which version is actually executed.
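A minimal sketch of what such a kernel pair might look like. The kernel name `scale`, its arguments, and the file names are hypothetical and not taken from the original test; only the attribute syntax comes from the specification.

```shell
# Write a hypothetical float8 kernel that carries the hint.
cat > scale_hint.cl <<'EOF'
__kernel __attribute__((vec_type_hint(float8)))
void scale(__global const float8 *in, __global float8 *out, const float a)
{
    size_t i = get_global_id(0);
    out[i] = a * in[i];   /* the scalar is widened to float8 */
}
EOF

# Derive the same kernel without the attribute, for comparison.
sed 's/ __attribute__((vec_type_hint(float8)))//' scale_hint.cl > scale_nohint.cl
```

Dumping the assembly for both files and comparing the line counts should reproduce the kind of difference shown in the table, although the exact numbers depend on the kernel and the SDK version.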
How to use the "ioc" command that comes with the Intel SDK
We need to set the environment variable INTELOCLSDKROOT:
export INTELOCLSDKROOT=/usr/lib64/OpenCL/vendors/intel
To dump the assembly code:
ioc -input=kernel_file.cl -asm
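The two steps can be wrapped in a small helper. This is only a sketch: the function name `dump_asm` is hypothetical, and the guard assumes that `ioc` is on PATH when the SDK is installed.

```shell
# Use the path from these notes unless the variable is already set.
export INTELOCLSDKROOT="${INTELOCLSDKROOT:-/usr/lib64/OpenCL/vendors/intel}"

# dump_asm FILE: run ioc on one OpenCL source file (hypothetical helper).
dump_asm() {
    if ! command -v ioc >/dev/null 2>&1; then
        echo "ioc not found; is the Intel SDK installed and on PATH?" >&2
        return 1
    fi
    ioc -input="$1" -asm
}
```

Usage: `dump_asm kernel_file.cl`.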
Dump IL/ISA with AMD SDK
Set the following environment variable (APP Programming Guide, August 2011, section 4.2, page 63).
export GPU_DUMP_DEVICE_KERNEL=3
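To avoid dumping files for every OpenCL program started from the shell, the variable can be set for a single command instead of being exported. The sketch below only demonstrates the scoping with a child shell; `./my_opencl_app` is a hypothetical program name.

```shell
# Real usage would look like this (hypothetical program name):
#   GPU_DUMP_DEVICE_KERNEL=3 ./my_opencl_app

# Demonstrate that the child process sees the value.
scoped=$(GPU_DUMP_DEVICE_KERNEL=3 sh -c 'echo "$GPU_DUMP_DEVICE_KERNEL"')
echo "child process saw GPU_DUMP_DEVICE_KERNEL=$scoped"
```

The value 3 is the one used in these notes; see the cited section of the guide for the meaning of other values.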
SDK and driver
Latest SDK
AMD http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx
Intel http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/
Nvidia
Apple's SDK comes with Mac OS X only
Latest Driver for AMD
Random notes
icc
http://software.intel.com/en-us/articles/using-intel-compilers-for-linux-with-ubuntu/
packages
Ubuntu 10.04.1 LTS
sudo aptitude install libgsl0-dev gfortran libnetcdf-dev linux-headers-2.6.32-22-generic g++ libblas-dev rake emacs lv zsh ssh xdm ia32-libs linux-image-2.6.32-22-generic subversion git-core openmpi-bin openmpi-dev
10.04 LTS Server
sudo aptitude install xdm ia32-libs subversion zsh libgsl0-dev gfortran libnetcdf-dev g++ freeglut3-dev xserver-xorg rake emacs lv xterm
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=147002
process affinity
From command line: http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html
http://www.open-mpi.org/projects/hwloc/
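A minimal sketch of pinning a process from the command line, assuming the `taskset` tool from util-linux is installed and CPU 0 is available to the current shell. `nproc` is used here only as a stand-in program, since it reports how many CPUs the pinned process may use.

```shell
if command -v taskset >/dev/null 2>&1; then
    # Run nproc pinned to CPU 0; fall back if pinning is not permitted.
    pinned=$(taskset -c 0 nproc 2>/dev/null || echo "unavailable")
else
    pinned="unavailable"
fi
echo "CPUs visible to the pinned process: $pinned"
```

In real use, replace `nproc` with the program to pin, for example `taskset -c 0 ./my_program` (hypothetical name).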
DOUBLE
http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=92
old info
Standard Compute Layer Library
http://www.browndeertechnology.com/stdcl.html
A wrapper library for the OpenCL API, used in the tutorial below. libstdcl seems to greatly simplify the sample program ($OPENCLDIR/samples/opencl/cl/app/NBody) supplied with Stream SDK 2.0beta.
OpenCL Tutorial: N-Body Simulation
http://www.browndeertechnology.com/docs/BDT_OpenCL_Tutorial_NBody.html
A tutorial that modifies the sample NBody program written with the OpenCL API.
local & global
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=123350&enterthread=y
It's a tricky question. On 4xxx, __local mem is really __global mem (ATI thinks it's too much work to optimize the compiler to use the 48xx LDS, although it's possible). On 5xxx, __local is LDS, so it's located in the SIMD core.
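For context, here is a sketch of a kernel whose speed depends on where __local memory really lives. The kernel `sum_groups` and the file name are hypothetical. It stages data in a __local buffer and reduces it per work-group, assuming the work-group size is a power of two.

```shell
cat > sum_groups.cl <<'EOF'
__kernel void sum_groups(__global const float *in,
                         __global float *out,
                         __local float *scratch)
{
    size_t lid = get_local_id(0);
    size_t lsz = get_local_size(0);

    /* Stage one element per work-item in __local memory. */
    scratch[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Tree reduction; assumes the work-group size is a power of two. */
    for (size_t s = lsz / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}
EOF
```

If the comment above is right, the repeated accesses to `scratch` would go to global memory on 4xxx and to the on-chip LDS on 5xxx.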
Catalyst 10.1 with Ubuntu 9.10 workarounds
Change the kernel boot option
Edit "/etc/default/grub" and run update-grub. I added "nopat" to GRUB_CMDLINE_LINUX_DEFAULT for Catalyst 10.1.
The grub configuration file is /boot/grub/grub.cfg. It is generated automatically by the update-grub command.
This trick is not necessary for Catalyst 10.2.
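The edit to the boot option can be sketched on a scratch copy as follows. The file names `grub.sample` and `grub.new` and the sample contents are assumptions for illustration; the real file is /etc/default/grub and editing it requires root.

```shell
# Scratch copy with typical contents (assumed, not from a real system).
cat > grub.sample <<'EOF'
GRUB_DEFAULT=0
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
EOF

# Append nopat inside the quotes unless it is already present.
if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nopat' grub.sample; then
    cp grub.sample grub.new
else
    sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nopat"/' grub.sample > grub.new
fi
grep '^GRUB_CMDLINE_LINUX_DEFAULT=' grub.new

# After changing the real file, regenerate grub.cfg with: sudo update-grub
```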
X server setting
The default login manager, gdm, is difficult to configure properly, so I installed xdm instead. The configuration files for xdm are in the /etc/X11/xdm directory.
Edit the "Xservers" file as follows:
:0 local /usr/bin/X :0 vt7 -nolisten tcp -ac
The "-ac" option disables access control, which lets remote applications access the local X server. Note that this option is generally regarded as bad for security. Be careful.
Attachments (3)
- NI.png (27.1 KB) - added by nakasato 13 years ago.
- SI.png (41.6 KB) - added by nakasato 13 years ago.
- GPU_REGFILE.png (44.7 KB) - added by nakasato 13 years ago.