= OpenCL Notes =

== Disable auto vectorization ==

Section 6.7.2 (page 187) of the OpenCL Specification, version 1.1 (revision 33), describes "__attribute__((vec_type_hint(<type>)))". This hint controls the autovectorizer in the OpenCL C compiler.

I tested this feature by dumping the assembly code for a kernel compiled for AVX instructions with the Intel SDK (version 1.5).

|| || lines ||
|| no hint || 1567 ||
|| with the hint || 333 ||

The hint reduces the generated kernel code from 1567 to 333 lines of assembly, roughly a factor of 4.7. Work on this is still in progress.

== How to use the "ioc" command shipped with the Intel SDK ==

We need to set the environment variable INTELOCLSDKROOT:

{{{
export INTELOCLSDKROOT=/usr/lib64/OpenCL/vendors/intel
}}}

To dump the assembly code:

{{{
ioc -input=kernel_file.cl -asm
}}}

== Dump IL/ISA with AMD SDK ==

Set the following environment variable (APP Programming Guide, August 2011, section 4.2, page 63):

{{{
export GPU_DUMP_DEVICE_KERNEL=3
}}}

== SDK and driver ==

=== Latest SDK ===

AMD http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx

Intel http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/

Nvidia

Apple's SDK comes with Mac OS X only.

=== Latest Driver for AMD ===

http://support.amd.com/us/gpudownload/linux/Pages/radeon_linux.aspx?type=2.4.1&product=2.4.1.3.42&lang=English

= Random notes =

== icc ==

http://software.intel.com/en-us/articles/using-intel-compilers-for-linux-with-ubuntu/

== packages ==

Ubuntu 10.04.1 LTS

{{{
sudo aptitude install libgsl0-dev gfortran libnetcdf-dev linux-headers-2.6.32-22-generic g++ libblas-dev rake emacs lv zsh ssh xdm ia32-libs linux-image-2.6.32-22-generic subversion git-core openmpi-bin openmpi-dev
}}}

10.04 LTS Server

{{{
sudo aptitude install xdm ia32-libs subversion zsh libgsl0-dev gfortran libnetcdf-dev g++ freeglut3-dev xserver-xorg rake emacs lv xterm
}}}

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=147002

== process affinity ==

From command line:
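
For the command-line route, here is a minimal sketch. It assumes Linux with the `taskset` tool from util-linux, which these notes do not name explicitly, and the helper name `run_pinned` is hypothetical.

```shell
# Hypothetical helper: run a command restricted to a CPU list.
# Assumes Linux with taskset (util-linux); the list looks like "0", "0-3" or "0,2".
run_pinned() {
    cpus=$1
    shift
    taskset -c "$cpus" "$@"
}

# Read the CPU list this shell is currently allowed to use.
# taskset -cp prints a line such as "pid 1234's current affinity list: 0-7".
allowed=$(taskset -cp $$ | sed 's/.*: //')

# Re-using the allowed list always succeeds; narrow it (e.g. "0") to pin to one core.
run_pinned "$allowed" echo "running on CPUs: $allowed"
```

Replace the `echo` with the real program, for example an OpenCL host binary, and pass a narrower list to pin it to specific cores.
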
http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html

http://www.open-mpi.org/projects/hwloc/

== DOUBLE ==

http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=92

= old info =

== Standard Compute Layer Library ==

http://www.browndeertechnology.com/stdcl.html

A wrapper library for the OpenCL API, used in the tutorial below. libstdcl appears to greatly simplify the sample program ($OPENCLDIR/samples/opencl/cl/app/NBody) supplied with Stream SDK 2.0beta.

== OpenCL Tutorial: N-Body Simulation ==

http://www.browndeertechnology.com/docs/BDT_OpenCL_Tutorial_NBody.html

A tutorial that modifies the sample NBody program written with the OpenCL API.

== local & global ==

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=123350&enterthread=y

{{{
It's a tricky question . On 4xxx __local mem is really __global mem ( ATI thinks it's too much work to optimize compiler to use 48xx LDS - although it's possible ). On 5xxx __local is LDS - so it's located in simd core.
}}}

== Catalyst 10.1 with Ubuntu 9.10 workarounds ==

=== Change the kernel boot option ===

Edit "/etc/default/grub" and run update-grub. For Catalyst 10.1, I added "nopat" to GRUB_CMDLINE_LINUX_DEFAULT. The grub configuration file is /boot/grub/grub.cfg; it is generated automatically by the update-grub command, so do not edit it directly. This workaround is not necessary for Catalyst 10.2.

=== X server setting ===

Because the default login manager, gdm, is difficult to configure properly, I installed xdm instead. The configuration files for xdm are in the /etc/X11/xdm directory. Edit the "Xservers" file as follows:

{{{
:0 local /usr/bin/X :0 vt7 -nolisten tcp -ac
}}}

The "-ac" option disables access control, which allows remote applications to access the local X server. This option is generally regarded as bad for security, so be careful.