CUDA Tutorial

Sunday Nov. 15 2009

Introduction

The idea of CUDA is to have as many threads as possible (not traditional OS threads, but much lighter-weight ones), each doing a very small task.
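As a minimal sketch (the kernel name scale and its arguments are illustrative, not from the tutorial), each thread typically handles a single array element:

    // Illustrative kernel: each thread scales exactly one array element.
    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // unique index of this thread
        if (i < n)                                       // guard: the grid may have spare threads
            data[i] *= factor;                           // the "very small task"
    }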

Threads

  • Block: up to 512 threads (see the launch sketch after this list).
  • Block barrier: all threads in a block can synchronize at the same point (__syncthreads(); illustrated in the Memory section below).
  • Warp: 32 threads; the hardware issues one instruction to a whole warp at a time.
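A sketch of how these numbers show up in a kernel launch, assuming the scale kernel from the Introduction sketch and a device pointer d_data (both hypothetical):

    int n = 1 << 20;                            // illustrative problem size
    int threadsPerBlock = 512;                  // one full block of 512 threads
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    // The hardware splits each 512-thread block into 16 warps of 32 threads.
    scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);   // d_data: device pointer (assumed)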

Conditionals (if-then-else) reduce throughput when some threads of a warp take one path and others take another: the warp executes both paths one after the other, which is known as warp divergence.
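For example (illustrative kernel, not from the tutorial), a branch on the thread index splits every warp, while a branch on the block index leaves warps intact:

    __global__ void divergence_demo(float *data)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // Divergent: even and odd threads of the same warp take different paths,
        // so the warp runs both branches one after the other.
        if (threadIdx.x % 2 == 0)
            data[i] += 1.0f;
        else
            data[i] -= 1.0f;
        // Not divergent: all threads of a given warp agree on this condition.
        if (blockIdx.x == 0)
            data[i] *= 2.0f;
    }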

Memory

  • Threads in a block have fast, on-chip shared memory available (see the sketch after this list).
  • All threads from any block can access global memory, which is larger but slower.
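A sketch of the usual pattern (names and sizes are illustrative): stage data from global memory into shared memory, hit the block barrier, then work on the fast on-chip copy:

    __global__ void shared_demo(const float *in, float *out)
    {
        __shared__ float tile[512];                  // assumes a block of 512 threads
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = in[i];                   // global -> shared
        __syncthreads();                             // block barrier: the whole tile is loaded
        // Work on the fast copy, e.g. combine with a neighbouring element of the block.
        int neighbour = (threadIdx.x + 1) % blockDim.x;
        out[i] = tile[threadIdx.x] + tile[neighbour];
    }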

Programming

  • Programs are a mix of serial (CPU) and parallel (GPU) code.
  • Parallel sections of code can be offloaded to CUDA kernels (see the host-side sketch after this list).
  • Available in C/C++, with bindings for Java, Python and possibly other languages.
  • Debugging: cuda-gdb (open question: what is a breakpoint when you have 10K threads?)
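A hedged sketch of the host side, assuming the scale kernel from the Introduction sketch is defined in the same file: serial code prepares the data, the parallel section is offloaded to the GPU, and the result is copied back:

    #include <cuda_runtime.h>
    #include <vector>

    void run(int n)
    {
        std::vector<float> host(n, 1.0f);                 // serial setup on the CPU
        float *d_data = 0;
        cudaMalloc((void **)&d_data, n * sizeof(float));  // global memory on the GPU
        cudaMemcpy(d_data, &host[0], n * sizeof(float), cudaMemcpyHostToDevice);

        int threadsPerBlock = 512;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);   // offloaded parallel section

        cudaMemcpy(&host[0], d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d_data);                                 // back to serial CPU code
    }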

C++ STL

  • Parallel algorithms with an STL-style interface are provided by Thrust (see the sketch below).
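A minimal Thrust sketch (the data is illustrative): containers mirror std::vector, and algorithms such as thrust::sort run on the GPU when given device iterators:

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <cstdlib>

    int main()
    {
        thrust::host_vector<int> h(1 << 20);        // STL-style container on the host
        for (std::size_t i = 0; i < h.size(); ++i)
            h[i] = std::rand();                     // illustrative data

        thrust::device_vector<int> d = h;           // copy to the GPU
        thrust::sort(d.begin(), d.end());           // parallel sort on the device
        h = d;                                      // copy the sorted data back to the host
        return 0;
    }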