Tuesday, January 17, 2012

Simple Stupid Cuda Tutorial - Part 1

The CUDA toolkit is used to write parallel programs that run on NVIDIA GPUs. Modern GPUs are not only for graphics; they are used for general-purpose computation as well. OpenCL is an open standard for CPU/GPU co-processing, and Microsoft has its own DirectCompute for the same purpose.

The following program assumes that you have installed NVIDIA's CUDA toolkit. It uses the GPU to do a trivial task: adding two numbers. It can serve as a starting point for writing CUDA programs.


//////////////////////////////
// CudaAdd.cu
//
// A CUDA program to add two numbers.
// The aim of this program is just to show
// how to use the NVIDIA CUDA toolkit.
//
// Written by
// Praseed Pai K.T.
// http://praseedp.blogspot.com
//
// If you have installed the CUDA toolkit, execute the following
// at the Visual Studio Command Prompt:
//
// nvcc -o test.exe CudaAdd.cu
// test.exe
//

#include <stdio.h>

/////////////////////////////////////
//
// The following simple routine executes
// on the GPU.
//
__global__ void AddKernel(int a, int b, int *result)
{
    *result = a + b;
}

/////////////////////////////////////
//
// main - CPU entry point
//

int main( int argc, char **argv )
{
    int result;       // holds the result on the CPU (host)
    int *dev_result;  // points to the result in GPU (device) memory

    // --------- Allocate memory for the device result.
    // --------- For simplicity, assume that the call won't fail.
    cudaMalloc( (void**)&dev_result, sizeof(int) );

    //----------- Invoke the kernel function.
    //----------- __global__ tells nvcc that the function is a kernel:
    //----------- it is compiled for the GPU and launched from CPU code.
    //----------- <<<1,1>>> launches one block containing one thread.
    AddKernel<<<1,1>>>( 2, 7, dev_result );

    //----------- Copy dev_result from GPU memory to
    //----------- CPU memory.
    cudaMemcpy( &result,
                dev_result,
                sizeof(int),
                cudaMemcpyDeviceToHost );

    printf( "%d + %d = %d\n", 2, 7, result );

    cudaFree( dev_result );

    return 0;
}
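
The listing above assumes that the CUDA calls never fail. As a rough sketch of how the same program might check for errors, the variant below wraps each runtime call in a small macro. The macro name CHECK and the file name CudaAddChecked.cu are my own inventions for illustration and are not part of the CUDA toolkit; cudaGetErrorString and cudaGetLastError are standard runtime functions.

```cuda
// CudaAddChecked.cu (hypothetical file name)
#include <stdio.h>
#include <stdlib.h>

// CHECK is a hypothetical helper macro, not something the toolkit provides.
// It evaluates a CUDA runtime call and exits with a message if the call failed.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",          \
                    __FILE__, __LINE__, cudaGetErrorString(err)); \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Same kernel as in the original listing.
__global__ void AddKernel(int a, int b, int *result)
{
    *result = a + b;
}

int main(void)
{
    int result = 0;
    int *dev_result = NULL;

    CHECK(cudaMalloc((void**)&dev_result, sizeof(int)));

    AddKernel<<<1,1>>>(2, 7, dev_result);
    // A kernel launch returns no status itself;
    // cudaGetLastError reports errors from the launch.
    CHECK(cudaGetLastError());

    // This blocking copy waits for the kernel to finish before copying.
    CHECK(cudaMemcpy(&result, dev_result, sizeof(int),
                     cudaMemcpyDeviceToHost));

    printf("%d + %d = %d\n", 2, 7, result);

    CHECK(cudaFree(dev_result));
    return 0;
}
```

It should build the same way as the original (nvcc -o test.exe CudaAddChecked.cu). On a machine without a usable GPU it is expected to stop at the first failing call with an error message instead of printing a garbage result.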

 

Here is the output from my Visual Studio Command Prompt:

I:\GRAPHICS_RES\CUDA_LEARN\FirstCuda>nvcc -o test.exe CudaAdd.cu
CudaAdd.cu
tmpxft_000009a0_00000000-3_CudaAdd.cudafe1.gpu
tmpxft_000009a0_00000000-8_CudaAdd.cudafe2.gpu
CudaAdd.cu
tmpxft_000009a0_00000000-3_CudaAdd.cudafe1.cpp
tmpxft_000009a0_00000000-14_CudaAdd.ii

I:\GRAPHICS_RES\CUDA_LEARN\FirstCuda>test
2 + 7 = 9

I:\GRAPHICS_RES\CUDA_LEARN\FirstCuda>

 
