Setting up CUDA and Visual Studio 2010
I never used Visual Studio before the 2010 release, but I hear Microsoft has given it a complete facelift: modular customizations, custom build goodness, and so on. I have been using it together with CUDA 4, and since the information about setting the two up is scattered all over the place, I thought I'd explain the necessary steps to help other first-timers.
Here's what's involved:
- Visual Studio 2010 on Windows 7 x64
- NVIDIA CUDA Toolkit 4
- Windows SDK 7.1
- NVIDIA GPU Computing SDK 4
- NVIDIA Parallel Nsight 2.0
First off, I doubt the order of installation makes any difference, but for reference mine was: VS 2010, CUDA Toolkit, Windows SDK 7.1 (Visual Studio installs its own copy of the SDK, but that one is a tiny bit outdated). You may not actually need the Windows SDK; I installed it before I started working with CUDA in VS, so I can't say for sure. It's full of goodies, though, so why the heck not.
After all the installations are done, open VS 2010 and follow these steps.
Step #1 - Create an empty project
Create a new Visual C++ Empty Project, give it a name, click OK. This step isn't so hard, is it? :-)
Step #2 - Specify build customizations
In Solution Explorer, right click your project and choose 'Build Customizations'. If all is well, a dialog listing the available build customization files appears.
Select the CUDA 4.0 customization files and click OK.
Step #3 - Hello, VS 2010 World!
We're ready to write some code! Right click Source Files under Solution Explorer and add a new item. Choose the C++ file template, but give the file the .cu suffix so the CUDA build customization compiles it, then enter the following code:
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void helloWorld(char* str)
{
    // determine where in the thread grid we are
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // unmangle output: add back the offset the host subtracted
    str[idx] += idx;
}

int main(int argc, char** argv)
{
    int i;
    char str[] = "Hello World!";

    // mangle the 12 visible characters; the terminating '\0' is left alone
    for (i = 0; i < 12; i++)
        str[i] -= i;

    // allocate memory on the device and copy the mangled string over
    char *d_str;
    size_t size = sizeof(str);
    cudaMalloc((void**)&d_str, size);
    cudaMemcpy(d_str, str, size, cudaMemcpyHostToDevice);

    dim3 dimGrid(2);  // one block per word
    dim3 dimBlock(6); // one thread per character
    helloWorld<<< dimGrid, dimBlock >>>(d_str);

    // copy the result back to the host and release the device memory
    cudaMemcpy(str, d_str, size, cudaMemcpyDeviceToHost);
    cudaFree(d_str);

    printf("%s\n", str);
    return 0;
}
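Our "Hello, World!" program for the evening has each thread add its global index (its "rank", if you're familiar with MPI terms) to its assigned character, which undoes the subtraction the host performed before the copy. The host then prints the restored string. Nothing too fancy. Save it.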
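If the index arithmetic feels a bit magical, here is a minimal host-only sketch that replays the same 2 x 6 launch on the CPU, so you can step through it in the debugger without touching the GPU. The helpers globalIndex and unmangleOnHost are names I made up for illustration; they are not part of CUDA.

```cuda
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: the global index a thread computes in helloWorld,
// given its block index, the block size, and its index within the block.
static int globalIndex(int block, int threadsPerBlock, int thread)
{
    return block * threadsPerBlock + thread;
}

// Hypothetical helper: does on the CPU what every GPU thread of the launch
// does to the buffer, one (block, thread) pair at a time.
static void unmangleOnHost(char* str, int numBlocks, int threadsPerBlock)
{
    for (int b = 0; b < numBlocks; b++) {
        for (int t = 0; t < threadsPerBlock; t++) {
            int idx = globalIndex(b, threadsPerBlock, t);
            str[idx] += idx;
        }
    }
}

int main(void)
{
    char str[] = "Hello World!";

    // same mangling as the host code in the real program
    for (int i = 0; i < 12; i++)
        str[i] -= i;

    // 2 blocks of 6 threads cover indices 0..11 exactly once
    unmangleOnHost(str, 2, 6);

    assert(strcmp(str, "Hello World!") == 0);
    printf("%s\n", str);
    return 0;
}
```

Block 0 covers indices 0 to 5 ("Hello ") and block 1 covers indices 6 to 11 ("World!"), which is why the grid and block sizes have to multiply out to exactly 12.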
Step #4 - Building and linking
This is the only real trick to the whole process. The CUDA toolkit and Windows SDK installers should have already set your paths and environment correctly. The only thing missing from the story is the cudart.lib library: if you try building at this point, the linker will complain about unresolved external symbols such as _cudaFree and _cudaMalloc. The cudart.lib library lives in $(CUDA_LIB_PATH), and here's how we tell the VS linker about it. In Solution Explorer, right click your project, select Properties, then Linker, then General. Verify that the Additional Library Directories field contains either $(CUDA_LIB_PATH) or $(CudaToolkitLibDir); if it does, you're golden, and if not, add one of them yourself. Then proceed to the Input page and add cudart.lib to the Additional Dependencies field.
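To check the linker settings on their own, something like the following sketch can help. It calls only a handful of runtime API functions, so if it builds and links, cudart.lib is being found. Treat it as a rough sanity check rather than a proper device query tool; the output format is my own.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    // The runtime version is encoded as 1000 * major + 10 * minor,
    // so CUDA 4.0 reports 4000.
    int runtimeVersion = 0;
    cudaError_t err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        printf("cudaRuntimeGetVersion failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA runtime version: %d\n", runtimeVersion);

    // Count the CUDA-capable devices the runtime can see.
    int deviceCount = 0;
    err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices found: %d\n", deviceCount);

    // Print the name and compute capability of each device.
    for (int dev = 0; dev < deviceCount; dev++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess) {
            printf("  device %d: %s (compute capability %d.%d)\n",
                   dev, prop.name, prop.major, prop.minor);
        }
    }
    return 0;
}
```

If this fails at link time with unresolved symbols, revisit the Additional Library Directories and Additional Dependencies fields described above.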
Step #5 - Go forth and compute!
You are all set. Execute your "Hello, World!" program and have fun working with CUDA 4!
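If the program prints garbage instead of the greeting, the usual suspect is a CUDA call that failed silently. Below is a minimal sketch of the same program with every runtime call checked. CUDA_CHECK is a hypothetical macro of my own, not something the toolkit provides, and exiting on the first error is just one possible policy.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// CUDA_CHECK is a hypothetical helper, not part of the CUDA toolkit:
// it prints the file, line and error string, then exits on any failure.
#define CUDA_CHECK(call)                                         \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

__global__ void helloWorld(char* str)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    str[idx] += idx;
}

int main(void)
{
    char str[] = "Hello World!";
    for (int i = 0; i < 12; i++)
        str[i] -= i;

    char* d_str = NULL;
    size_t size = sizeof(str);
    CUDA_CHECK(cudaMalloc((void**)&d_str, size));
    CUDA_CHECK(cudaMemcpy(d_str, str, size, cudaMemcpyHostToDevice));

    helloWorld<<<2, 6>>>(d_str);
    // A kernel launch returns nothing, so ask the runtime whether it failed.
    CUDA_CHECK(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(str, d_str, size, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_str));

    printf("%s\n", str);
    return 0;
}
```

With the checks in place, a missing device or a bad launch configuration shows up as a readable message on stderr rather than as a mangled string.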
I will update this post with details about the CUDA SDK and Nsight in the coming days.
