Assignment 1 - CUDA and Bender Setup

DUE Monday, April 11 @ 11:59:59PM Pacific Time


The objective of this assignment is to get set up with CUDA on the Bender ENGR server. The following tutorial provides detailed steps and directions: BenderTutorial.pdf

If you are unfamiliar with the Linux command line, there's a great tutorial here:

  1. For this lab, we will be using GitHub Classroom.
    Please join the classroom by clicking the following link:
    Once you join the classroom, a private GitHub repository will automatically be created with the starter code.
    Use git clone to copy the starter code from that repository to Bender.

Installing CUDA on your own Windows machine

If you want to use the Nvidia GPU in your Windows system, you can also run CUDA under Windows Subsystem for Linux (WSL) by following this user guide from Nvidia:

Installing CUDA on your own Linux machine

If you want to use the Nvidia GPU in your Linux system, there are several ways to install CUDA. This user guide from Nvidia summarizes them: it covers installing CUDA with the Linux package manager (apt-get, yum, etc.), Anaconda, Python wheels (pip), or manually with a runfile. You only need to use one of these methods.

Installing CUDA on your own Mac machine

CUDA 10.2 is the last release that supports macOS; newer CUDA releases do not support it.

Vector Add

Once you have access to Bender and have confirmed that you can compile and run CUDA code there, you will implement a simple Vector Add.
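One quick way to confirm that the CUDA runtime on Bender can see a GPU is a small device query like the sketch below. The helpers `countCudaDevices` and `printCudaDevices` are hypothetical names, not part of the starter code; `cudaGetDeviceCount`, `cudaGetDeviceProperties`, and `cudaGetErrorString` are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of CUDA devices visible to this process,
// or -1 if the runtime reports an error (e.g. no driver or no GPU).
int countCudaDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    return count;
}

// Prints the name and compute capability of every visible device.
// Prints nothing if no device is found.
void printCudaDevices() {
    int count = countCudaDevices();
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        std::printf("Device %d: %s (compute capability %d.%d)\n",
                    d, prop.name, prop.major, prop.minor);
    }
}
```

Call `printCudaDevices()` from a `main` in a `.cu` file and compile it with `nvcc`. If no device lines are printed, the runtime did not find a usable GPU or driver.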

  1. Clone the git repository. There should be 5 files, including Makefile and support.h.
  2. By default, vector add operates on 10000 randomly generated elements. To ensure consistency when grading, do not change the srand seed value on line 9.
  3. Complete the vector add application by adding your code to and. Use a thread block size of 512 threads. You may consult the class slides on vector add.
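A minimal sketch of the kernel and its launch, assuming the host code has already allocated the device buffers and copied the inputs to the GPU. The names `vecAddKernel`, `launchVecAdd`, `numBlocks`, and the `BLOCK_SIZE` macro are hypothetical; the starter code may use different names and structure, so adapt the idea rather than copying it verbatim.

```cuda
#include <cuda_runtime.h>

// Threads per block, as required by the assignment.
#define BLOCK_SIZE 512

// Number of blocks needed so that every one of n elements gets a thread
// (ceiling division), e.g. 10000 elements -> 20 blocks of 512.
unsigned int numBlocks(unsigned int n, unsigned int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Each thread adds one pair of elements. Threads whose global index falls
// past the end of the vectors (possible in the last block) do nothing.
__global__ void vecAddKernel(const float* A, const float* B, float* C,
                             unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        C[i] = A[i] + B[i];
    }
}

// Launches the kernel on device pointers A_d, B_d, C_d, which the caller has
// already allocated (and, for A_d and B_d, filled). Assumes n > 0, since a
// grid of 0 blocks is an invalid launch configuration.
cudaError_t launchVecAdd(const float* A_d, const float* B_d, float* C_d,
                         unsigned int n) {
    dim3 block(BLOCK_SIZE);
    dim3 grid(numBlocks(n, BLOCK_SIZE));
    vecAddKernel<<<grid, block>>>(A_d, B_d, C_d, n);
    return cudaGetLastError();  // reports launch-configuration errors
}
```

The `i < n` guard matters because the grid is rounded up to whole blocks, so some threads in the last block have no element to process.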

Answer the following questions:

  1. How many total thread blocks do we use?

ceil(10000 / 512) = 20 thread blocks. The division is rounded up so that every element gets a thread.
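The block count is a ceiling division of the element count $N = 10000$ by the block size $B = 512$; in integer arithmetic it is usually written as $(N + B - 1) / B$:

```latex
\text{blocks} = \left\lceil \frac{N}{B} \right\rceil
  = \left\lceil \frac{10000}{512} \right\rceil
  = \lceil 19.53\ldots \rceil = 20,
\qquad
\left\lfloor \frac{N + B - 1}{B} \right\rfloor
  = \left\lfloor \frac{10511}{512} \right\rfloor = 20
```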

  2. Are all thread blocks full? That is, do all threads in the thread block have data to operate on?

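No. Twenty blocks of 512 threads launch 10240 threads for only 10000 elements, so the last thread block is not full: some of its threads have no data and must be masked off with a bounds check such as if (i < n).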
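The exact split of active and idle threads, with $N = 10000$ elements and $B = 512$ threads per block:

```latex
20 \times 512 = 10240 \text{ threads launched}, \qquad
10240 - 10000 = 240 \text{ idle threads} \\
19 \times 512 = 9728 \text{ elements in the first 19 blocks}, \qquad
10000 - 9728 = 272 \text{ active threads in block 20}
```

So the last block uses 272 of its 512 threads, and the other 240 fail the bounds check.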

  3. How can this basic vector add program be improved? (What changes do you think can be made to speed up the code?)

This is an open-ended question. There are numerous correct answers. One answer will be presented in the CUDA Streams lecture. =)
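As one illustration of a possible direction (not necessarily the answer presented in lecture), the transfers and the kernel can be split into chunks issued on separate CUDA streams, so that copying one chunk can overlap with computing another. This is a hedged sketch: `streamedVecAdd`, `NUM_STREAMS`, and the even chunking scheme are hypothetical choices. Any speedup depends on the GPU supporting copy/compute overlap and on the vectors being large enough for the overlap to outweigh stream overhead; at 10000 elements the gain may be negligible.

```cuda
#include <cuda_runtime.h>

#define NUM_STREAMS 4   // hypothetical number of chunks/streams
#define BLOCK_SIZE 512

__global__ void vecAddKernel(const float* A, const float* B, float* C,
                             unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

// Splits the work into NUM_STREAMS chunks; each chunk's copy-in, kernel, and
// copy-out are queued on its own stream. The host arrays should be pinned
// (cudaMallocHost) for the copies to actually overlap with kernels; with
// pageable memory the result is still correct but the overlap is lost.
cudaError_t streamedVecAdd(const float* A_h, const float* B_h, float* C_h,
                           unsigned int n) {
    float *A_d = nullptr, *B_d = nullptr, *C_d = nullptr;
    size_t bytes = (size_t)n * sizeof(float);
    cudaError_t err = cudaMalloc(&A_d, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&B_d, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&C_d, bytes);
    if (err != cudaSuccess) {
        cudaFree(A_d); cudaFree(B_d); cudaFree(C_d);  // cudaFree(nullptr) is a no-op
        return err;
    }

    cudaStream_t streams[NUM_STREAMS];
    for (int s = 0; s < NUM_STREAMS; ++s) cudaStreamCreate(&streams[s]);

    unsigned int chunk = (n + NUM_STREAMS - 1) / NUM_STREAMS;
    for (int s = 0; s < NUM_STREAMS; ++s) {
        unsigned int offset = s * chunk;
        if (offset >= n) break;                     // fewer chunks than streams
        unsigned int len = (n - offset < chunk) ? n - offset : chunk;
        size_t segBytes = (size_t)len * sizeof(float);
        cudaMemcpyAsync(A_d + offset, A_h + offset, segBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        cudaMemcpyAsync(B_d + offset, B_h + offset, segBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        unsigned int blocks = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
        vecAddKernel<<<blocks, BLOCK_SIZE, 0, streams[s]>>>(
            A_d + offset, B_d + offset, C_d + offset, len);
        cudaMemcpyAsync(C_h + offset, C_d + offset, segBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for every stream, keeping the first error encountered.
    for (int s = 0; s < NUM_STREAMS; ++s) {
        cudaError_t syncErr = cudaStreamSynchronize(streams[s]);
        if (err == cudaSuccess) err = syncErr;
        cudaStreamDestroy(streams[s]);
    }
    if (err == cudaSuccess) err = cudaGetLastError();

    cudaFree(A_d); cudaFree(B_d); cudaFree(C_d);
    return err;
}
```

Whether four streams, or streams at all, beat the single-launch version on Bender is something to measure rather than assume.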


  1. Answer the previous questions in a report document added to the repository. Name your report FirstName-LastName with any common file extension (for example, FirstName-LastName.pdf, FirstName-LastName.txt, or FirstName-LastName.docx), and include your name in the report.
  2. Commit and push your completed Vector Add to the GitHub repository. (You only need to modify and.)