Final Project

Final report and code submission deadline: December 13 @ 11:59:59 PM


The objective of the final project is for you to get hands-on project experience with more advanced GPU topics and optimization techniques.


The final project can be completed individually or in a group of at most 3 students.
Note that as the group size increases, so do my expectations for the project outcomes. For example, grading may be partially based on the number of optimizations used, weighted by group size.


  • Team of 1-3 students
  • Final project report detailing implementation and results
  • Pre-recorded presentation / demo video.
  • Source code implementation uploaded to GitHub Classroom repo
  • Choice of one of the 3 project options below

Project Options

Option 1: Parallelizing serial C code with CUDA

This project option asks you to apply all of your CUDA knowledge to parallelize and optimize an existing serial C code base. For this option, you may choose to parallelize one of the following:
  • Backpropagation Network (Source: )
  • Navier-Stokes 2D (Source: )
The goal is for you to speed up these programs as much as you can using the techniques you learned in class.

Option 2: Open-ended project using CUDA libraries and other GPU languages

This project option gives you the flexibility to build an open-ended GPU-accelerated application of your choice. A central requirement of this option is that your project must use GPUs through existing CUDA libraries or alternative GPU languages. Examples of CUDA libraries are cuBLAS, cuDNN, cuFFT, etc. For more examples of CUDA libraries, take a look at: and Note that you are not limited to NVIDIA-provided GPU libraries; other libraries that use GPUs are also allowed. Examples of alternative GPU languages include OpenMP, Numba (for Python), or standard parallelism in C++17 or C++20.

Note that projects under this option must directly use a CUDA library or an alternative GPU language to access GPUs. Higher-level frameworks that use GPUs (such as PyTorch, MATLAB, etc.) are not allowed and are outside the scope of this project option.

For this project option, please consult with the TAs or Prof. Wong to iterate on a project idea and obtain approval.

Option 3: Open-ended GPU research project

This project option allows you to select an original research-oriented project that uses GPUs (microarchitecture, compiler, runtime, application, etc.). The research option carries the expectation that your project proposal/topic could lead to a published paper. As a result, your report is expected to include a related-work section that surveys papers related to your idea. A more detailed evaluation is also expected, ideally one that compares against the most relevant alternative to your idea.

For this project option, please consult with the TAs or Prof. Wong to iterate on a research project idea and obtain approval.

Final Submission

The final project submission is due Wednesday, December 13th. No extensions will be given.

The final project submission should be uploaded to GitHub Classroom (link TBD). The repo should contain:

  • Final report
  • Source code
  • Documentation on how to run your code

There is no required report template. If you're feeling ambitious, you can create your report in IEEE conference format using LaTeX.

There is no page minimum for the final project report. The report should be long enough to sufficiently cover the requested information, but not so long that it needs rambling filler sentences.

In general, the project report closely follows the presentation requirements, but with more documentation and a complete account of the implementation and results. At a minimum, the report should include the following:

  • Title page with project name and team members
  • Project Idea / Overview
  • How is the GPU used to accelerate the application? Examples include:
    • Details related to the parallel algorithm design / implementation
    • How is the problem space partitioned into threads / thread blocks?
    • Which stage of the software pipeline is parallelized?
    • What libraries were used?
    • The technical description will vary based on your project goals.
  • Implementation details
  • Documentation on how to run your code
    • Description of expected results when running your project. Which input should be used for testing?
  • Evaluation/Results
    • For example: timing of CUDA code vs. C code, timing of various implementations, bottleneck analysis using profiling, screenshots of your project outputs, etc.
  • Status of your project.
    • Is it feature complete? Does everything work? What does not work?
    • What are limitations of your project? (e.g. only works on square matrix)
    • What major technical challenges did you encounter?
  • On the last page, include a table listing the project tasks and the percentage contribution of each team member to each task. You can choose the granularity of the task breakdown. For example:

Task Breakdown
Task                          | Contribution
Implementation of Feature 1   | Tommy Trojan - 100%, Joe Bruin - 0%
Implementation of Feature 2   | Tommy Trojan - 99%, Joe Bruin - 1%
Project Report                | Tommy Trojan - 95%, Joe Bruin - 5%


Final Presentation

Due: Wednesday, December 13, 2023
Students will submit a pre-recorded presentation. The video should be about 10 minutes long. This is not a hard requirement, so don't feel the need to make the video longer than necessary.
The main points to cover in the presentation are:

  • Discuss your high-level project idea.
  • How are you implementing your project / how are you using the GPU?
  • Progress / Results / Lack thereof...
  • Short demo of your project
    • Demo must show how to download, compile, and run your code

You can either (1) upload the video to YouTube and include a link in your report or repo, or (2) upload it to YuJa and share the video with us, noting this in your report or repo.