Final Project


Team Formation: Wednesday April 27, 2022
Project Presentation: Week 10
Project Report, Demo video and Code Due: Monday, June 6, 2022


  • Team of 1 - 3 students
  • Final project report detailing implementation and results
  • Final ~10 minute presentation
  • Open-ended project topics: "Make something faster!" or "Make something more efficient on the GPU!"

Project and Team Formation

Due Wednesday, April 27

Once you form a team, post on Piazza under the "project" folder with the following information:

  • Team members:
  • Team name:
  • Proposed project idea:
  • Required libraries or framework?
  • Any potential risks or technical problems/challenges?
  • What is your plan/outline of your project?

Project Deliverables

Project report

Project Report and Code Due: Monday, June 6, 2022
For the Final Project report, you can use any template you like.

There is no page minimum for the final project report. It should be long enough to cover the information requested, but not so long that it needs rambling filler sentences.

In general, the project report closely follows the presentation requirement, but with more documentation and complete implementation/results. The report should at a minimum include the following:

  • Title page with project name and team members
  • Project Idea / Overview
  • How is the GPU used to accelerate the application? Examples include:
    • Details of the parallel algorithm
    • How the problem space is partitioned into threads / thread blocks
    • Which stage of the software pipeline is parallelized
  • Implementation details
  • Documentation on how to run your code
  • Evaluation/Results
  • Problems faced
  • On the last page, include a table listing the project tasks and the percentage contribution of each team member to each task. You can choose the granularity of the task breakdown. For example:

Task                          Breakdown
Implementation of Feature 1   Tommy Trojan - 100%, Joe Bruin - 0%
Implementation of Feature 2   Tommy Trojan - 99%, Joe Bruin - 1%
Project Report                Tommy Trojan - 95%, Joe Bruin - 5%
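
One way to document how the problem space is partitioned into threads / thread blocks is to state the launch configuration and the index formula explicitly in the report. The sketch below is a plain-Python emulation that needs no GPU. It shows the common 1D mapping in which each thread handles element blockIdx.x * blockDim.x + threadIdx.x, guarded by a bounds check. The problem size (1000), the block size (256), and the function names are illustrative assumptions, not course requirements.

```python
def grid_size(n, block_size):
    """Number of thread blocks needed to cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def global_indices(n, block_size):
    """Yield (block, thread, global_index) for each thread that passes the bounds check."""
    for block in range(grid_size(n, block_size)):
        for thread in range(block_size):
            # Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel.
            i = block * block_size + thread
            if i < n:  # threads past the end of the data do no work
                yield block, thread, i


# Assumed example sizes, chosen only for illustration.
n, block_size = 1000, 256
covered = [i for _, _, i in global_indices(n, block_size)]
idle_threads = grid_size(n, block_size) * block_size - n

print("blocks launched:", grid_size(n, block_size))
print("idle threads in the last block:", idle_threads)
```

Reporting the block count, block size, and number of idle threads alongside the kernel makes it easier for a reader to see that every element is covered exactly once.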

Once you have formed your group, join the following GitHub Classroom assignment and create a repo for your group: The repository is created empty because this is an open-ended project. You will use it to submit your final project deliverables, so push both the report and the source code to your project's git repo.


Project presentation

Due: Monday, June 6, 2022
Due to the large number of groups, students will submit a pre-recorded presentation instead of presenting live during class time. The video should be about 10 minutes long. This is not a hard requirement, so don't feel the need to stretch the video if you don't need the full time.
The main points to cover in the presentation are:

  • Discuss your high-level project idea.
  • How are you implementing it / how are you using the GPU?
  • Progress / Results / Lack thereof...
  • Short demo?

You can either (1) upload the video to YouTube and include a link in your report or repo, or (2) upload it to YuJa and share it with us, making a note of this in your report or repo.

Project Ideas

  • Use cuDNN to implement a CNN, such as LeNet or AlexNet. Train it and demonstrate inference. Potentially compare against a CPU implementation to measure speedup.
  • Build a simple backpropagation neural network from scratch, without using any libraries. Apply various optimizations you learned in class, such as tiling, privatization, etc.
  • Implement a molecular dynamics application. Visualize it in real time with OpenGL or another graphics framework.
  • Speed up signal processing applications using cuFFT.
  • Parallelize the hashing functions of cryptocurrencies in CUDA.
  • Compare performance of various parallel algorithms using CUDA, OpenCL, OpenACC, etc. Explore performance and productivity tradeoffs of various low-level GPU programming languages.
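
Several of the ideas above involve comparing an optimized implementation against a baseline to measure speedup. The following is a minimal timing sketch in plain Python, assuming speedup is defined as baseline time divided by optimized time. The two summation functions are hypothetical stand-ins for a CPU baseline and an accelerated version, and the helper names are made up for illustration.

```python
import time


def best_time(fn, *args, repeats=5):
    """Return (best wall-clock seconds over `repeats` runs, result of the last run)."""
    fn(*args)  # warm-up run, excluded from the measurement
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def speedup(baseline_seconds, optimized_seconds):
    """Speedup factor: how many times faster the optimized version ran."""
    return baseline_seconds / optimized_seconds


# Hypothetical stand-ins for the two implementations being compared.
def baseline_sum(values):
    total = 0
    for v in values:
        total += v
    return total


def optimized_sum(values):
    return sum(values)


data = list(range(100_000))
t_base, r_base = best_time(baseline_sum, data)
t_opt, r_opt = best_time(optimized_sum, data)

# Confirm both versions agree before reporting any timing.
assert r_base == r_opt
print(f"baseline {t_base:.6f} s, optimized {t_opt:.6f} s, "
      f"speedup {speedup(t_base, t_opt):.2f}x")
```

When the optimized version is a CUDA kernel, keep in mind that kernel launches are asynchronous with respect to the host, so a host-side timer needs a synchronization point such as `cudaDeviceSynchronize()` before the clock is stopped. It also helps to state in the report whether host-to-device transfer time is included in the measurement.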

Example projects from prior years:

  • CUDA acceleration in MATLAB
  • Implement the LeNet CNN in cuDNN, Keras, and TensorFlow, and compare performance across the implementations
  • Particle Simulation using CUDA and OpenGL
  • Using the Jetson board for computer vision
  • Algorithm comparison across different GPU languages, such as Numba, OpenACC, and modern C++