Final Project


The objective of the final project is for you to get hands-on project experience with GPUs.
GPU projects can be at the level of:

  • Software / Algorithms (I expect most students will fall in this category).
  • Compiler / Runtimes
  • Architecture


The Final Project can be done individually or in a group of at most 3 students.
Note that as the group size increases, so does my expectation for the project outcomes.


The Final Project is open-ended. Therefore, you are free to suggest any project you may be interested in. However, make sure that the project has a non-trivial GPU-oriented component to it.
For example, building ML models in PyTorch and enabling GPU support may involve a non-trivial ML component, but it has no non-trivial GPU component.

This is an implementation-based project (programming / development), so it requires substantial implementation effort. You do not have to establish novelty relative to related work; it is sufficient to evaluate your own implementation against, for example, a sequential one.
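As a rough illustration of what evaluating against a sequential implementation could look like, here is a minimal sketch of a timing harness. All names in it (best_time_ms, speedup, results_match, serial_saxpy, candidate_saxpy) are hypothetical and not part of any course-provided code, and the "candidate" is only a CPU placeholder standing in for wherever your accelerated version would be called.

```python
import time
from typing import Callable, Sequence


def best_time_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the fastest wall-clock time in ms."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # For GPU code, fn must block until the device has finished
        # (kernel launches are asynchronous), or the time is meaningless.
        fn()
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of the candidate over the baseline (values > 1 mean faster)."""
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("timings must be positive")
    return baseline_ms / candidate_ms


def results_match(expected: Sequence[float], actual: Sequence[float],
                  rel_tol: float = 1e-6) -> bool:
    """Compare two result vectors element-wise within a relative tolerance."""
    if len(expected) != len(actual):
        return False
    return all(abs(e - a) <= rel_tol * max(1.0, abs(e))
               for e, a in zip(expected, actual))


def serial_saxpy(a: float, x: Sequence[float], y: Sequence[float]) -> list:
    """Sequential reference implementation of a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def candidate_saxpy(a: float, x: Sequence[float], y: Sequence[float]) -> list:
    """Placeholder only: a real project would call its GPU implementation here."""
    return list(map(lambda xi, yi: a * xi + yi, x, y))


if __name__ == "__main__":
    n = 200_000
    x = [float(i) for i in range(n)]
    y = [1.0] * n
    # Check correctness first: a fast but wrong result is not a speedup.
    assert results_match(serial_saxpy(2.0, x, y), candidate_saxpy(2.0, x, y))
    base = best_time_ms(lambda: serial_saxpy(2.0, x, y))
    cand = best_time_ms(lambda: candidate_saxpy(2.0, x, y))
    print(f"serial: {base:.2f} ms, candidate: {cand:.2f} ms, "
          f"speedup: {speedup(base, cand):.2f}x")
```

The sketch takes the best of several runs to reduce noise from warm-up and other processes, and it checks that both versions produce the same output before reporting any speedup. Whether data transfer time between host and device is included in the measured region is a design choice; whichever you pick, state it in your report.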


  • Tuesday, November 9 - Form group and obtain project approval from me by posting a private group message on Piazza.
  • Thursday, November 18 - Mid-project progress meeting w/ progress report.
  • Friday, December 10 - Final project due. Report & code submission.

Milestone 1: Group Formation and Project Idea

Once you form your group, send me a private message on Piazza with your group members and proposed topic. I will then create a group discussion for your group on Piazza to facilitate our communication. The goal is to have an approved topic by Tuesday, November 9, so please form your group and iterate on your project idea with me as soon as you can.

Milestone 2: Progress report and meeting

  • November 18. Open lecture to discuss project details.
  • Normally, I would meet in person with every group. Instead, many of these conversations may be conducted through Piazza.
  • Progress report with final project description, planned tasks, expected outcomes (what do you hope to deliver by the project deadline?), tools/libraries/languages needed, and, if working in a team, how tasks will be divided.

    Submit progress report on eLearn. There will be an eLearn assignment posted for this.

Milestone 3: Final Submission

Final project submission is due Friday, December 10. No extensions will be given.

Final project submission should be zipped and uploaded to eLearn. The zip file should include the following:

  • Report / Documentation
  • Source code

There is no required report template. If you're feeling ambitious, you can always create your report in IEEE conference format using LaTeX.

Your report should describe:

  • Overview of project
  • Technical description of your implementation. For example: What optimizations did you use? Which libraries did you use? How was your algorithm designed and implemented? The technical description will vary based on your project goals.
  • Status of your project. Is it feature complete? Does everything work? What does not work? What are the limitations of your project (e.g., it only works on square matrices)? What major technical challenges did you encounter?
  • Evaluation / Results. Timing of parallel code vs. serial code, timing of various implementations, bottleneck analysis using profiling, screenshots of your project running, etc.
  • Documentation and outline of how to compile and run your project. Description of expected results when running your project. Which input should be used for testing?
  • If working in groups of 2 or more, include a table of tasks and breakdown of contribution, for example:

    Task                          Breakdown
    Implementation of Feature 1   Tommy Trojan - 100%, Joe Bruin - 0%
    Implementation of Feature 2   Tommy Trojan - 99%, Joe Bruin - 1%
    Project Report                Tommy Trojan - 95%, Joe Bruin - 5%

Potential Project Ideas:

The goal is to ACCELERATE. Make something (anything!) faster!

  • Parallelize a serial application (should include a performance comparison between the CPU and GPU implementations)
    • Scientific applications such as Molecular dynamics, Fluid dynamics, etc.
    • Hashing functions, Compression, etc.
    • Low-level image processing: Edge detection, segmentation, etc.
  • Use CUDA parallel libraries to create a parallel application
    • Use cuDNN to implement a CNN. Compare performance with a CPU implementation.
    • Use cuBLAS and cuSPARSE to accelerate numerical applications.
    • Use NCCL to create a multi-GPU application. Distributed matrix-multiply, ML training, etc.
  • System-level GPU support
    • Add GPU backend support to existing cloud frameworks
    • Build a GPU inference server
    • Build a GPU microservice with RESTful API
    • Build a GPU function-as-a-service server. (Add GPU backend support to existing FaaS frameworks)
    • Accelerate database operations with GPUs
  • Performance / Bottleneck analysis
    • Performance comparison of various GPU programming APIs (CUDA, OpenCL, OpenACC, Thrust, etc.)
    • GPU microarchitecture bottleneck analysis of specific applications (identify which components cause stalls, such as register file, cache, memory, or execution unit contention)
  • Optimize at the algorithm-level
    • Novel algorithm or data representation for efficient processing of sparse matrix multiplication, graphs, and other irregular data sets.

Some more examples from Winter 2019 CS/EE 217:

  • Python CUDA CNN
  • Real-time Streaming, RNN/LSTM
  • Network streaming
  • TensorRT
  • Image processing pipeline
  • Edge detection on various GPU APIs
  • Multi-GPU Wavefront
  • NN API
  • Parallel gzip
  • NN Backprop
  • Image segmentation w/ k-means clustering
  • Parallel LZSS compression
  • K-NN
  • Fluid simulation
  • GPU microarch bottleneck analysis
  • API performance comparison
  • Parallel LCS
  • Time-series hashing
  • CNN
  • Ray Tracing
  • Linear regression
  • Matrix Profile
  • Spatial data histogram
  • GPU Network router
  • Kernel scheduling
  • Fuzzy logic edge detection
  • Theano GPUarray CNN

Projects NOT Allowed

  • MRI-Q
  • Simple Neural Network implementation
  • Simple image processing (Edge detection, Gaussian blur, etc.)
  • K-means