Assignment 4 - Histogram

Due Friday, May 27 at 11:59:59 PM PST

Overview

The objective of this assignment is to implement a histogram kernel and demonstrate the usage of atomic operations.

Histogram

  1. For this assignment, we will be using Github Classroom.
    Please join the classroom by clicking the following link: https://classroom.github.com/a/Xzuheaz0 Once you join the classroom, a private github repository will automatically be created with the starter code.
    Simply git clone to copy the starter code to Bender.

  2. Edit the source files kernel.cu to complete the functionality of histogram on the device. The size of input and number of bins can be any size, but we will not test your code with bin size larger than 4096 or size of input greater than 1M.
    Please use a block size of 512 and 16 blocks.

  3. There are three modes of operation for the application. Check main() for a description of the modes (repeated below). You will support each of these modes using Histogram with a privitization pattern.

    • No arguments: The application will create a randomly initialized vector of 1 million elements and 4096 bins. After the device histogram is invoked, it will compute the correct solution matrix using the CPU, and compare that solution with the device-computed solution. If it matches (within a certain tolerance), if will print out "Test PASSED" to the screen before exiting.
    • One argument: The application will create a randomly initialized vector of specificed size. The number of bins will be 4096.
    • Two arguments: The first argument will specify the data input size, and the second argument specify the number of bins.
  4. Commit and push your completed histogram code to the private repository.

Answer the following questions:

Assume you perform a histogram operation on an array of size N, 16 blocks, block size of B and 1024 bins. Assume a privatized histogram implementation.

  1. How many global memory reads are performed during the histogram kernel?

    N + 16 x 1024.
    Each element in the array will be read (N). Each block will perform an atomic operation for each bin in the private histogram (16 blocks x 1024 bins).

  2. How many global memory writes are performed during the histogram kernel?

    16 x 1024 = 16,384. A global memory write occur on every atomic operation to the global histogram. There are 16 blocks and each block will perform 1024 atomic operations for each bin in the histogram.

  3. How many total atomic operations are performed?

    N + 16 x 1024.
    There will be N atomic operation performed to shared memory for the private histogram. Then each block will perform an atomic operation to global memory for each bin in the private histogram to the global histogram. (16 blocks x 1024 bins). In total, there will be N + 16 x 1024 atomic operations performed.

Submission

  1. Commit and push your completed histogram code to the github repository. (You only need to modify kernel.cu.)
  2. Answer the previous questions by including a report document in the repository in either PDF, Plain text, or Markdown format only. Please name your report FirstName-LastName.pdf or FirstName-LastName.txt or FirstName-LastName.md, etc.

    Please also include your name in the report.