CS203 Winter 2020 - Assignment 3


Due: Friday, March 5 midnight (11:59:59 pm)

This assignment will be turned in through Github Classroom.

Objective

The goal of this assignment is to design and implement a simple cache simulator, with support for (1) direct-mapped, set-associative, and fully-associative mapping, and (2) an LRU cache replacement policy. If you have any questions, please post to Campuswire, as others may be having the same issue.

For this assignment, we will be using GitHub Classroom. This assignment can be completed in teams of up to 2 students. You can accept the assignment invitation by clicking the following link: https://classroom.github.com/g/bEw1xCJM
Note that you will have to create a team in order to accept the assignment.

  • If you plan on doing the assignment as a group, one member will have to first create the team. Then the next member can join the team.
  • If you plan on doing the assignment individually, you will still have to create a team. Once you accept the assignment, a private GitHub repository will automatically be created for your team with the starter code.

Memory Access Traces

To facilitate testing of your cache simulator, we will use two traces collected from a run of gcc.
The traces gcc-10K.memtrace and gcc-650K.memtrace are available in the Git repository under the traces directory. They contain 10 thousand and 650 thousand entries, respectively. A sample of the trace is given below:

L -200 7fffe7ff088
L 0 7fffe7fefd0
S 8 12ff228
S 8 12ff208
L 0 a295e8

Each line in the trace file contains:

  • the memory instruction type (L = load, S = store),
  • the offset in decimal,
  • the memory address in hexadecimal.
    Note that this trace was obtained from an x86 machine, and thus the memory address is 44 bits. For the purposes of this class, we can assume 32-bit addresses and truncate the 12 most significant bits. This has already been done for you in the starter code.
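The truncation itself is simple bit masking; a minimal sketch for illustration only (the starter code already handles this, and the `truncateAddress` helper name is hypothetical):

```cpp
#include <cstdint>
#include <string>

// Illustration only -- the starter code already does this truncation.
// The trace's 44-bit hex address is parsed into 64 bits, then masked
// down to the low 32 bits (dropping the 12 most significant bits).
uint32_t truncateAddress(const std::string& hex) {
    uint64_t full = std::stoull(hex, nullptr, 16);      // up to 44 bits
    return static_cast<uint32_t>(full & 0xFFFFFFFFull); // keep low 32
}
```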

Cachesim

Cachesim should be able to simulate the hit / miss rate of a simple cache.
The command-line parameters and usage for cachesim are:
./cachesim -i <tracefile> -c <cache size> -b <block size> -w <number of ways>

  • -i : input trace file name
  • -c : cache size. Supports suffix of B, KB, MB, and GB.
  • -b : block size. Supports suffix of B, KB, MB, and GB.
  • -w : number of ways (n-way set associativity)
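The handout does not specify how the size suffixes are parsed; one way it might look, using a hypothetical `parseSize` helper (the name and exact behavior are assumptions, not the starter code's API):

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not the starter code's API): convert strings
// like "1KB" or "4MB" into a byte count. Accepted suffixes mirror the
// -c and -b flags: B, KB, MB, GB.
uint64_t parseSize(const std::string& s) {
    size_t pos = 0;
    uint64_t value = std::stoull(s, &pos); // numeric prefix
    std::string suffix = s.substr(pos);    // remaining letters
    if (suffix == "KB") return value << 10;
    if (suffix == "MB") return value << 20;
    if (suffix == "GB") return value << 30;
    return value;                          // "B" or bare number: bytes
}
```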

For simplicity, we can make the following assumptions:
a) Maximum of 4MB cache size
b) Assume a Least Recently Used (LRU) cache replacement policy. (Not Pseudo-LRU!)

A sample run of Cachesim is as follows:

$ ./cachesim -i traces/gcc-10K.memtrace -c 1MB -b 1KB -w 8
------------------------------
Cache size: 1048576 Bytes
Block Size: 1024 Bytes
Sets: 128
Ways: 8
Tag bits: 15
Index bits: 7
Offset bits: 10
------------------------------
Hit rate: 99.15%
Miss rate: 0.85%
Hits: 9915
total: 10000
------------------------------

Extending Cachesim

Currently, Cachesim only implements a direct mapped cache with 16 sets. Your goal for this assignment is to extend Cachesim in order to support configurable cache size, block size, and set associativity, along with support for LRU cache replacement.
LRU tip: Think of the LRU bits as an index. Can you model LRU without directly managing the bits?
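One way to read the tip: rather than storing explicit LRU counter bits per way, keep each set's tags in a recency-ordered container, so a tag's position *is* its LRU index. A sketch of a single set under this model (the `LruSet` struct is illustrative, not starter code):

```cpp
#include <algorithm>
#include <cstdint>
#include <list>

// Illustrative single set: position encodes recency (front = most
// recently used, back = least). No explicit LRU bits are stored --
// the "index" from the tip is just the tag's position in the list.
struct LruSet {
    std::list<uint32_t> tags;
    size_t ways;

    explicit LruSet(size_t w) : ways(w) {}

    bool access(uint32_t tag) {
        auto it = std::find(tags.begin(), tags.end(), tag);
        if (it != tags.end()) {        // hit: move tag to MRU position
            tags.erase(it);
            tags.push_front(tag);
            return true;
        }
        if (tags.size() == ways)       // miss with full set: evict LRU
            tags.pop_back();
        tags.push_front(tag);          // install new tag as MRU
        return false;
    }
};
```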

Code walkthrough

The main simulation loop is similar to that of assignment 2: we grab the next memory trace entry and access the cache.
main.cpp, L57-L62

    while(trace.getNextMemoryAccess(memType, offset, address)){
       hit = cache.accessCache(address, offset);
       if(debug){
         printf("%s %d(0x%s) %s!\n\n", memType ? "STORE" : "LOAD", offset, address.c_str(), hit ? "HIT" : "MISS");
       }
    }

The cache is currently modeled as an array of size 16. Note that the memory trace does not actually contain any data, so our simulator will not model the functionality of storing the actual data blocks.
Instead, the simulator will simply need to keep track of the tag bits in order to model the functionality of the cache. Therefore, this array will store the tag.
cache.h, L35

uint32_t cache[16]; // hard coded 16 entry cache

The cache is initialized in cache.cpp, L3-L28. Currently, some values are hard-coded to simulate the 16-set direct-mapped cache and will need to be modified to correctly reflect a configurable cache:

    // Cache number of sets
    this->sets = 16;  // Currently hard coded

    // Address field bits
    this->offsetBits = log2(blockSize);
    this->indexBits = 4;  // Currently hard coded. 
    this->tagBits = ADDRESS_LENGTH - this->indexBits - this->offsetBits;
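For a configurable cache, these hard-coded values follow from the command-line parameters: sets = cache size / (block size × ways), and the index width is log2 of the set count. A hedged sketch of how the initialization might generalize (the `CacheConfig` struct and `configure` helper are illustrative, not part of the starter code):

```cpp
#include <cmath>
#include <cstdint>

const uint32_t ADDRESS_LENGTH = 32; // per the handout's 32-bit assumption

// Illustrative: deriving the hard-coded values from the parsed
// -c / -b / -w parameters. All sizes are assumed powers of two.
struct CacheConfig {
    uint32_t sets, offsetBits, indexBits, tagBits;
};

CacheConfig configure(uint32_t cacheSize, uint32_t blockSize, uint32_t ways) {
    CacheConfig c;
    c.sets = cacheSize / (blockSize * ways);                // replaces sets = 16
    c.offsetBits = static_cast<uint32_t>(std::log2(blockSize));
    c.indexBits = static_cast<uint32_t>(std::log2(c.sets)); // replaces indexBits = 4
    c.tagBits = ADDRESS_LENGTH - c.indexBits - c.offsetBits;
    return c;
}
```

For the sample run shown earlier (1MB cache, 1KB blocks, 8 ways), this yields 128 sets and 15/7/10 tag/index/offset bits, matching the printed output.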

The main function which models the function of the cache is accessCache() in cache.cpp, L30-L50.

bool Cache::accessCache(string address, int addressOffset){

This function takes an address and address offset as input.
The address is then parsed and stored in an address structure which contains the tag, index, and offset:
cache.cpp, L32

address_t addr = this->parseAddress(address, addressOffset);

cache.h, L21-25

struct address_t{
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};
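The handout does not show parseAddress() itself, but the bit arithmetic it must perform can be sketched as follows (the `splitAddress` helper is hypothetical; the real function also converts the hex string and applies the trace's decimal offset, which is omitted here):

```cpp
#include <cstdint>

struct address_t {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

// Hypothetical sketch of the field extraction, given the index and
// offset widths computed at initialization.
address_t splitAddress(uint32_t addr, uint32_t indexBits, uint32_t offsetBits) {
    address_t a;
    a.offset = addr & ((1u << offsetBits) - 1);                // low bits
    a.index  = (addr >> offsetBits) & ((1u << indexBits) - 1); // middle bits
    a.tag    = addr >> (offsetBits + indexBits);               // high bits
    return a;
}
```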

Once we parse the address we check for a cache hit by checking if the tag matches:
cache.cpp, L35

bool hit = (cache[addr.index] == addr.tag);

If we have a cache miss, then we have to replace the block with the new block we are accessing. We do this simply by storing the new tag in the cache.
cache.cpp, L39-42

    else // If miss, then load cache block
    {
        cache[addr.index] = addr.tag; 
    }
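For the n-way extension, this single tag compare becomes a search across all ways in the indexed set, and a miss must evict the least recently used way. One possible shape, using per-way timestamps to track recency (the `SetAssocCache` struct is illustrative, and the recency-ordering approach from the LRU tip works equally well):

```cpp
#include <cstdint>
#include <vector>

// Illustrative n-way cache: tags stored flat as tags[set * ways + way].
// Recency is tracked with a per-way timestamp from a global counter; on
// a miss, the way with the smallest timestamp is the LRU victim. Invalid
// ways keep timestamp 0, so they are naturally filled first.
struct SetAssocCache {
    std::vector<uint32_t> tags;
    std::vector<uint32_t> stamps;
    std::vector<bool> valid;
    uint32_t ways;
    uint32_t tick = 0;

    SetAssocCache(uint32_t sets, uint32_t w)
        : tags(sets * w), stamps(sets * w), valid(sets * w, false), ways(w) {}

    bool access(uint32_t index, uint32_t tag) {
        uint32_t base = index * ways;
        for (uint32_t w = 0; w < ways; ++w) {       // hit: tag in any way?
            if (valid[base + w] && tags[base + w] == tag) {
                stamps[base + w] = ++tick;          // refresh recency on hit
                return true;
            }
        }
        uint32_t victim = base;                     // miss: find LRU victim
        for (uint32_t w = 1; w < ways; ++w)
            if (stamps[base + w] < stamps[victim]) victim = base + w;
        tags[victim] = tag;
        valid[victim] = true;
        stamps[victim] = ++tick;
        return false;
    }
};
```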

Submission

Submission of this assignment is done by pushing your completed code and report to the GitHub repository.

  1. Edit the README.md file to include your team members' names.
  2. Name your report FirstName-LastName.pdf, FirstName-LastName.txt, FirstName-LastName.docx, etc.
    If you're working in a group of 2, please include both members' names in the report.
  3. Answer the following questions by adding a report document in the repository.

Please be sure to also include your names in the report.

Questions

  1. Assuming a 512KB 4-way set associative cache with 16B block size, how many bits does the tag have?
    What is the total size, in bits, of the cache including tag bits and data blocks? (Assume 32-bit addressing.)
  2. What is the cache miss rate of the given traces and cache configurations?
    Assume we have a 512KB cache and 16B block size.

    Trace              Direct mapped   2-way   4-way   Fully associative
    gcc-10K.memtrace
    gcc-650K.memtrace
  3. For the following configurations, how many bits are in the tag, index, and offset fields?

    Configuration                                    Tag bits   Index bits   Offset bits
    256KB cache, 16B block size, Direct mapped
    256KB cache, 16B block size, 2-way
    256KB cache, 16B block size, 4-way
    256KB cache, 16B block size, Fully associative
  4. Assuming a 256KB cache and 32B block size:
    How does varying the number of ways affect the cache miss rate?
    Plot the number of ways (1, 2, 4, 8, 16) vs. miss rate for the two traces.
    What do you observe and why?

  5. Assuming a 256KB 2-way set associative cache:
    How does varying the block size affect the cache miss rate?
    Plot the block size (2B, 4B, 8B, 16B, 32B, 64B) vs. miss rate for the two traces.
    What do you observe and why?
  6. Assuming a 2-way set associative cache and 32B block size:
    How does varying the cache size affect the cache miss rate?
    Plot the cache size (64KB, 128KB, 256KB, 512KB, 1MB, 2MB) vs. miss rate for the two traces.
    What do you observe and why?
  7. Measuring cold, capacity, and conflict misses. (See "Measuring/Classifying Misses" in the slides.) In this problem, we will identify the types of misses for an 8KB, 4-way set associative cache with a 32B block size. For the gcc-650K trace, provide a breakdown of the types of cache misses. You can provide the breakdown in terms of the miss rate. For example, if the 8KB, 4-way cache has a 20% miss rate, an infinite-size cache has a 1% miss rate, and a fully associative cache has a 10% miss rate, then 1% is due to cold misses, 9% is due to capacity misses, and 10% is due to conflict misses.

Extra credit: For the given gcc traces, find the optimal cache configuration (cache size, block size, associativity) that achieves the best miss rate while also minimizing the total cache size (data blocks + tag bits). Describe your approach to finding this optimal configuration, and if the configuration differs for each trace, explain why that may be. Recall we have an assumption of a 4MB maximum cache size. (For simplicity, this 4MB limit is with respect to the data block size and does not include the tag bits.)