Sunday, November 11, 2018

Least Recently Used(LRU) algorithm implementation in Hardware


LRU algorithm is one of the most commonly used cache page replacement algorithms. It’s often easy to imagine the implementation of LRU using linked list structure in software but hardware implementation could be tricky and requires some thoughtful analysis. Here I have made an attempt to put together steps of matrix method of LRU implementation in Hardware which could eventually be translated to code in Verilog or VHDL

Matrix implementation requires n*n matrix for n location cache i.e 4*4 memory(16 Flipflops) for 4 location cache. The algorithm should track all the locations as it gets updated and should output the least recently used location (or index of cache).
Let’s start with simple 4 location cache and then generalize the solution for n location cache.

4 location cache with each location of 8 bits:
4 * 4 matrix to store the history and help calculate the least recently used location in cache
Total of 16 Flipflops for 4 location cache 


We would save the information of relative age of each location with respect to other
Say    A > B   =>  A is older than B
          A > C   => A is older than C

Let’s represent this in the matrix as below:

Diagonal is always 1 as it tries to compare same elements like A vs A, B vs B etc
Also it’s easy to see that A is older than B  =  ! (B is older than A)
(A > B) =  !(B>A)
With the above logic, matrix would look like below:

It’s easy to see that left half of matrix across diagonal is exactly opposite to right half.


Right half of the matrix would entirely be enough to get the age relation between cache locations


Steps:
1) Initialize the matrix to any known value. Say D > C > B > A
     i.e, D is the oldest and A is newest location
Note that diagonal value is always one

2) If location I is accessed then row I is set to zero and column I is set to one except for diagonal. 
     Update the matrix every time cache is accessed

3) Oldest location will be the location for which 'AND' of each column is equal to 1
     i.e,  &[columns of location]==1

It’s turns out to be true to D as all of its columns has 1


Let’s take an example and see if this works
Say order of access of cache is C,B,D,A รจ oldest location at the end of this should be C

a) Initialization to D > C > B > A


    b) C is accessed  => C is the most recent. Change all of c rows to zeros and columns to one

All of D columns are one and D should be the oldest at this point of time.
From above matrix, we can also derive that
 A > B = 0 , A > C =1, A > D = 0 , B > C =1 , B > D = 0 , C > D = 0 
   => C < A < B < D

    c) B is accessed. i.e, C followed by B. Change all B rows to zeros and B columns to ones
   New matrix would be as below:

D is the oldest as all of it’s columns are 1

    d) D is accessed. C followed by B followed by D. Change all B rows to zero and B columns to one
  New matrix:

A is the oldest and least recently used

     e) A is accessed. i.e, C followed by B followed by D followed by A
C column is all one and hence the oldest as expected