離線快取

Created: November-22, 2018

快取問題源於有限空間的限制。讓我們假設我們的快取 C 有 k 頁面。現在我們要處理一系列 m 項請求，這些請求必須在處理之前放入快取中。當然，如果 m<=k 然後我們只是將所有元素放在快取中它會起作用，但通常是 m>>k。

當專案已經在快取中時，我們說請求是快取命中，否則稱為快取未命中。在這種情況下，我們必須將請求的專案帶入快取並逐出另一個，假設快取已滿。目標是一種驅逐計劃，可以最大限度地減少驅逐次數。

這個問題有很多貪婪的策略，讓我們來看看：

先進先出（FIFO） ：最舊的頁面被逐出
最後，先出（LIFO） ：最新頁面被逐出
最近出版（LRU） ：最近訪問最早的 Evict 頁面
最不經常請求（LFU） ： 最不常請求的 Evict 頁面
最長前向距離（LFD） ：快取中的 Evict 頁面，直到將來最遠才會請求。

注意： 對於以下示例，如果可以驅逐多個頁面，我們將逐出具有最小索引的頁面。

示例（FIFO）

讓快取大小為 k=3 初始快取 a,b,c 和請求 a,a,d,e,b,b,a,c,f,d,e,a,f,b,e,c：

請求	一個	一個	d	Ë	b	b	一個	C	F	d	Ë	一個	F	b	Ë	C
`cache 1`	一個	一個	d	d	d	d	一個	一個	一個	d	d	d	F	F	F	C
`cache 2`	b	b	b	Ë	Ë	Ë	Ë	C	C	C	Ë	Ë	Ë	b	b	b
`cache 3`	C	C	C	C	b	b	b	b	F	F	F	一個	一個	一個	Ë	Ë
快取未命中			X	X	X		X	X	X	X	X	X	X	X	X	X

十六個請求中的十三個快取未命中聽起來不是最理想的，讓我們嘗試使用另一個策略的相同示例：

示例（LFD）

讓快取大小為 k=3 初始快取 a,b,c 和請求 a,a,d,e,b,b,a,c,f,d,e,a,f,b,e,c：

請求	一個	一個	d	Ë	b	b	一個	C	F	d	Ë	一個	F	b	Ë	C
`cache 1`	一個	一個	d	Ë	Ë	Ë	Ë	Ë	Ë	Ë	Ë	Ë	Ë	Ë	Ë	C
`cache 2`	b	b	b	b	b	b	一個	一個	一個	一個	一個	一個	F	F	F	F
`cache 3`	C	C	C	C	C	C	C	C	F	d	d	d	d	b	b	b
快取未命中			X	X			X		X	X			X	X		X

八次快取未命中要好得多。

Selftest ：做 LIFO，LFU，RFU 的例子，看看發生了什麼。

以下示例程式（用 C++編寫）由兩部分組成：

骨架是一個應用程式，它解決了依賴於所選擇的貪婪策略的問題：

#include <iostream>
#include <memory>

using namespace std;

const int cacheSize     = 3;
const int requestLength = 16;

const char request[]    = {'a','a','d','e','b','b','a','c','f','d','e','a','f','b','e','c'};
char cache[]            = {'a','b','c'};

// for reset
char originalCache[]    = {'a','b','c'};

class Strategy {

public:
    Strategy(std::string name) : strategyName(name) {}
    virtual ~Strategy() = default;

    // calculate which cache place should be used
    virtual int apply(int requestIndex)                                      = 0;

    // updates information the strategy needs
    virtual void update(int cachePlace, int requestIndex, bool cacheMiss)    = 0;

    const std::string strategyName;
};

bool updateCache(int requestIndex, Strategy* strategy)
{
    // calculate where to put request
    int cachePlace = strategy->apply(requestIndex);

    // proof whether its a cache hit or a cache miss
    bool isMiss = request[requestIndex] != cache[cachePlace];

    // update strategy (for example recount distances)
    strategy->update(cachePlace, requestIndex, isMiss);

    // write to cache
    cache[cachePlace] = request[requestIndex];

    return isMiss;
}

int main()
{
    Strategy* selectedStrategy[] = { new FIFO, new LIFO, new LRU, new LFU, new LFD };

    for (int strat=0; strat < 5; ++strat)
    {
        // reset cache
        for (int i=0; i < cacheSize; ++i) cache[i] = originalCache[i];

        cout <<"\nStrategy: " << selectedStrategy[strat]->strategyName << endl;

        cout << "\nCache initial: (";
        for (int i=0; i < cacheSize-1; ++i) cout << cache[i] << ",";
        cout << cache[cacheSize-1] << ")\n\n";

        cout << "Request\t";
        for (int i=0; i < cacheSize; ++i) cout << "cache " << i << "\t";
        cout << "cache miss" << endl;

        int cntMisses = 0;

        for(int i=0; i<requestLength; ++i)
        {
            bool isMiss = updateCache(i, selectedStrategy[strat]);
            if (isMiss) ++cntMisses;

            cout << "  " << request[i] << "\t";
            for (int l=0; l < cacheSize; ++l) cout << "  " << cache[l] << "\t";
            cout << (isMiss ? "x" : "") << endl;
        }

        cout<< "\nTotal cache misses: " << cntMisses << endl;
    }

    for(int i=0; i<5; ++i) delete selectedStrategy[i];
}

基本想法很簡單：對於每個請求，我有兩個呼叫兩個我的策略：

apply ：策略必須告訴呼叫者使用哪個頁面
更新：呼叫者使用該位置後，它會告訴策略是否是未命中。然後策略可以更新其內部資料。例如，策略 LFU 必須更新快取頁面的命中頻率，而 LFD 策略必須重新計算快取頁面的距離。

現在讓我們看看我們的五個策略的示例實現：

FIFO

class FIFO : public Strategy {
public:
    FIFO() : Strategy("FIFO")
    {
        for (int i=0; i<cacheSize; ++i) age[i] = 0;
    }

    int apply(int requestIndex) override
    {
        int oldest = 0;

        for(int i=0; i<cacheSize; ++i)
        {
            if(cache[i] == request[requestIndex])
                return i;

            else if(age[i] > age[oldest])
                oldest = i;
        }

        return oldest;
    }

    void update(int cachePos, int requestIndex, bool cacheMiss) override
    {
        // nothing changed we dont need to update the ages
        if(!cacheMiss)
            return;

        // all old pages get older, the new one get 0
        for(int i=0; i<cacheSize; ++i)
        {
            if(i != cachePos)
                age[i]++;

            else
                age[i] = 0;
        }
    }

private:
    int age[cacheSize];
};

FIFO 只需要資訊頁面在快取中的長度（當然僅相對於其他頁面）。因此，唯一要做的就是等待錯過，然後製作頁面，這些頁面在沒有被驅逐的情況下更老。對於上面的示例，程式解決方案是：

Strategy: FIFO

Cache initial: (a,b,c)

Request    cache 0    cache 1    cache 2    cache miss
  a          a          b          c    
  a          a          b          c    
  d          d          b          c          x
  e          d          e          c          x
  b          d          e          b          x
  b          d          e          b    
  a          a          e          b          x
  c          a          c          b          x
  f          a          c          f          x
  d          d          c          f          x
  e          d          e          f          x
  a          d          e          a          x
  f          f          e          a          x
  b          f          b          a          x
  e          f          b          e          x
  c          c          b          e          x

Total cache misses: 13

這確切地說是上面的解決方案。

LIFO

class LIFO : public Strategy {
public:
    LIFO() : Strategy("LIFO")
    {
        for (int i=0; i<cacheSize; ++i) age[i] = 0;
    }

    int apply(int requestIndex) override
    {
        int newest = 0;

        for(int i=0; i<cacheSize; ++i)
        {
            if(cache[i] == request[requestIndex])
                return i;

            else if(age[i] < age[newest])
                newest = i;
        }

        return newest;
    }

    void update(int cachePos, int requestIndex, bool cacheMiss) override
    {
        // nothing changed we dont need to update the ages
        if(!cacheMiss)
            return;

        // all old pages get older, the new one get 0
        for(int i=0; i<cacheSize; ++i)
        {
            if(i != cachePos)
                age[i]++;

            else
                age[i] = 0;
        }
    }

private:
    int age[cacheSize];
};

實施 LIFO 是或多或少相同的 FIFO ，但我們退出的最年輕不是最古老的頁面。該計劃的結果是：

Strategy: LIFO

Cache initial: (a,b,c)

Request    cache 0    cache 1    cache 2    cache miss
  a          a          b          c    
  a          a          b          c    
  d          d          b          c          x
  e          e          b          c          x
  b          e          b          c    
  b          e          b          c    
  a          a          b          c          x
  c          a          b          c    
  f          f          b          c          x
  d          d          b          c          x
  e          e          b          c          x
  a          a          b          c          x
  f          f          b          c          x
  b          f          b          c    
  e          e          b          c          x
  c          e          b          c    

Total cache misses: 9

LRU

class LRU : public Strategy {
public:
    LRU() : Strategy("LRU")
    {
        for (int i=0; i<cacheSize; ++i) age[i] = 0;
    }

    // here oldest mean not used the longest
    int apply(int requestIndex) override
    {
        int oldest = 0;

        for(int i=0; i<cacheSize; ++i)
        {
            if(cache[i] == request[requestIndex])
                return i;

            else if(age[i] > age[oldest])
                oldest = i;
        }

        return oldest;
    }

    void update(int cachePos, int requestIndex, bool cacheMiss) override
    {
        // all old pages get older, the used one get 0
        for(int i=0; i<cacheSize; ++i)
        {
            if(i != cachePos)
                age[i]++;

            else
                age[i] = 0;
        }
    }

private:
    int age[cacheSize];
};

在 LRU 的情況下，策略獨立於快取頁面，其唯一的興趣是最後一次使用。該計劃的結果是：

Strategy: LRU

Cache initial: (a,b,c)

Request    cache 0    cache 1    cache 2    cache miss
  a          a          b          c    
  a          a          b          c    
  d          a          d          c          x
  e          a          d          e          x
  b          b          d          e          x
  b          b          d          e    
  a          b          a          e          x
  c          b          a          c          x
  f          f          a          c          x
  d          f          d          c          x
  e          f          d          e          x
  a          a          d          e          x
  f          a          f          e          x
  b          a          f          b          x
  e          e          f          b          x
  c          e          c          b          x

Total cache misses: 13

LFU

class LFU : public Strategy {
public:
    LFU() : Strategy("LFU")
    {
        for (int i=0; i<cacheSize; ++i) requestFrequency[i] = 0;
    }

    int apply(int requestIndex) override
    {
        int least = 0;

        for(int i=0; i<cacheSize; ++i)
        {
            if(cache[i] == request[requestIndex])
                return i;

            else if(requestFrequency[i] < requestFrequency[least])
                least = i;
        }

        return least;
    }

    void update(int cachePos, int requestIndex, bool cacheMiss) override
    {
        if(cacheMiss)
            requestFrequency[cachePos] = 1;

        else
            ++requestFrequency[cachePos];
    }

private:

    // how frequently was the page used
    int requestFrequency[cacheSize];
};

LFU 驅逐頁面使用最少。所以更新策略只是計算每次訪問。當然，在計數失敗後，計數重置。該計劃的結果是：

Strategy: LFU

Cache initial: (a,b,c)

Request    cache 0    cache 1    cache 2    cache miss
  a          a          b          c    
  a          a          b          c    
  d          a          d          c          x
  e          a          d          e          x
  b          a          b          e          x
  b          a          b          e    
  a          a          b          e    
  c          a          b          c          x
  f          a          b          f          x
  d          a          b          d          x
  e          a          b          e          x
  a          a          b          e    
  f          a          b          f          x
  b          a          b          f    
  e          a          b          e          x
  c          a          b          c          x

Total cache misses: 10

LFD

class LFD : public Strategy {
public:
    LFD() : Strategy("LFD")
    {
        // precalc next usage before starting to fullfill requests
        for (int i=0; i<cacheSize; ++i) nextUse[i] = calcNextUse(-1, cache[i]);
    }

    int apply(int requestIndex) override
    {
        int latest = 0;

        for(int i=0; i<cacheSize; ++i)
        {
            if(cache[i] == request[requestIndex])
                return i;

            else if(nextUse[i] > nextUse[latest])
                latest = i;
        }

        return latest;
    }

    void update(int cachePos, int requestIndex, bool cacheMiss) override
    {
            nextUse[cachePos] = calcNextUse(requestIndex, cache[cachePos]);
    }

private:

    int calcNextUse(int requestPosition, char pageItem)
    {
        for(int i = requestPosition+1; i < requestLength; ++i)
        {
            if (request[i] == pageItem)
                return i;
        }

        return requestLength + 1;
    }

    // next usage of page
    int nextUse[cacheSize];
};

該 LFD 策略是大家以前不同。它是唯一使用未來要求驅逐的人的策略。該實現使用函式 calcNextUse 來獲取下一次使用最遠的頁面。程式解決方案等同於上面的解決方案：

Strategy: LFD

Cache initial: (a,b,c)

Request    cache 0    cache 1    cache 2    cache miss
  a          a          b          c    
  a          a          b          c    
  d          a          b          d          x
  e          a          b          e          x
  b          a          b          e    
  b          a          b          e    
  a          a          b          e    
  c          a          c          e          x
  f          a          f          e          x
  d          a          d          e          x
  e          a          d          e    
  a          a          d          e    
  f          f          d          e          x
  b          b          d          e          x
  e          b          d          e    
  c          c          d          e          x

Total cache misses: 8

貪婪策略 LFD 確實是所提出的五種策略中唯一的最佳策略。證據相當長，可以在這裡或 Jon Kleinberg 和 Eva Tardos 的書中找到（參見下面的評論中的來源）。

演算法與現實

該 LFD 策略是最優的，但有一個很大的問題。它是最佳的離線解決方案。在實踐中，快取通常是一個線上問題，這意味著策略是無用的，因為我們現在不能在下次需要特定專案時使用。其他四種策略也是線上策略。對於線上問題，我們需要一種不同的方法。