Browsing the papers available on the Papers We Love community, looking for nothing in particular, I found an interesting one entitled “An O(1) algorithm for implementing the LFU cache eviction scheme”.
Contrary to Least Recently Used (LRU), a Least Frequently Used (LFU) keeps track of how many times a given entry was used, instead of just saying what item has the oldest use timestamp. I liked how simple the algorithm is, using a map and a pair of linked lists to maintain the entries’ frequencies. It becomes very evident how those structures are employed once you see the figures available on the paper.
As an exercise, I decided to implement it using Scala as a mutable data structure.
I made it a generic class, where
Item is the type being cached, and
Id the type used as key for that item type. The ids are provided through an implementation of the
Identifier interface. It allow you to create custom identification mechanisms. Most commonly though, you might just use a hash code of the item, but if you might be caching web content, it might make sense to use the resource’s URL as the id.
You also specify the maximum number of items you are willing to accept. As an improvement, this could also be done more dynamic, by passing it an object that should tell if an item should be dropped or not. With that, you could decide to accept new items as long as you have free memory.
Furthermore, it could be changed to also provide a standard Scala API, like
Map, for instance.
The class signature looks like the following:
As described on the paper, the collection supports 3 operation: insertions, lookups, and deletions. On the code, they map to the methods on the code above, following the aforementioned order. Inserting or looking up an item increases it frequency count. When you first insert an item, as expected, its frequency is equal to 1.
Internally, also based on the paper, items are kept on a linked list with all items that share the same frequency as that one (
CachedItem). Those linked lists are kept on a second linked list (
FrequencyList), where each entry is a list of items of frequency n. Finally, a map is used to associate a key to their respective link on the inner linked list (
An example using a cache of integers would be:
- Efficient Timer using a Circular Buffer
- Proxy annotated objects with Guice
- Services Version Lock with Docker and Jenkins
- Type Equality Checking at Compile Time in Scala
- RFID, Dryers, and IoT
- Circuit Breakers and retries in Scala with autobreaker
- Integration testing for nginx Routes
- Problems with Branches per Environment
- Trunk Based Development with Multiple Services