SwarmCache - Cluster-aware Caching for Java

Please be sure to read the Tutorial section before reading this section.

Caching Algorithms
The following caching algorithms are available for the local caches:

Least Recently Used (LRU)
This is the most common and straightforward caching algorithm. A fixed, configurable number of objects can be cached. When the caching of a new object is requested but the cache is full, the object that was least recently requested from the cache (or updated in the cache) is ejected to make space.
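The eviction rule above can be sketched with the JDK's LinkedHashMap in access-order mode. This is only an illustration of the LRU idea, not SwarmCache's own implementation; the class name LruCacheSketch is made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a tiny LRU cache built on the JDK's LinkedHashMap.
// LruCacheSketch is a hypothetical name, not a SwarmCache class.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int maxSize) {
        // accessOrder = true: entries are ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    // LinkedHashMap calls this after each insertion; returning true ejects the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("a", "Alice");
        cache.put("b", "Bob");
        cache.get("a");                     // "a" becomes the most recently used entry
        cache.put("c", "Carol");            // the cache is full, so "b" is ejected
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Both get and put count as a "touch" here, which matches the description above of an object being requested or updated.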

Automatic
This algorithm works with the JVM's garbage collector to allow cached objects to be automatically garbage-collected. The cache maintains soft references to cached objects, which allow the system to reclaim them as necessary. The cache size is therefore bounded only by the amount of memory available to the JVM. However, unlike the LRU algorithm, it does not guarantee that frequently accessed objects will always be available in the cache.
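As a rough sketch of the soft-reference idea, the following class wraps each value in a java.lang.ref.SoftReference. It is an assumption-laden illustration, not the SwarmCache source; SoftCacheSketch and its methods are hypothetical names.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: values are held through soft references, so the garbage
// collector may reclaim them when memory runs low. Not a SwarmCache class.
class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // Returns null if the key was never cached or its value has been reclaimed.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // the collector cleared the reference; drop the stale entry
        }
        return value;
    }
}
```

A caller must always be prepared for get to return null, even for a key it cached a moment ago, and reload the object from the persistence layer in that case.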

Timeout
This algorithm times out objects that have been untouched for a number of milliseconds. The timeout amount is specified via the property cache.timeout.
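One way to picture the timeout rule is the sketch below, which expires an entry lazily when it is read after sitting untouched for too long. How SwarmCache actually schedules expiry is not stated here, so treat the lazy check, the injected clock and the name TimeoutCacheSketch as assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Illustrative only: entries expire once untouched for timeoutMillis.
// Expiry is checked lazily on get; a real cache might use a background timer instead.
class TimeoutCacheSketch<K, V> {
    private static final class Entry<T> {
        final T value;
        long lastTouched;

        Entry(T value, long lastTouched) {
            this.value = value;
            this.lastTouched = lastTouched;
        }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long timeoutMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TimeoutCacheSketch(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong()));
    }

    V get(K key) {
        Entry<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - entry.lastTouched >= timeoutMillis) {
            map.remove(key);     // untouched for too long: time it out
            return null;
        }
        entry.lastTouched = now; // a successful read counts as a touch
        return entry.value;
    }
}
```

In normal use the clock would be System::currentTimeMillis; passing it in as a parameter just makes the behaviour easy to exercise without sleeping.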

Hybrid
This combines the benefits of both the LRU and Automatic algorithms. Two levels of caching are provided. At the first level, an LRU cache ensures that a fixed number of the most recently used objects are available in the cache. At the second level, an Automatic cache holds cached objects that are available to be reclaimed by the garbage collector.
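The two-level arrangement might look roughly like the sketch below: a small strongly referenced LRU map in front of a map of soft references. The promotion policy on a second-level hit and all names (HybridCacheSketch, isStronglyHeld) are assumptions for illustration rather than SwarmCache's documented behaviour.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: level 1 is a bounded LRU map with strong references,
// level 2 holds soft references that the garbage collector may reclaim.
class HybridCacheSketch<K, V> {
    private final Map<K, V> lru;
    private final Map<K, SoftReference<V>> soft = new HashMap<>();

    HybridCacheSketch(final int lruSize) {
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > lruSize; // ejected entries survive only in level 2
            }
        };
    }

    void put(K key, V value) {
        lru.put(key, value);
        soft.put(key, new SoftReference<>(value));
    }

    V get(K key) {
        V value = lru.get(key);
        if (value != null) {
            return value;                 // level-1 hit
        }
        SoftReference<V> ref = soft.get(key);
        value = (ref == null) ? null : ref.get();
        if (value == null) {
            soft.remove(key);             // never cached, or reclaimed by the collector
            return null;
        }
        lru.put(key, value);              // level-2 hit: promote back into the LRU level
        return value;
    }

    // True if the key is currently protected from garbage collection by level 1.
    boolean isStronglyHeld(K key) {
        return lru.containsKey(key);
    }
}
```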

Which one should I use?
The Hybrid algorithm offers the best combination of guaranteed performance for frequently used objects and overall cache size. It is the recommended algorithm. For those developers who want simple, deterministic caching, the LRU algorithm may be preferable.

Configuration Details
The following are all the configuration options available via the CacheConfiguration class:

CacheType
This is the type of the underlying caches that will be used. The options are CacheConfiguration.TYPE_LRU, CacheConfiguration.TYPE_AUTO, CacheConfiguration.TYPE_TIMER and CacheConfiguration.TYPE_HYBRID, corresponding to the LRU, Automatic, Timeout and Hybrid algorithms described above. The default is LRU.

LRUCacheSize
This is the LRU cache size. It is measured in number of objects, not bytes. This value is ignored if CacheType is set to CacheConfiguration.TYPE_AUTO or CacheConfiguration.TYPE_TIMER. The default is 10000.

MulticastIP
This is the multicast IP address that will be used to communicate between cache managers. The default value is 231.12.21.132.

ChannelProperties
If you are familiar with JavaGroups, then this allows you to directly set the properties for the JavaGroups channel. Note that setting this property will cause the value of MulticastIP to be ignored. If you are interested, see the Javadocs for the default value.

Writing Cache-Aware Applications
Now that you have this fabulous caching engine built into your web application, how do you make the most of it? Here are some tips:

Think about what data comprises 90% of all requests.
In the vast majority of web applications, at least 90% of all requests that arrive at the persistence engine are for the same small set of data. So this is the data you want to ensure you have cached. Don't worry so much about that other 10%.

Do not cache compound objects!
Suppose the persistence example from the Tutorial had a method called getAllPeople that returned all people in the database. Should we cache the result of that query? It is often unwise to do so. The problem is that one of those Person objects could be updated and yet the cached result of the getAllPeople query would not be expired. Of course, you could add extra code to insert, update and delete to clear the compound object from the cache, but it will quickly get messy.
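The staleness problem can be reproduced with a few plain maps. In this sketch the "database" and both caches are in-memory stand-ins, and every name (CompoundStalenessSketch, personCache, allPeopleCache) is hypothetical; it only illustrates why the compound result goes stale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: shows a cached compound query result going stale.
class CompoundStalenessSketch {
    final Map<Long, String> db = new HashMap<>();          // stand-in database: id -> name
    final Map<Long, String> personCache = new HashMap<>(); // per-object cache
    List<String> allPeopleCache;                           // cached getAllPeople result

    String get(long id) {
        return personCache.computeIfAbsent(id, db::get);
    }

    List<String> getAllPeople() {
        if (allPeopleCache == null) {
            allPeopleCache = new ArrayList<>(db.values());
        }
        return allPeopleCache;
    }

    void update(long id, String name) {
        db.put(id, name);
        personCache.remove(id); // the single-object entry is expired correctly
        // Nothing here expires allPeopleCache, so it keeps serving the old name.
    }
}
```

Fixing this means every insert, update and delete must also remember to reset allPeopleCache, and the same again for every other compound query that might include the changed object.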

Do not be afraid of single selects.
Suppose again that we are running code similar to the persistence example from the Tutorial. Now that objects are being cached efficiently, it is to our advantage to call the get method often. Don't be afraid to replace database joins with multiple calls to get(long key). This will also result in a more object-oriented use of your data model. As an example, suppose you had a one-to-many relationship between the objects House and Person, modeling the fact that many people can live in one house. If you had the ID of a Person and wanted to select both that Person and the House in which they live, you might be tempted to use a join. Instead, perform a simple select for the Person with the get method, and then do the same for the House once you have its ID.
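The Person and House example might be sketched as follows. The read-through store below is a simplified in-memory stand-in for a cache-backed get(long key), and all class names, the loader lambdas and the loads counter are assumptions introduced for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Illustrative only: a read-through cache keyed by ID.
class CachedStoreSketch<T> {
    private final Map<Long, T> cache = new HashMap<>();
    private final LongFunction<T> loader; // stand-in for a single-row SELECT
    int loads = 0;                        // counts trips to the "database"

    CachedStoreSketch(LongFunction<T> loader) {
        this.loader = loader;
    }

    T get(long key) {
        T value = cache.get(key);
        if (value == null) {
            loads++;
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }
}

class PersonSketch {
    final String name;
    final long houseId;

    PersonSketch(String name, long houseId) {
        this.name = name;
        this.houseId = houseId;
    }
}

class HouseSketch {
    final String address;

    HouseSketch(String address) {
        this.address = address;
    }
}

class SingleSelectDemo {
    public static void main(String[] args) {
        CachedStoreSketch<PersonSketch> people =
            new CachedStoreSketch<>(id -> new PersonSketch("Person " + id, 7L));
        CachedStoreSketch<HouseSketch> houses =
            new CachedStoreSketch<>(id -> new HouseSketch("House " + id));

        // Two single selects instead of one join.
        PersonSketch p1 = people.get(1L);
        HouseSketch h1 = houses.get(p1.houseId);

        // A second person in the same house reuses the cached House.
        PersonSketch p2 = people.get(2L);
        HouseSketch h2 = houses.get(p2.houseId);

        System.out.println(h1 == h2);     // prints true
        System.out.println(houses.loads); // prints 1
    }
}
```

With a join, each Person lookup would fetch the House row again; with single selects, a House shared by several people is loaded once and then served from the cache.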

Have a means to clear all caches.
If you ever want to make direct database changes, it is often very handy to have a web-accessible admin screen that allows some or all of the caches to be manually cleared.
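A minimal sketch of the bookkeeping behind such an admin screen is a registry that maps a cache name to an action that clears it. The registry, its method names and the use of Runnable are all assumptions for illustration; wiring it to a servlet or to the actual cache objects is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a registry an admin screen could call to clear caches by name.
class CacheRegistrySketch {
    private final Map<String, Runnable> clearActions = new LinkedHashMap<>();

    // Register a named cache together with the action that empties it.
    void register(String name, Runnable clearAction) {
        clearActions.put(name, clearAction);
    }

    // Clears one cache; returns false if no cache is registered under that name.
    boolean clear(String name) {
        Runnable action = clearActions.get(name);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Clears every registered cache and returns how many were cleared.
    int clearAll() {
        clearActions.values().forEach(Runnable::run);
        return clearActions.size();
    }
}
```

For a cache backed by a java.util.Map, the clear action can simply be a method reference such as peopleMap::clear.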

Credits
SwarmCache was written by John Watkinson (john at autobotcity dot com).
Code and ideas were contributed by Jason Carreira, Rajeev Kaul and André Schild. SwarmCache uses JavaGroups and Apache Jakarta Commons. The project is hosted by SourceForge.