Friday 31 October 2014

Soft-Index File Store

Recently, Infinispan got a new local file-based cache store, called Soft-Index File Store. Why have we created yet another cache store, what problems does it solve, what are its limitations and how is it designed?

Single File Store is a well-performing cache store, but it keeps all keys in memory, which limits the number of keys you can store. File fragmentation can be an even bigger issue: if you store larger and larger values (that happens quite a lot, as users e.g. add stuff into their shopping carts), the old space is not reused and the entry is instead appended at the end of the file. The space (now empty) is reused only if you write an entry that fits there. Also, even if you remove all entries from the cache, the file won't shrink, nor will it be defragmented.

The LevelDB cache store uses Google's quite well-performing library written in native code. The major drawback is the native code: if LevelDB has a bug that ends in a segfault, the whole JVM crashes, bringing your application server down.

Our new Soft Index File Store is a pure Java implementation that tries to get around Single File Store's drawbacks by implementing a variant of a B+ tree that is cached in memory using Java's soft references - that's where the name Soft Index File Store comes from. This B+ tree (called the Index) is offloaded to the filesystem into a single file: in theory, this has fragmentation problems similar to those of Single File Store, but in practice it shouldn't cause such problems. The index file does not need to be persisted - it is purged and rebuilt when the cache store restarts; its only purpose is offloading.
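To illustrate the soft-reference idea, here is a minimal sketch, not the actual SIFS code. The class name SoftNodeCache, the loader function and the loads counter are all hypothetical; the loader stands in for reading a node back from the index file.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch: nodes are held through SoftReferences, so the GC may
// clear them under memory pressure; a cleared node is reloaded on demand.
class SoftNodeCache<N> {
    private final Map<Long, SoftReference<N>> cache = new HashMap<>();
    private final LongFunction<N> loader; // assumed: reads a node from the index file by offset
    int loads = 0;                        // how many times the loader had to be called

    SoftNodeCache(LongFunction<N> loader) {
        this.loader = loader;
    }

    synchronized N get(long offset) {
        SoftReference<N> ref = cache.get(offset);
        N node = ref == null ? null : ref.get();
        if (node == null) {
            // Never cached, or cleared by the GC: load it again and re-cache it.
            node = loader.apply(offset);
            loads++;
            cache.put(offset, new SoftReference<>(node));
        }
        return node;
    }
}
```

As long as a caller holds a strong reference to a node, repeated lookups return the same instance without touching the file; once nothing else references the node, the JVM is free to reclaim it when memory gets tight.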

The data that should be persisted are stored in a set of files that are written in an append-only way - that means that if you store them on a conventional magnetic disk, it does not have to seek when writing a burst of entries. When the usage of any of these files drops below 50% (the rest of its entries having been marked as removed or overwritten), the file starts being collected: live entries are moved into another file and in the end the old file is removed from disk.
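The compaction trigger can be sketched roughly as follows. This is only an illustration of the bookkeeping under assumed names (FileUsageTracker, appended, freed, compactionCandidates); the real store tracks usage in its own way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-file accounting of total and dead bytes, used to
// pick the files whose live data dropped below the compaction threshold.
class FileUsageTracker {
    // fileId -> { total bytes appended, bytes freed by removes/overwrites }
    private final Map<Integer, long[]> stats = new HashMap<>();
    private final double threshold; // e.g. 0.5 for the 50% mentioned above

    FileUsageTracker(double threshold) {
        this.threshold = threshold;
    }

    void appended(int fileId, long bytes) {
        stats.computeIfAbsent(fileId, id -> new long[2])[0] += bytes;
    }

    void freed(int fileId, long bytes) {
        stats.computeIfAbsent(fileId, id -> new long[2])[1] += bytes;
    }

    // Files whose share of live bytes is strictly below the threshold.
    List<Integer> compactionCandidates() {
        List<Integer> candidates = new ArrayList<>();
        for (Map.Entry<Integer, long[]> e : stats.entrySet()) {
            long total = e.getValue()[0];
            long free = e.getValue()[1];
            double usage = total == 0 ? 1.0 : (double) (total - free) / total;
            if (usage < threshold) {
                candidates.add(e.getKey());
            }
        }
        return candidates;
    }
}
```

A compactor thread would then copy the live entries of each candidate into the currently appended file and delete the old one.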

Most of the in-memory structures in Soft Index File Store are bounded, therefore you don't have to be afraid of OOMEs. You can also configure the limit for concurrently open files (so that you don't run out of file descriptors).

How to configure SIFS

The configuration is no different from that of a regular cache store:

Implementation details

The Index does not have to use a single file; in fact, it can be split into multiple segments. That's because the algorithm updating this B+ tree is designed as single writer - multiple readers, which could make the writer thread (called 'Index Updater') the bottleneck. Therefore, you can set how many segments the Index should be split into (according to the keys' hashCode()).
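Choosing a segment from the hash code could look like the sketch below. The class and method names are made up for illustration, and the exact mapping used by SIFS may differ; the point is only that the same key always lands in the same segment, so each segment can have its own single writer.

```java
// Hypothetical sketch: map a key to one of N index segments by its hashCode().
class IndexSegments {
    static int segmentFor(Object key, int segments) {
        // floorMod keeps the result in [0, segments) even for negative hash codes.
        return Math.floorMod(key.hashCode(), segments);
    }
}
```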

Each node in the Index stores the common 'prefix' of all keys (or rather of their serialized forms) used in the node in order to reduce the space required for the node. This comes with the assumption that the prefixes are often similar (e.g. when you use keys "user000001" and "user000002"). If you can change how the keys are serialized, you are encouraged to move the changing part of the key to the end of the serialized data.
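To see why the position of the changing part matters, here is a small sketch that computes the length of the shared prefix of some serialized keys. KeyPrefix and commonPrefixLength are illustrative names, not part of SIFS.

```java
import java.util.List;

// Hypothetical sketch: length of the byte prefix shared by all serialized keys.
class KeyPrefix {
    static int commonPrefixLength(List<byte[]> keys) {
        if (keys.isEmpty()) {
            return 0;
        }
        byte[] first = keys.get(0);
        int len = first.length;
        for (byte[] key : keys) {
            len = Math.min(len, key.length);
            for (int i = 0; i < len; i++) {
                if (key[i] != first[i]) {
                    len = i; // first mismatch bounds the shared prefix
                    break;
                }
            }
        }
        return len;
    }
}
```

For the UTF-8 bytes of "user000001" and "user000002" the shared prefix is 9 bytes long, so only the last byte of each key has to be stored separately. If the changing part came first (say "1user" and "2user"), the shared prefix would be empty and nothing would be saved.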

The data are written by a single thread as well, the 'Log Appender'. There's no reason to let threads that access the cache store compete over the file system - Log Appender queues the write requests, writes them into the file and wakes up the waiting thread. This costs 2 possibly unnecessary context switches, but in the original design we wanted to allow the write request to return only after the data have been fsynced. By batching the writes, Log Appender allows this as a configuration option - then you can be sure that the data are already on disk when the call returns.
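The single-writer pattern with batched fsync can be sketched as below. This is a simplified illustration, not the SIFS Log Appender: BatchingAppender, its syncWrites flag and the 20 ms poll interval are assumptions, and error handling is reduced to failing the current batch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: callers enqueue writes, one thread appends them and
// issues a single fsync per batch before waking the callers up.
class BatchingAppender implements AutoCloseable {
    private static final class Request {
        final byte[] data;
        final CompletableFuture<Long> done = new CompletableFuture<>();
        Request(byte[] data) { this.data = data; }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final FileChannel channel;
    private final boolean syncWrites;
    private final Thread writer;
    private volatile boolean running = true;

    BatchingAppender(Path file, boolean syncWrites) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        this.syncWrites = syncWrites;
        this.writer = new Thread(this::run, "log-appender-sketch");
        this.writer.start();
    }

    // Called from any thread; completes with the file offset of the entry.
    CompletableFuture<Long> append(byte[] data) {
        Request request = new Request(data);
        queue.add(request);
        return request.done;
    }

    private void run() {
        List<Request> batch = new ArrayList<>();
        try {
            long offset = channel.size(); // always append at the end of the file
            while (running || !queue.isEmpty()) {
                Request first = queue.poll(20, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch); // everything queued meanwhile joins this batch
                long[] offsets = new long[batch.size()];
                for (int i = 0; i < batch.size(); i++) {
                    ByteBuffer buf = ByteBuffer.wrap(batch.get(i).data);
                    offsets[i] = offset;
                    while (buf.hasRemaining()) offset += channel.write(buf, offset);
                }
                if (syncWrites) channel.force(false); // one fsync covers the whole batch
                for (int i = 0; i < batch.size(); i++) batch.get(i).done.complete(offsets[i]);
                batch.clear();
            }
        } catch (IOException | InterruptedException e) {
            for (Request r : batch) r.done.completeExceptionally(e);
        }
    }

    @Override
    public void close() throws IOException, InterruptedException {
        running = false;
        writer.join();
        channel.close();
    }
}
```

A caller that needs durability waits on the returned future; with syncWrites enabled, the future completes only after the batch containing its entry has been forced to disk.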

When an entry is modified, the Index needs to be updated. The request is sent to the Index Updater via a bounded queue and the newest entry location is kept in the Temporary Table until it is stored in the Index. The updated nodes are eventually offloaded onto disk in this way.
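One way to picture the interplay between the Temporary Table and the Index is the sketch below. It is an assumption-laden simplification: LocationLookup and its methods are invented names, a plain map stands in for the B+ tree, and locations are reduced to a single long.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: readers consult the temporary table first, so they see
// the newest location even before the index updater has caught up.
class LocationLookup {
    private final Map<String, Long> temporaryTable = new ConcurrentHashMap<>();
    private final Map<String, Long> index = new ConcurrentHashMap<>(); // stand-in for the B+ tree

    // Writer path: publish the newest location immediately.
    void recordWrite(String key, long location) {
        temporaryTable.put(key, location);
    }

    // Index updater path: store the location, then drop the temporary entry,
    // but only if no newer write has replaced it in the meantime.
    void applyToIndex(String key, long location) {
        index.put(key, location);
        temporaryTable.remove(key, location);
    }

    Long locate(String key) {
        Long pending = temporaryTable.get(key);
        return pending != null ? pending : index.get(key);
    }
}
```

The conditional remove is the important bit: if the key was written again while the older update was still queued, the newer location stays visible in the temporary table until its own update is applied.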

Known limitations

The size of a node in the Index is limited; by default it is 4096 bytes, though it can be configured. This size also limits the key length (or rather the length of the serialized form): you can't use keys longer than the node size minus 15 bytes. Moreover, the key length is stored as a 'short', limiting it to 32767 bytes. There is no way to use longer keys - SIFS throws an exception when the key is longer after serialization.
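The two limits combine as in this small worked example. KeyLimits, NODE_OVERHEAD and maxKeyLength are illustrative names; only the numbers (15 bytes, the 'short' length field, the 4096-byte default) come from the text above.

```java
// Hypothetical sketch: the longest serialized key allowed for a given node size.
class KeyLimits {
    static final int NODE_OVERHEAD = 15; // bytes of a node not available to the key

    static int maxKeyLength(int maxNodeSize) {
        // The key length is stored as a short, so it can never exceed 32767.
        return Math.min(maxNodeSize - NODE_OVERHEAD, Short.MAX_VALUE);
    }
}
```

With the default node size, maxKeyLength(4096) gives 4081 bytes; raising the node size helps only up to the 32767-byte cap.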

When entries are stored with expiration, SIFS does not discover that a file is full of expired entries, and the compaction of old data files may never start (the method AdvancedStore.purgeExpired() is not implemented). This can lead to excessive file-system space usage.

Future work

What we need to do now is to benchmark SIFS in many configurations and set the optimal values as defaults. However, we run mostly synthetic benchmarks - and that's where you can help. Play with Soft Index File Store a bit and tell us what configuration works best for you!

For storing large keys, building the B+ tree of hashCodes could perform better than storing the whole keys, though it would need additional handling for collisions. Please tell us what keys you use!

Currently, each index update needs to be eventually stored, and that means one or more writes into the file system even when this is not necessary. In the future, we might try to use phantom references instead of soft references to write the Index only when it needs to release some memory. However, this requires a lot of further work, so test SIFS today and let us know how you like it!

Tuesday 28 October 2014

Infinispan HotRod .NET Client 7.0.0.CR2

Dear community,

Infinispan HotRod .NET Client 7.0.0.CR2 is now available.

This is mostly a bug-fix release. For the complete list of changes please consult the release notes (which also include the changes from the corresponding version of the C++ Client).
 
Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Thanks to everyone involved for the changes and bug reports contributed!

Monday 27 October 2014

Infinispan HotRod C++ Client 7.0.0.CR2

Dear community,

Infinispan HotRod C++ Client 7.0.0.CR2 is now available.

This is mostly a bug-fix release. I would like to bring to your attention the following changes:
For the complete list of changes please consult the release notes.
Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Thanks to everyone involved for the changes and bug reports contributed!

Friday 24 October 2014

Cross-Site Replication: state transfer is here!

Hello community.

Since the initial release of Cross-Site Replication, state transfer between sites has been badly needed. When a new site was brought online, there was no way to synchronize the data between the sites. Finally, these days are over and it is possible to synchronize geographically replicated sites. How to use it is described in Infinispan's Manual.

For the curious, the solution is described here.

Any questions can be asked in the forum, on the mailing list or directly with us on IRC. If you find a bug, please report it here.

Happy coding, fellows.

Infinispan Team.

Monday 13 October 2014

Infinispan at JavaOne!

I've made it to JavaOne for the first time - an impressive event not only for its scale but also for the variety and quality of its technical talks. Infinispan was just one of the six in-memory grid providers present at the conference, which shows the increased demand for this technology. It was also a great opportunity for me to meet our community and show our project to the world. Speaking of which, here is a crash course on Infinispan recorded at JavaOne:


Tuesday 7 October 2014

Infinispan 7.0.0.CR1 is out!

Dear Community,

We are gearing up towards a great Infinispan 7.0.0, and we are happy to announce our first candidate release!

Notable features and improvements in this release:

  • Cross-site state transfer now handles failures (ISPN-4025)
  • Easier management of Protobuf schemas (ISPN-4357)
  • New uberjars-based distribution (ISPN-4728)
  • The HotRod protocol and Java client now have a size() operation (ISPN-4736)
  • Cluster listeners' filters and converters can now see the old value and metadata (ISPN-4753)
  • A new and promising file store implementation that addresses the scalability issues of our single-file store (ISPN-3921, thanks Radim!)

For a complete list of features and bug fixes included in this release please refer to the release notes. Visit our downloads section to find the latest release.

If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,
The Infinispan team