Thursday 18 December 2014

Infinispan 7.0.2.Final is a certified JSR-107 1.0 implementation

The infinispan-jcache module in Infinispan 7.0.2.Final has been certified to be compatible JSR-107 1.0 specification implementation. To get started with Infinispan's JSR-107 implementation, check the Infinispan documentation section on the topic, and remember that Infinispan also implements the JSR-107 annotations for CDI injection of cached values.

Thanks to everyone who has contributed to the module, and to Greg Luck and Brian Oliver for their help in completing the certification.

Cheers,
Galder

Monday 15 December 2014

Hot Rod Remote Events #4: Clustering and Failover

This blog post is the last in a series that looks at the forthcoming Hot Rod Remote Events functionality included in Infinispan 7.0. First article focused on how to get started receiving remote events from Hot Rod servers. The second article looked at how Hot Rod remote events can be filtered, and the third one showed how to customize contents of events.

In this last article, we'll be focusing on how remote events are fired in a clustered environment and how failover situations are dealt with.

The most important thing to know about remote events in a clustered environment is that when a client adds a remote listener, this is installed in a single node in the cluster and that this node is in charge of sending events back to the client for all affected operations happening cluster wide.

As a result of this, when filtering or event customization is applied, the org.infinispan.notifications.cachelistener.filter.CacheEventFilter and/or org.infinispan.notifications.cachelistener.filter.CacheEventConverter instances must be somehow marshallable. This is necessary because when the client listener is installed in a cluster, the filter and/or converter instances are sent to other nodes in the cluster so that filtering and conversion can happen right where the event originates, hence improving efficiency. These classes can be made marshallable by making them extend Serializable, or providing and registering a custom Externalizer for them.

Under normal circumstances, the code and examples showed in previous blog posts work the same way in clustered environment. However, in a clustered environment, a decision needs to be made with regards to how to deal with situations where nodes go down: If a node goes down that does not have the client listener installed, nothing happens. However, when the node containing the client listener goes down, the Hot Rod client implementation transparently fails over the client listener registration to a different node. As a result of this failover, there could be a gap in the event consumption. This gap is solved using one of these solutions:

State Delivery


The @ClientListener annotation has an optional parameter called includeCurrentState. When this is enabled and the client listener is registered, before receiving any events for on-going operations, the server sends ClientCacheEntryCreatedEvent event instances (for methods annotated with @ClientCacheEntryCreated) for all existing cache entries to the client. This offers the client an opportunity to construct some state or computation based on the contents of the clustered cache. When the Hot Rod client transparently fails over registered listeners, it re-registers them in a different node and if includeCurrentState is enabled, clients can recompute their state or computation to reinstate it to what it was before the failover. The downside of includeCurrentState is that it's performance is heavily dependant on the cache size, and hence it's disabled by default.

@ClientCacheFailover


Alternatively, instead of relying on receiving state, users can define a method with @ClientCacheFailover annotation that receives ClientCacheFailoverEvent as parameter inside the client listener implementation:


This method would be called back whenever the node that had this client listener has gone down. This can be handy for situations when the end users just wants to clear up some local state as a result of the failover, e.g. clear a near or L1 cache. When events are received again, the near or L1 cache could be repopulated again.

This callback method of dealing with client listener failover offers a simple, efficient solution to dealing with cluster topology changes affecting client listeners. Depending on the remote event use case, this method might be better suited that state delivery.

Final Words


This post marks the end of the remote event series. In future Infinispan versions, we'll continue improving the technology adding some extra features, and more importantly, we'll start building higher level abstractions on top of remote events, such as Hot Rod client Near Caches.

Cheers,
Galder

Friday 5 December 2014

Infinispan 7.1 Codename: And the winner is...

Dear Infinispan community,

the scrutiny is complete and YOU have decided the name of the next Infinispan release, which is going to be:

Hoptimus Prime

The following is a breakdown of the votes:

Hoptimus Prime30%
Insanely Bad Elf28%
Aventinus23%
Amber Shock14%
Drake's Hopocalypse5%

Since Hoptimus Prime has been retired we will have to celebrate by drinking one of the other wonderful runners-up. 

Remember that, aside from choosing codenames, you can also contribute to Infinispan in many ways: code, documentation, bug reports, helping on the forum, discussing ideas on the mailing list or just chat with us on #infinispan on Freenode's IRC servers.

Friday 28 November 2014

Infinispan 7.1 Codename

It's that time of the development cycle when we get to decide the most important feature that will be part of our next release: the codename.

As usual, it must be chosen among a selection of fermented barley, hops, yeast and water aka beer.

A bit of history for reference:

4.0 Starobrno
4.1 Radegast
4.2 Ursus
5.0 Pagoa
5.1 Brahma
5.2 Delirium
5.3 Tactical Nuclear Penguin
6.0 Infinium
7.0 Guinness

You have until Friday, 5th December 2014 at 12:00 GMT to cast your vote
for your favourite, either in the form below or by going to the survey.

Thursday 27 November 2014

Infinispan 7.1.0 Alpha1 is out!

Dear Infinispan community,

We're proud to announce the first Alpha release of Infinispan 7.1.0.

This release adds the foundation for future features, optimization and many fixes.

Feel free to join us and shape the future releases on our forums, our mailing lists or our #infinispan IRC channel.

For a complete list of features and bug fixes included in this release please refer to the release notes. Visit our downloads section to find the latest release.

Thanks to everyone for their involvement and contribution!

Wednesday 19 November 2014

Infinispan 7.0.2.Final released!

Dear community,

Infinispan 7.0.2.Final is now available!

This release removes duplication from the service lookup metadata. Please consult the release notes for details.

Thanks to everyone involved in this release! 

Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Monday 17 November 2014

Infinispan 7.0.1.Final released!

Dear community,

Infinispan 7.0.1.Final is now available!

This is a bug-fix release and contains query performance improvements. For the complete list of changes please consult the release notes.

Thanks to everyone involved in this release! 

Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Tuesday 4 November 2014

Why doesn't Map.size return the size of the entire cluster?

Many people may have been surprised the first time they used Map.size method on a distributed Infinispan cluster.  As was later deduced only the local node size is returned.

Infinispan had taken this approach to limit the chance that instead of getting the full cluster size you would receive an OutOfMemoryError.  This seems fair to return the local answer only but you secretly always wanted the entire cluster size.

For the Infinispan 7.0.0.Final release forget what you know when using the Map interface with Infinispan.

Enter Distributed Entry Iterator

We already announced this feature a while back at http://blog.infinispan.org/2014/05/iterate-all-entries-in-cache.html.  You can check it out for more details but it is essentially a memory efficient way of retrieving all the entries in the cache by iterating over them.

This has opened the way to implementing the various bulk methods on the Map interface that we could never do efficiently in the past (ie. Map.size, Map.keySet, Map.entrySet & Map.values).

Map size

Okay I admit, size could have been done more efficiently before, but the answer would have contained a very high margin of error.  Now size will give you a size value with consistency semantics just slightly less than ConcurrentHashMap does, but for the entire cluster.  Warning should be given that size may be slower for larger clusters or ones with a lot of data in a stored cache loader.

The size method behavior can be controlled by using a supplied Flag such as SKIP_CACHE_LOAD to not count any configured cache loaders or CACHE_MODE_LOCAL if you want the local count only.  These flags are not exclusive and can be both passed if desired as well.

Map Collection Views

In the past the Map.values, Map.keySet & Map.entrySet methods were only ever in memory copies of the local data at the time they were invoked, similar to Map.size.

Now these collection views will be cluster wide and an additionally will show updated contents when the cache is updated and your writes to the collection will be reflected in the Cache itself!  The only operations you can't do on these collections are adding values to either the keySet or values collections as they aren't key/value pairs.

If your cache was configured with a Flag such as SKIP_CACHE_LOAD or CACHE_MODE_LOCAL it will also be reflected in the collection view for both reads and writes.

Some caution is advised when using toArray, retainAll, or size methods as they will require full iteration to complete.

KeySet Optimization

The key set collection also has an optimization so it will never pull down the values so it has a lower network and serialization/deserialization overhead (unlike entrySet and values).

Transactionality

All of the aforementioned methods still support transactions in a way that you would expect.  There is one guarantee we don't provide and that is when using REPEATABLE_READ isolation.  We will not store entries read from an iterator in the transactional context as this could very easily run your local node out of memory with a large enough data set.

For reference methods that use an iterator internally are toArray, retainAll, isEmpty & size on the various collections as well as contains and containsAll on the values collection.

Other API Changes

These changes have also loosened some restrictions on other methods as well.

Map.isEmpty

This method before was only used locally as it used to calculate the size to determine if it was empty.  This method will now use the entry iterator and returns as soon as it finds that even a single value exists.  This is an important change as the old implementation would have to query any configured Cache Loader's complete size before returning.

Map.containsValue

We never supported this method before (not even locally).  This method will now use the iterator though and if it finds the value at any point point will return immediately so it doesn't have to iterate over the entire contents unless they don't exist.  However if you really want to do this operation often you should really use Indexing to make this faster.

Code Examples

I could put an example here, but I think some could take it as insult.  You have already seen 100's of examples as to how to use the Map interface and now in Infinispan you can use those in the exact same way and they will work just how you would expect them to.

Infinispan 7.0.0.Final is out!!

Hi all,

We are really proud to announce the release of Infinispan 7.0.0.Final!!

This is the culmination of several months of development which has focused on on Security, Cluster Partition handling, JSR-107 JCache 1.0.0 support, Clustered Listeners, Remote Events, Query improvements and brand new XML configuration.

To mark the occasion, the team has prepared a thorough release notes page highlighting all the major features and enhancements implemented in Infinispan 7.0 series.

The Infinispan team would like to recognise all the community members that have contributed to this release, in no particular order:

  • Radim Vansa for his Soft-Index File Store and many more enhancements and fixes
  • Takayoshi Kimura for fixes such as ISPN-3752ISPN-4476 and ISPN-4477
  • Jiri Holusa for his tremendous work to improve our test coverage work and fixing issues like ISPN-3442 
  • Karl von Randow for his documentation fixes, init.d fixes in ISPN-4141 and enhancements to putForExternalRead method as part of ISPN-3792
  • Jakub Markos for his work to optimise the Infinispan Server testsuite in ISPN-4317 and many fixes and test suite enhancements
  • Michal Linhard fox his ISPN-3750 fix
  • Vitalii Chepeliuk for his work on extending test coverage and fixes such as ISPN-3880
  • Wolf Dieter-Fink for fixes such as ISPN-3916 and ISPN-3912
  • Vojtech Juranek for his continued work to improve Infinispan with fixes such as ISPN-4072 and his work to increase the test coverage
  • Martin Gencur for the many issues he fixed including ISPN-3771ISPN-4499 and others...
  • Norman Maurer for porting Infinispan Servers to use Netty 4
  • Alan Field for fixes such as ISPN-4645 and ISPN-4376
  • Tomáš Sýkora for fixes such as ISPN-3136 and ISPN-4076, and improved test coverage
  • Paul Ferraro for many fixes including fixes such as ISPN-4375 and ISPN-4374
  • Nicolas Filotto for his ISPN-3689 fix
  • Rajesh Jangam for his ISPN-3877 and ISPN-3894 fixes
  • Brett Meyer for his amazing work to get Infinispan working in OSGI environments as part of ISPN-800 and many related fixes
  • Radoslav Husar for his several fixes
  • Sebastian Łaskawiec for his work to improve our CDI integration and moving to Jackson for JSON
  • Karsten Blees for his LIRS eviction fixes
  • Niels Bertramn for his ISPN-4679 fix
  • Duncan Doyle for his work on ISPN-4637
  • Emmanuel Bernard for his documentation improvements
  • Gabriel Francisco for his work to revamp the Mongo DB cache store
  • Bilgin Ibryam for his OSGI fixes
  • Erik Salter for his work on orphaned transactions and fixes such as ISPN-4872
Thanks to all contributors for your amazing work and effort! We hope you carry on contributing in future releases.

Finally, during the Infinispan 7.0 series, Gustavo Fernandes has joined the team making outstanding contributions in our Query project, and Tristan Tarrant has joined the team full time taking on Infinispan's Security layer. Thanks to both!!

Cheers,
Galder



Friday 31 October 2014

Soft-Index File Store

Recently, Infinispan got a new local file-based cache store, called Soft-Index File Store. Why have we created just another cache store, what problems is it solving, what are its limitations and how is it designed?

Single File Store is a well performing cache store, but it stores all keys in-memory; that limits the number of keys you can store. File fragmentation could be even more of an issue: if you store larger and larger values (that happens quite a lot, as users e.g. add stuff into their shopping carts), the space is not reused and instead the entry is appended at the end of the file. The space (now empty) is reused only if you write entry that can fit there. Also, even if you remove all entries from the cache, the file won't shrink, and neither won't be de-fragmented.

LevelDB uses quite well performing Google's library written in native code. The major drawback is the native code - if LevelDB has a bug that ends in segfault, whole JVM crashes, bringing you application server down.

Our new Soft Index File Store is pure Java implementation that tries to get around Single File Store's drawbacks by implementing a variant of B+ tree that is cached in-memory using Java's soft references - here's where the name Soft Index File Store comes from. This B+ tree (called Index) is offloaded on filesystem to single file: in fact, this has theoretically similar problems with fragmentation as Single File Store - but in practice it shouldn't cause such problems. This index file does not need to be persisted - it is purged and rebuilt when the cache store restarts, its purpose is only offloading.

The data that should be persisted are stored in a set of files that are written in append-only way - that means that if you store this on conventional magnetic disk, it does not have to seek when writing a burst of entries. It is not stored in a single file but in a set of files. When any of these files drops below 50% of usage (the entries are marked as removed or overwritten), the file starts being collected, moving live entries into another file and in the end removing the old file from disk.

Most of the in-memory structures in Soft Index File Store are bounded, therefore you don't have to be afraid of OOMEs. You can also configure the limits for concurrently open files as well (so that you don't run out of file descriptors).

How to configure SIFS

The configuration is no different from regular cache store:

Implementation details

The Index does not use single file, in fact it can be split into multiple segments. That's because the algorithm updating this B+ tree is designed as single writer - multiple readers, but that could make the writer thread (called 'Index Updater') the bottleneck. Therefore, you can set how many segments should the Index be split into (according to keys' hashCode()).

Each node in the Index stores 'prefix' of all keys (or rather the serialized forms) used in the node in order to reduce the space required for the node. This comes with the assumption that the prefixes are often similar (e.g. when you use key "user000001" and "user000002"). If you can change how the keys are serialized, it is encouraged to move the changing part of the key to the end of the serialized data.

The data are written by single thread as well, the 'Log Appender'. There's no reason to let threads that access the cache store compete over file-system - Log Appender queues the write results, writes them into the file and wakes up the waiting thread. There are 2 possibly unnecessary context-switches, but in the original design we wanted to allow the write request to return only after the data have been fsynced. By batching the writes, Log Appender allows this as a configuration option - then you can be sure that the data are already on disk when the call returns.

When the entry is modified, the Index needs to be updated. The request is sent to Index Updater via bounded queue and the newest entry location is stored in Temporary Table until this is stored in the Index. The updated nodes are eventually offloaded onto disk in this way.

Known limitations

Size of a node in the Index is limited, by default it is 4096 bytes, though it can be configured. This size also limits the key length (or rather the length of the serialized form): you can't use keys longer than size of the node - 15 bytes. Moreover, the key length is stored as 'short', limiting it to 32767 bytes. There's no way how you can use longer keys - SIFS throws an exception when the key is longer after serialization.

When entries are stored with expiration, SIFS does not discover the a file is full of expired entries and the compaction of old data files may not be started ever (method AdvancedStore.purgeExpired() is not implemented). This can lead to excessive file-system space usage.

Future work

What we need to do know is to benchmark SIFS in many configurations and set the optimal values as defaults. However, we run mostly synthetic benchmarks - and that's where you can help. Let's play with Soft Index File Store a bit and tell us what configuration works best for you!

For storing large keys, building the B+ tree of hashCodes could perform better that storing the whole keys, though it would need additional handling for collisions. Tell us what keys do you use, please!

Currently, each index update needs to be eventually stored, and that means one or more writes into the file-system even when this is not necessary. In the future, we might try to use phantom references instead of soft references to write the Index only when it needs to release some memory. However, this requires a lot of further work, so test SIFS today and let us now how do you like it!

Tuesday 28 October 2014

Infinispan HotRod .NET Client 7.0.0.CR2

Dear community,

Infinispan HotRod .NET Client 7.0.0.CR2 is now available.

This is mostly a bug-fix release.For the complete list of changes please consult the release notes (includes also the changes from the corresponding version of the C++ Client).
 
Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Thanks to everyone involved for the changes and bug reports contributed!

Monday 27 October 2014

Infinispan HotRod C++ Client 7.0.0.CR2

Dear community,

Infinispan HotRod C++ Client 7.0.0.CR2 is now available.

This is mostly a bug-fix release. I would like bring to your attention the following changes:
For the complete list of changes please consult the release notes.
Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Thanks to everyone involved for the changes and bug reports contributed!

Friday 24 October 2014

Cross-Site Replication: state transfer is here!

Hello community.

Since the initial release of Cross-Site Replication, the state transfer between sites was really needed. When a new site is brought online, there was not way to synchronize the data between them. Finally, these days are over and it is possible synchronize geographically replicated sites. How to use is described in Infinispan's Manual.

For the curious, the solution is described here.

Any question can be asked in the forum, mailing list or directly with us in the IRC. If you found a bug please report it in here.

Happy coding, fellows.

Infinispan Team.

Monday 13 October 2014

Infinispan at JavaOne!

I've made for the first time to JavaOne - an impressive event not only through it's scale but also the variety and quality of technical talks. Infinispan was just one of the six in-memory grid providers presents at the conference, which shows the increased demand for this technology. It was also a great opportunity for me to meet our community and  show
our project to the world. Speaking of which, a crash course into Infinispan recorded from JavaOne:


Tuesday 7 October 2014

Infinispan 7.0.0.CR1 is out!

Dear Community,

We are gearing up towards a great Infinispan 7.0.0, and we are happy to announce our first candidate release!

Notable features and improvements in this release:

  • Cross-site state transfer now handles failures  (ISPN-4025)
  • Easier management of Protobuf schemas (ISPN-4357)
  • New uberjars-based distribution (ISPN-4728)
  • The HotRod protocol and Java client now have a size() operation (ISPN-4736)
  • Cluster listeners' filters and converters can now see the old value and metadata (ISPN-4753)
  • A new and promising file store implementation that addresses the scalability issues of our single-file store (ISPN-3921, thanks Radim!)

For a complete list of features and bug fixes included in this release please refer to the release notes.  Visit our downloads section to find the latest release.

If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,
The Infinispan team

Thursday 25 September 2014

Cache and Cache Manger events in CDI

A long time ago, in a coffee bar far, far away, Infinispan met CDI. The two had the most amazing espressos, but they noticed that service was not as efficient as they wished. To help them out, the CDI support has been extended to include CDI Events.

Coffee Events


In case you haven't heard about CDI events, here is a really quick example:

When Waiter receives an order - he fires a CDI event. On the other hand Barista acts as a listener for ordered coffees (@CoffeeOrdered and @Observes). As a result Barista and Waiter are loosely coupled and moreover they don't know anything about each other.

Cache based Coffee Events


Now let's complicate this situation a little bit... Let's assume that when Waiter is passing an order to Barista, he might be actually busy processing another order. So let's introduce a little buffer between them - Waiter puts an order into the Cache and later on - Barista takes it and prepares our delicious coffee...


Beyond good espressos


As you can see - introducing CDI improved the service a lot. Now Waiter does not hurry Barista with the orders. This is why they serve the best espresso in the world there...

They have also a lot more time to think about other improvements (and to be honest... I think they will introduce CacheEntryModifiedEvent, CacheEntryRemovedEvent and CacheStartedEvent really shortly)... Or perhaps they'll find some other ideas in Infinispan's manual?

Wednesday 17 September 2014

Hot Rod Remote Events #3: Customizing events

This blog post is the third in a series that looks at the forthcoming Hot Rod Remote Events functionality included in Infinispan 7.0. In the first article we looked at how to get started receiving remote events from Hot Rod servers. In the second article, we saw how Hot Rod remote events can be filtered providing key/value filter factories that can create instances that filter which events are sent to clients, and how these filters can act on client provided information.

This time we are going to focus on how to customize events sent to clients. Events generated by default contain just enough information to make the event relevant but avoid cramming too much information in order to reduce the cost of sending them. Normally, this information consists of key and type of event.

Optionally, the information shipped in these events can be customized in order to contain more information, such as values, or to contain even less information. This customization is done with org.infinispan.notifications.cachelistener.filter.CacheEventConverter instances which are created by implementing a org.infinispan.notifications.cachelistener.filter.CacheEventConverterFactory class. Each factory must have a name associated to it via the org.infinispan.filter.NamedFactory annotation.

When a listener is added, we can provide the name of a converter factory to use with this listener, and when the listener is added, the server will look up the factory and invoke getConverter method to get a org.infinispan.notifications.cachelistener.filter.CacheEventConverter class instance to customize events server side.

Here's a sample implementation which will send custom events containing value information back to clients for a cache of Integers and Strings:

In the example above, the converter generates a new custom event which includes the value as well as the key in the event. This will result in bigger event payloads compared with default events, but if combined with filtering, it can reduce its network bandwidth cost.

In another converter implementation, the user could decide to send back an event that contains no key or event type information. This would result in extremely lightweight events at the expense of richness of information provided by the event itself.

Plugging the server with this converter requires deploying this converter factory (and associated converter class) within a jar file including a service definition inside the META-INF/services/org.infinispan.notifications.cachelistener.filter.CacheEventConverterFactory file:

With the server plugged with the converter, the next step is adding a remote client listener that will use this converter. How to implement a listener for custom events is slightly different to the listeners we've seen in the last couple of blog posts because we know have to deal with customised events as opposed to the default ones. To do so, the same annotations are used as previous blog posts, but the callbacks receive instances of org.infinispan.client.hotrod.event.ClientCacheEntryCustomEvent<T>, where T is the type of custom event we are sending from the server:

Now it's time to write a simple main java class which adds the remote event listener and executes some operations against the remote cache:

Once executed, we should see a console console output similar to this:

Similar to events, converters can also act on client provided information, enabling converter instances to customize events depending on the information given when the listener was added. The API provides an extra parameter to pass in converter parameters when the listener is added. Given the similarities with filtering, this part is not covered by this blog post.

A final note on the marshalling aspects of this example. In order to facilitate both server and client writing against type safe APIs, both the client and server need to be aware of custom event type and be able to marshall it. Client side, this is done by an optional marshaller configurable via the RemoteCacheManager. Server side, this is done by a marshaller recently added to the Hot Rod server configuration.

In the next blog post in the Hot Rod remote events series, we will look at how to receive remote events in a clustered environment, how to deal with failover situations...etc.

Cheers,
Galder

Tuesday 16 September 2014

Infinispan 7.0.0.Beta2 is out!


Dear Infinispan Community,

We are happy to announce the second Beta release of Infinispan 7.0.0!

This release brings many improvements and fixes:


  • Many fixes and performance optimizations for non-indexed queries (ISPN-4670, ISPN-4700)
  • Significant improvements to the reliability of indexed (i.e. Lucene based) queries:
    • InfinispanIndexManager reworked to handle locking on topology changes (ISPN-4599)
    • MassIndexer 20x performance improvement (ISPN-4644)
    • Some race conditions fixed in the Lucene Directory (ISPN-2981)
    • Fixed serialization of indexing messages under high load (ISPN-4573)
    • Resolved a race condition in (indexed) Cache initializations (ISPN-4719)
    • Improved classloading when run in containers (ISPN-4226, ISPN-4667)
    • Fixed JBoss modules to use Externalizers when run in containers (ISPN-4685)
    • (and many more minor improvements)
  • ISPN-4574 - Partition handling improvements for replicated caches and distributed caches with numOwners > cluster size / 2
  • ISPN-4646 - Eviction performance improvements, thanks to Karsten Blees


For a complete list of features and bug fixes included in this release please refer to the release notes.  Visit our downloads section to find the latest release.

If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,

The Infinispan team.

Monday 25 August 2014

Partitioned clusters tell no lies!

The problem

You are happily running a 10-node cluster. You want failover and speed and are using distributed mode with 2 copies of data for each key (numOwners=2). But disaster strikes: a switch in your network crashes and 5 of your nodes can't reach the other 5 anymore ! Now there are two independent clusters, each containing 5 nodes, which we are smartly going to name P1 and P2. Both P1 and P2 continue to serve user requests (puts and gets) as usual.

This cluster split in two or more parts is called partitioning or split brain. And it's bad for business, as in really bad ! Bob and Alice share a bank account stored in the cache. Bob updates his account on P1, then Alice reads it from P2: she sees a stale value of Bob's account (or even no value for Bob's account, depending on how the split looks like). This is a consistency issue, as there's an inconsistent view of the data between the two partitions.

Our solution

In Infinispan 7.0.0.Beta1 we added support for reacting to split brains: if nodes leave, Infinispan acknowledges that data might have been lost and denies user access to such data. We won't deny access to all the data, but just the data that might have been affected by the partitioning. Or, more formally: Infinispan sacrifices data availability in order to offer consistency (PC in Brewer's CAP theorem). For now partition handling is disabled by default, however we do intend to make it the default in an upcoming release: running with partition handling off is like running with scissors: do it at your own risk and only if you (don't) know what you're doing.

How we do it

A partition is assumed to happen when numOwners or more nodes disappear at the same time. When this happen two (or more) partitions form which are not aware of each other. Each such partition does not start a rebalance, but enters in degraded mode:
  • request (read and writes) for entries that have all the copies on nodes within this partition are honored
  • requests for entries that are partially or totally owned by nodes that disappeared are rejected through an AvailabilityException
To exemplify, consider the initial cluster C0={A,B,C,D}, A,B,C,D - nodes, configured in distributed mode with numOwners=2. Further on, the cluster contains k1, k2 and k3 keys such that owners(k1) = {A,B}, owners(k2) = {B,C} and owners(k3) = {C,D}. Then a partition happens C1={A,B} and C2={C,D}, the degraded mode exhibits the following behavior:
  • on C1, k1 is available for read/write, k2 (partially owned) and k3 (not owned) are not available and accessing them results in an AvailabilityException
  • on C2, k1 and k2  are not available for read/write, k3 is available
A relevant aspect of the partition handling process is the fact that when a split brain happens, the resulting partitions rely on the original consistent hash function (the one that existed before the split brain) in order to calculate key ownership. So it doesn't matter if k1, k2 or k3 already exists in the cluster or not, as the availability is strictly determined by the consistent hash and not by the key existence.
If at a further point in time the initial partition C0 forms again as a result of the network healing and C1 and C2 partitions being merged back together, then C0 exists the degraded mode becoming fully available again.

Configuration for partition handling functionality

In order to enable partition handling within the XML configuration:


The same can be achieved programmatically:



The actual implementation is work in progress and Beta2 will contain further improvements which we will publish here!

Cheers,
Mircea Markus

Wednesday 20 August 2014

Hot Rod Remote Events #2: Filtering events

This blog post is the second in a series that looks at the forthcoming Hot Rod Remote Events functionality included in Infinispan 7.0. In the first blog post we looked at how to get started receiving remote events from Hot Rod servers. This time we are going to focus on how to filter events directly in the server.

Sending events to remote clients has a cost which increases as the number of clients. The more clients register remote listeners, the more events the server has to send. This cost also goes up as the number of modifications are executed against the cache. The more cache modifications, the more events that need to be sent.

A way to reduce this cost is by filtering the events to send server-side. If at the server level custom code decides that clients are not interested in a particular event, the event does not even need to leave the server, improving the overall performance of the system.

Remote event filters are created by implementing a org.infinispan.notifications.cachelistener.filter.CacheEventFilterFactory class. Each factory must have a name associated to it via the org.infinispan.notifications.cachelistener.filter.NamedFactory annotation.

When a listener is added, we can provide the name of a key value filter factory to use with this listener, and when the listener is added, the server will look up the factory and invoke getFilter method to get a org.infinispan.notifications.cachelistener.filter.CacheEventFilterFactory class instance to filter events server side.

Filtering can be done based on key or value information, and even based on cached entry metadata. Here's a sample implementation which will filter key "2" out of the events sent to clients:

Plugging the server with this key value filter requires deploying this filter factory (and associated filter class) within a jar file including a service definition inside the META-INF/services/org.infinispan.notifications.cachelistener.filter.CacheEventFilterFactory file:

With the server plugged with the filter, the next step is adding a remote client listener that will use this filter. For this example, we'll extend the EventLogListener implementation provided in the first blog post in the series and we override the @ClientListener annotation to indicate the filter factory to use with this listener:

Next, we add the listener via the RemoteCache API and we execute some operations against the remote cache:



If we checkout the system output we'll see that the client receives events for all keys except those that have been filtered:

Finally, with Hot Rod remote events we have tried to provide additional flexibility at the client side, which is why when adding client listeners, users can provide parameters to the filter factory so that filter instances with different behaviours can be generated out of a single filter factory based on client side information. To show this in action, we are going to enhance the filter factory above so that instead of filtering on a statically given key, it can filter dynamically based on the key provided when adding the listener. Here's the revised version:

Finally, here's how we can now filter by "3" instead of "2":

And the output:


To summarise, we've seen how Hot Rod remote events can be filtered providing key/value filter factories that can create instances that filter which events are sent to clients, and how these filters can act on client provided information.

In the next blog post, we'll look at how to customize remote events in order to reduce the amount of information sent to the clients, or on the contrary, provide even more information to our clients.

Cheers,
Galder

Tuesday 12 August 2014

Hot Rod Remote Events #1: Getting started

Shortly after the first Hot Rod server implementation was released in 2010, ISPN-374 was created requesting cache events to be forwarded back to connected clients. Even though embedded caches have had access to these events since Infinispan's first release, propagating them to remote clients has taken a while, due to the increased complexity involved.

For Infinispan 7.0, we've finally addressed this. This is the first post in a series that looks at Hot Rod Remote Events and the different functionality we've implemented for this release. On this first post, we show you how to get started with Hot Rod Remote Events with the most basic of examples:

Start by downloading the Server distribution for the latest 7.0 (or higher) release from Infinispan's download page. The server contains the Hot Rod server with which the client will communicate. Once downloaded, start it up running the following from the root of the server:

./bin/standalone.sh

Next up, we need to write a little application that interacts with the Hot Rod server. If you're using Maven, create an application with this dependency, changing version to 7.0.0.Beta1 or higher:

If not using Maven, adjust according to your chosen build tool or download the all distribution with all Infinispan jars.

With the application dependencies in place, we need to start to write the client application. We'll start with a simple remote event listener that simply logs all events received:
Now it's time to write a simple main java class which adds the remote event listener and executes some operations against the remote cache:

Once executed, we should see a console console output similar to this:

As you can see from the output, by default events come with the key and the internal data version associated with the current value. The actual value is not shipped back to the client for performance reasons. Clearly, receiving remote events has a cost, and as the cache size increases and more operations are executed, more events will be generated. To avoid inundating Hot Rod clients, remote events can either be filtered server side, or the event contents can be customised. In the next blog post in this series, we will see this functionality in action.

Cheers,
Galder

Monday 11 August 2014

Infinispan 7.0.0.Beta1 is out!

Dear Community,

It is our pleasure to announce the first Beta release of Infinispan 7.0.0!

The release highlights are:

* Initial support for Cluster partitions (aka Split Brain) ISPN-263
* Added JGroups SASL support to Infinispan Server ISPN-4303
* Upgraded internal JGroups version to 3.5.0.CR2 ISPN-4609
* Map/Reduce tasks no longer have a timeout by default ISPN-4618
* Merge batching configuration with transaction modes to prevent ambiguity ISPN-4197
* Improve server test suite run times ISPN-4317
* Multiple improvements and bug fixes!

For a complete list of features and bug fixes included in this release please refer to the release notes.  Visit our downloads section to find the latest release.

If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,

The Infinispan team.

Wednesday 30 July 2014

Cross site replication with JBoss Data Grid



Are you afraid the Death Star or the Vogon Constructor Fleet are going to blow up your data on the Earth ? 
Navin Surtani has recorded a nice video demonstrating the use of JBoss Data Grid Cross-Site Replication and how it can be used to keep backups of your precious data off-site.

The code for the demo is also available. Enjoy.

Tuesday 29 July 2014

Infinispan Security #3: HotRod authentication

Let's continue our excursus into the security features that are being developed within Infinispan 7.0 by having a look at how our high-performance cache remoting protocol HotRod was enhanced to support authentication.

Since Infinispan 5.3, HotRod has featured SSL/TLS support which, aside from encryption, also provides some form of authentication by optionally requiring client certificates. While this does indeed stop unauthorized clients from connecting to a remote cache, the level of access-control ends there. Now that we have full role-based authorization checks at the cache and container level, we want to be able to recognize users and map their roles accordingly.

As usual, we didn't want to reinvent the wheel, but leverage existing security frameworks and integrate them into our existing platform. For this reason, the protocol chosen to implement HotRod authentication is SASL, which is in widespread use in other connection-oriented transports (e.g. LDAP, Memached, etc).

Using SASL we can support the following authentication mechanisms out of the box (since they are part of the standard JDK/JRE):
  • PLAIN where credentials are exchanged in clear-text (insecure, but easieast to setup)
  • DIGEST-MD5 where credentials are hashed using server-provided nonces
  • GSSAPI where clients can use Kerberos tokens
  • EXTERNAL where the client-certificate identity of the underlying transport is used as the credentials
More SASL mechanisms can be plugged in by using the Java Cryptography Archictecture (JCA).

Since our preferred server distribution is based on a stripped-down WildFly server, we are essentially reusing the Security Realms of the container. This gives us the ability to validate users and to also retrieve group membership. against a number of sources (property files, LDAP, etc).

The following is an example server configuration which uses the ApplicationRealm to authenticate and authorize users. Since the <identity-role-mapper> is in use, role names will be mapped 1:1 from the realm into Infinispan roles.
The HotRod endpoint is using the SASL PLAIN mechanism. Note that two caches have been defined: the default cache, without authorization, and a secured cache, which instead requires authorization. This means that remote clients can access the default cache anonymously, but they will need to authenticate if they want to access the secured cache.
The following bit of code explains how to use the HotRod Java client to connect to the secured cache defined above:
All of the above is already available in Infinispan 7.0.0.Alpha5, so head on over to the download page to experience the goodness.

Monday 28 July 2014

Upcoming Infinispan 7.0.0 Map/Reduce is blazing fast


Introduction


Our enthusiasm about Infinispan Map/Reduce implementation has been a driving impetus for new features and spectacular performance improvements we have achieved in the past months. As we approach the final Infinispan 7 release, we can not keep quiet about these improvements any longer. We wanted to share the most significant new Map/Reduce features as well as a rather important performance improvement along with the details on how we achieved it.

New features


In the new features category, the most notable is a scalability improvement that allows storage of MapReduceTask's results in a distributed cache instead of returning results to the calling application. Infinispan now gives users the option to specify a target cache to store the results of an executed MapReduceTask. The results are available after the execute method (which is synchronous) completes. This new variant of the execute method prevents the master JVM node from exceeding its allowed maximum heap size.  Users could, for example, utilize the new execute method if objects that are the results of the reduce phase have a large memory footprint or if multiple MapReduceTasks are concurrently executing on the master task node. We have provided two variants of the new execute method:

We also enhanced parallel execution of map/reduce functions at each node and improved handling of intermediate results. Users can now specify custom intermediate cache for a particular MapReduceTask.

Performance improvements


Infinispan 7 makes a rather big leap from a single threaded to a parallel execution model of both map and reduce phases on each Infinispan grid node. The final result of this change is on average fivefold faster execution of your typical MapReduceTask.

Even though map and reduce phases are sequential we can still execute the map and reduce phases themselves in parallel. If you recall, although Infinispan 6 executes map and reduce phases on all nodes in parallel, execution on each node itself is single-threaded. Similarly, reduce phase although executed on multiple nodes in parallel, each node executes its portion of the reduction on a single thread.

Since we baselined Infinispan 7 on JDK 7, we decided to experiment with fork/join threading framework for parallel execution of both map and reduce phases [2]. If you recall fork/join framework enables high performance, parallel, fine-grained task execution in Java. Although parallel, recursively decomposable tasks are well suited for fork/join framework it may come as a surprise that parallel iteration of entries in arrays, maps and other collections represents a good fit as well. And do we have a well-suited candidate for parallel fork/join iteration - cache's data container itself! In fact, most of the work is related to iterating entries from the data container and invoking map/combine and reduce function on those entries.

Map/combine phase is particularly interesting. Even if we use the fork/join framework map and combine phases are distinct and until now - serially executed. Having serial execution of map and combine is not the only downside as map phase can be rather memory intensive. After all, it has to store all intermediate results into provided collectors. Combine phase takes produced intermediate values for a particular key and combines it into a single intermediate value. Therefore, it would very useful to periodically invoke combine on map produced keys/values thus limiting the total amount of memory used for map phase. So the question is how do we execute map/combine in parallel efficiently thus speeding up execution and at the same time limiting the memory used? We found the answer in producer/consumer threading paradigm.

In our case producers are fork/join threads that during map phase iterate key/value data container and invoke Mapper's map function. Map function transformation produces intermediate results stored into the Infinispan provided queue of collectors. Consumers are also fork/join threads that invoke combine function on key/value entries in those collectors. Note that this way map/combine phase execution itself becomes parallel, and phases of mapping and combing are no longer serial. In the end, we have notably lowered memory usage and significantly improved overall speed execution of map/combine algorithm at the same time.

Performance lab results


Although initial performance results were more than promising, we were not satisfied. The throughput peaked for 32KB cache values but was much lower for larger and smaller values in our tests. We went back to the drawing board and devised the above-described map/combine algorithm using fork/join framework and producer/consumer approach. This time the results from the performance lab were excellent. For more details on performance tests and hardware used please refer to [1].

As you can see from the graphs below we have improved performance for all cache value sizes. We were not able to significantly improve throughput for the largest 1MB and 2MB cache values. For all other cache value sizes, we have seen on average five-fold throughput improvement. As throughput improvement is directly proportional to MapReduceTask speed of execution improvement, our users should expect their MapReduceTasks to execute, on average, five times faster in Infinispan 7 than in Infinispan 6.






[1] http://blog.infinispan.org/2014/06/mapreduce-performance-improvements.html
[2] We back ported relevant classes so users can still run Infinispan 7 on JVM 6

Thursday 24 July 2014

Infinispan Security #2: Authorization Reloaded and Auditing

Since the last post on Infinispan Security, there have been a few changes related to how we handle authorization in Infinispan. All of these are available in the recently released 7.0.0.Alpha5.

Authorization Performance

Previously we mentioned the need to use a SecurityManager and javax.security.auth.Subject.doAs() in order to achieve authorization in embedded mode. Unfortunately those components have a severe performance impact. In particular, retrieving the current Subject from the AccessControlContext adds, on my test environment, 3.5µs per call. Since the execution of a Cache.put takes about 0.5µs when authorization is not in use, this means that every invocation is 8x slower.

For this reason we have introduced an alternative for those of you (hopefully the majority), that do not need to use a SecurityManager and can avoid the AccessControlContext: invoke your PrivilegedActions using the new org.infinispan.security.Security.doAs() method, which uses a ThreadLocal to store the current Subject. Using this new approach, the overhead per invocation falls to 0.5µs: a 7x improvement !
Security.doAs() is actually smart: if it detects that a SecurityManager is in use, it will fallback to retrieving the Subject via the AccessControlContext, so you don't need to decide what approach you will be using in production. The following chart shows the impact of using each approach when compared to running without authorization:


Auditing


An essential part of an authorization framework, is the ability to track WHO is doing WHAT (or is being prevented from doing it !). Infinispan now offers a pluggable audit framework which you can use to track the execution of cache/container operations. The default audit logger sends audit messages to your logging framework of choice (e.g. Log4J), but you can write your own to take any appropriate actions you deem appropriate for your security policies.

As usual, for more details, check out the Security chapter in the Infinispan documentation and the org.infinispan.security JavaDocs.

Thursday 17 July 2014

Infinispan 7.0.0.Alpha5 relased!

Dear Community,

It is our pleasure to announce the Alpha5 release of Infinispan 7.0.0. This release includes numerous fixes and enhancements in every area of Infinispan. This will be our last alpha release as we move into beta stage and prepare for the final Infinispan 7.0 release.

Major theme of this release is Hot Rod eventing. Java Hot Rod clients can now subscribe to cache entry created, modified and removed events. For more details see  documentation. Infinispan Query is now based on Hibernate Search 5; we also use protoparser lib instead of relying on binary descriptors generated by protoc. Our JPA cache store works in Karaf; we have also made further Map/Reduce improvements and bug fixes. Starting with this release Infinispan server is based on WildFly 8.1 while Hot Rod SASL implementation supports QOP for encryption.

For a complete list of features and bug fixes included in this release please refer to the release notes. Visit our downloads section to find the latest release.

If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,
The Infinispan team.

Thursday 5 June 2014

Map/Reduce Performance improvements between Infinispan 6 and 7


Introduction


There have been a number of recent Infinispan 7.0 Map/Reduce performance related improvements that we were eager to test in our performance lab and subsequently share with you. The results are more than promising. In the word count use case, Map/Reduce task execution speed and throughput improvement is between fourfold and sixfold in certain situations that were tested.

We have achieved these improvements by focusing on:
  • Optimized mapper/reducer parallel execution on all nodes
  • Improving the handling and processing of larger data sets
  • Reducing the amount of memory needed for execution of MapReduceTask

Performance Test Results


The performance tests were run using the following parameters:
  • An Infinispan 7.0.0-SNAPSHOT build created after the last commits from the list were committed to the Infinispan GIT repo on May 9th vs Infinispan 6.0.1.Final 
  • OpenJDK version 1.7.0_55 with 4GB of heap and the following JVM options:
    -Xmx4096M -Xms4096M -XX:+UseLargePages -XX:MaxPermSize=512m -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  • Random data filled 30% of the Java heap, and 100 random words were used to create the 8 kilobyte cache values. The cache keys were generated using key affinity, so that the generated data would be distributed evenly in the cache. These values were chosen, so that a comparison to Infinispan 6 could be made. Infinispan 7 can handle a final result map with a much larger set of keys than is possible in Infinispan 6. The actual amount of heap size that is used for data will be larger due to backup copies, since the cluster is running in distributed mode.
  • The MapReduceTask executes a word count against the cache values using mapper, reducer, combiner, and collator implementations. The collator returns the 10 most frequently occurring words in the cache data. The task used a distributed reduce phase and a shared intermediate cache. The MapReduceTask is executed 10 times against the data in the cache and the values are reported as an average of these durations.

    From 1 to 8 nodes using a fixed amount of data and 30% of the heap


    This test executes two word count executions on each cluster with an increasing number of nodes. The first execution uses an increasing amount of data equal to 30% of the total Java heap across the cluster (i.e. With one node, the data consumes 30% of 4 GB. With two nodes, the data consumes 30% of 8 GB, etc.), and the second execution uses a fixed amount of data, (1352 MB which is approximately 30% of 4 GB). Throughput is calculated by dividing the total amount of data processed by the Map/Reduce task by the duration. The following charts show the throughput as nodes are added to the cluster for these two scenarios:

    These charts clearly show the increase in throughput that were made in Infinispan 7. The throughput also seems to scale in an almost linear fashion for this word count scenario. With one node, Infinispan 7 processes the 30% of heap data in about 100 MB/sec, two nodes process almost 200 MB/sec, and 8 nodes process over 700 MB/sec.

    From 1 to 8 nodes using different heap size percentages


    This test executes the word count task using different percentages of heap size as nodes are added to the cluster. (5%, 10%, 15%, 20%, 25%, and 30%) Here are the throughput results for this test:

    Once again, these charts show an increase in throughput when performing the same word count task using Infinispan 7. The chart for Infinispan 7 shows more fluctuation in the throughput across the different percentages of heap size. The throughput plotted in the Infinispan 6 chart is more consistent.

    From 1 to 8 nodes using different value sizes


    This test executes the word count task using 30% of the heap size and different cache value sizes as nodes are added to the cluster. (1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, and 2MB) Here are the throughput results for this test:

    These results are more interesting. The throughput in Infinispan 7 is higher for certain cache size values, but closer to Infinispan 6 or even slower for other cache size values. The throughput peaks for 32KB cache values, but can be much lower for larger and smaller values. Smaller values require more overhead, but for larger values this behavior is not expected. This result needs to be investigated more closely.

    Conclusion


    The performance tests show that Infinispan 7 Map/Reduce improvements have increased the throughput and execution speed four to sixfold in some use cases. The changes have also allowed Infinispan 7 to process data sets that include larger intermediate results and produce larger final result maps. There are still areas of the Map/Reduce algorithm that need to be improved:
    • The Map/Reduce algorithm should be self-tuning. The maxCollectorSize parameter controls the number of values that the collector holds in memory, and it is not trivial to determine the optimal value for a given scenario. The value is based on the size of the values in the cache and the size of the intermediate results. A user is likely to know the size of the cache values, but currently Infinispan does not report statistics about the intermediate results to the user. The Map/Reduce algorithm should analyze the environment at runtime and adjust the size of the collector dynamically.
    • The fact that the throughput results vary with different value sizes needs to be investigated more closely. This could be due to the fact that the maxCollectorSize value used for these tests is not ideal for all value sizes, but there might be other causes for this behaviour.