Tuesday 23 February 2010

Infinispan 4.0.0.Final has landed!

It is with great pleasure that I'd like to announce the availability of the final release of Infinispan 4.0.0. Infinispan is an open source, Java-based data grid platform that I first announced last April, and since then the codebase has been through a series of alpha and beta releases, and most recently 4 release candidates which generated a lot of community feedback.

It has been a long and wild ride, and the very active community has been critical to this release. A big thank you to everyone involved, you all know who you are.

Benchmarks
I recently published an article about running Infinispan in local mode - as a standalone cache - compared to JBoss Cache and EHCache. The article took readers through the ease of configuration and the simple API, and then demonstrated some performance benchmarks using the recently-announced Cache Benchmarking Framework. We've been making further use of this benchmarking framework over the past weeks and months, extensively testing Infinispan on a large cluster.

Here are some simple charts, generated using the framework. The first set compares Infinispan against the latest and greatest JBoss Cache release (3.2.2.GA at this time), using both synchronous and asynchronous replication. But first, a little bit about our test lab, which comprises a large number of nodes, each with the following configuration:
  • 2 x Intel Xeon E5530 2.40 GHz quad core, hyperthreaded processors (= 16 hardware threads per node)
  • 12GB memory per node, although the JVM heaps are limited at 2GB
  • RHEL 5.4 with Sun 64-bit JDK 1.6.0_18
  • InfiniBand connectivity between nodes
And a little bit about the way the benchmark framework was configured:
  • Run from 2 to 12 nodes in increments of 2
  • 25 worker threads per node
  • Writing 1kb of state (randomly generated Strings) each time, with a 20% write percentage

[Charts: read and write performance, synchronous and asynchronous replication]
As you can see, Infinispan significantly outperforms JBoss Cache, even in replicated mode. The large gains in read performance and asynchronous write performance demonstrate Infinispan's minimally locking data container and new marshalling techniques. But you will also notice that with synchronous writes, performance starts to degrade as the cluster size increases. This is a characteristic of replicated caches, where you always have fast reads and all state available on each and every node, at the expense of ultimate scalability.
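
For readers who want to reproduce the replicated-mode setup, here is a minimal programmatic sketch of the two modes being compared. It is based on my recollection of the 4.0 embedded API (class and setter names should be checked against the javadocs), and the cache names and values are purely illustrative.

import org.infinispan.Cache;
import org.infinispan.config.Configuration;
import org.infinispan.config.GlobalConfiguration;
import org.infinispan.manager.DefaultCacheManager;

public class ReplicationModes {
   public static void main(String[] args) {
      // A clustered cache manager using the default JGroups transport
      DefaultCacheManager cm = new DefaultCacheManager(GlobalConfiguration.getClusteredDefault());

      // Synchronous replication: a put() blocks until every node has acknowledged the update
      Configuration replSync = new Configuration();
      replSync.setCacheMode(Configuration.CacheMode.REPL_SYNC);
      cm.defineConfiguration("replSync", replSync);

      // Asynchronous replication: a put() returns immediately and replicates in the background
      Configuration replAsync = new Configuration();
      replAsync.setCacheMode(Configuration.CacheMode.REPL_ASYNC);
      cm.defineConfiguration("replAsync", replAsync);

      Cache<String, String> cache = cm.getCache("replSync");
      cache.put("key", "some 1kb payload...");

      cm.stop();
   }
}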

Enter Infinispan's distributed mode. The goal of data distribution is to maintain enough copies of state in the cluster to be durable and fault tolerant, but not so many copies that Infinispan cannot scale, with linear scalability being the ultimate prize. In the following runs, we benchmark Infinispan's synchronous, distributed mode, comparing 2 different Infinispan configurations. The framework was configured with:
  • Run from 4 to 48 nodes, in increments of 4 (to better demonstrate linear scalability)
  • 25 worker threads per node
  • Writing 1kb of state (randomly generated Strings) each time, with a 20% write percentage

[Charts: read and write performance, synchronous distribution]

As you can see, Infinispan scales linearly as the node count increases. Of the different configurations tested, "lazy" stands for lazy unmarshalling being enabled, which allows state to be stored in Infinispan as byte arrays rather than deserialized objects. This has advantages for certain access patterns, for example where remote lookups are very common and local lookups are rare.
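
To make the "lazy" configuration concrete, here is a hedged sketch of how a distributed cache with lazy unmarshalling might be defined programmatically. Again, this reflects the 4.0 API as I remember it - in particular setNumOwners, setL1CacheEnabled and setUseLazyDeserialization are my best recollection of the setter names, so treat this as illustrative rather than definitive.

import org.infinispan.Cache;
import org.infinispan.config.Configuration;
import org.infinispan.config.GlobalConfiguration;
import org.infinispan.manager.DefaultCacheManager;

public class DistributionModes {
   public static void main(String[] args) {
      DefaultCacheManager cm = new DefaultCacheManager(GlobalConfiguration.getClusteredDefault());

      Configuration dist = new Configuration();
      dist.setCacheMode(Configuration.CacheMode.DIST_SYNC); // distribute rather than replicate
      dist.setNumOwners(2);                  // keep 2 copies of each entry for fault tolerance
      dist.setL1CacheEnabled(true);          // local "near cache" for entries owned by other nodes
      dist.setUseLazyDeserialization(true);  // store state as byte arrays, deserialize on access
      cm.defineConfiguration("distLazy", dist);

      Cache<String, String> cache = cm.getCache("distLazy");
      cache.put("key", "value");

      cm.stop();
   }
}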


How does Infinispan compare against ${POPULAR_PROPRIETARY_DATAGRID_PRODUCT}?
Due to licensing restrictions on publishing benchmarks of such products, we are unfortunately not at liberty to make such comparisons public - although we are very pleased with how Infinispan compares against popular commercial offerings, and plan to push the performance envelope even further in 4.1.

And just because we cannot publish such results does not mean that you cannot run such comparisons yourself. The Cache Benchmark Framework has support for different data grid products, including Oracle Coherence, and more can be added easily.

Aren't statistics just lies?
We strongly recommend running the benchmarks yourself. Not only does this let you verify the results for yourself, it also allows you to benchmark behaviour on your specific hardware infrastructure, using the specific configurations you'd use in real life, and with your specific access patterns.

So where do I get it?
Infinispan is available on the Infinispan downloads page. Please use the user forums to communicate with us about the release. A full change log of features in this release is on JIRA, and documentation is on our newly re-organised wiki. We have put together several articles, chapters and examples; feel free to suggest new sections for the user guide - topics you may find interesting, or bits you feel we've left out or not addressed fully enough.

What's next?
We're busy hacking away on Infinispan 4.1 features. Expect an announcement soon on this, including an early alpha release for folks to try out. If you're looking for Infinispan's roadmap for the future, look here.

Cheers, and enjoy!
Manik

Tuesday 16 February 2010

Benchmarking Infinispan and other Data Grid software

Why benchmarking?
Benchmarking is important to us: we want to monitor our performance improvements between releases and compare ourselves with other products as well. Benchmarking a data grid product such as Infinispan is not a trivial task: one needs to start multiple processes across multiple machines, coordinate them so that everything runs at once, and centralise the reports. Then there is the question of what access patterns the benchmark should stress.

Introducing the cache benchmarking framework (CBF)
What we've come up with is a tool to help us run our benchmarks and generate reports and charts. And more:
- simple to configure (see the config sample below)
- simple to run. We supply a set of .sh scripts that connect to remote nodes and start cluster instances for you.
- open source. Everybody can download it, read the code and run the benchmarks by themselves. Published results can be easily verified and validated.
- extensible. It's easy to extend the framework in order to benchmark additional products (see the sketch after this list). It's also easy to write different data access patterns to be tested.
- scalable. So far we've used CBF to benchmark clusters of up to 62 nodes.
- users can test products, configurations, and access patterns on their own hardware and network. This is crucial, since it means educated decisions can be made based on relevant, use-case-specific statistics and measurements. Further, the benchmark can even be used to compare the performance of different configurations and tuning parameters of a single data grid product, to help users choose a configuration that works best for them.
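
To give a flavour of what extensibility involves, the sketch below shows the rough shape of a product adapter one might write. This is a purely hypothetical interface for illustration - it is not the actual CBF plugin API, so check the framework's source for the real contract.

// Hypothetical adapter shape - NOT the real CBF plugin API, just an illustration of
// what hooking a new data grid product into a benchmark harness typically involves.
public interface GridProductAdapter {

   // Start the product on this node using the named configuration file
   void setUp(String configFile) throws Exception;

   // The operations that the benchmarked access patterns exercise
   void put(String bucket, Object key, Object value) throws Exception;
   Object get(String bucket, Object key) throws Exception;

   // Stop the product and release resources so the next run starts clean
   void tearDown() throws Exception;
}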

Below is a sample configuration file and generated report.

<bench-config>

   <master bindAddress="${127.0.0.1:master.address}" port="${2103:master.port}"/>

   <benchmark initSize="2" maxSize="${4:slaves}" increment="1">
      <DestroyWrapper runOnAllSlaves="true"/>
      <StartCluster/>
      <ClusterValidation partialReplication="false"/>
      <Warmup operationCount="1000"/>
      <WebSessionBenchmark numberOfRequests="2500" numOfThreads="2"/>
      <CsvReportGeneration/>
   </benchmark>

   <products>
      <jbosscache3>
         <config name="mvcc/mvcc-repl-sync.xml"/>
      </jbosscache3>
      <infinispan4>
         <config name="repl-sync.xml"/>
         <config name="dist-sync.xml"/>
         <config name="dist-sync-l1.xml"/>
      </infinispan4>
   </products>

   <reports>
      <report name="Replicated">
         <item product="infinispan4" config="repl-sync.xml"/>
         <item product="jbosscache3" config="mvcc/mvcc-repl-sync.xml"/>
      </report>
      <report name="Distributed">
         <item product="infinispan4" config="dist-*"/>
      </report>
      <report name="All" includeAll="true"/>
   </reports>

</bench-config>


And this is what a generated chart looks like:



Where can you find CBF?
CBF can be found here. For a quick way of getting up to speed with it, we recommend the 5-minute tutorial.

Enjoy!

Mircea



Friday 12 February 2010

Infinispan/Jopr flash movies released

Back in December we announced the release of a screencast showing how to monitor Infinispan with Jopr. Today we've added 3 detailed flash movies showing how to install Jopr and the Infinispan Jopr plugin, and how to monitor Infinispan instances that have been discovered automatically or added manually. You can find these flash movies in the Infinispan wiki.

Cheers,
Galder

Poll: How do you interact with Infinispan?

While discussing the different ways to interact with Infinispan, we decided to open up a poll so that people can tell us how they expect to be using Infinispan. Do you use Infinispan directly, in the same VM? Or do you use REST? Are you planning to interact via the memcached or Hot Rod interfaces?

The poll can be found here. If you vote, please make sure you add a comment indicating the reasons why you chose that option.

Cheers,
Galder

Thursday 4 February 2010

Infinispan and storage in the cloud

I will be presenting on Infinispan and its role in cloud storage at Red Hat's Cloud Computing Forum on the 10th of February 2010.

This is a virtual event, where you get to attend from the comfort of your desk. And although it is free, you do need to register beforehand, so I recommend doing so.

Cheers
Manik

Tuesday 2 February 2010

Infinispan 4.0.0.CR4

In the run-up to preparing Infinispan for a public release, we've been busy on a number of interesting things, which have led to a decision to release another CR instead.

The main driver behind this is that we've finally managed to get our hands on a cluster large enough to truly test scalability. Expect interesting public benchmarks to be published soon - watch this space. (I recently blogged about some local-mode benchmarks.)

To enable such benchmarks, we've renewed efforts on building out the Cache Benchmarking Framework. This framework was originally a part of JBoss Cache's source tree, and has now been extracted and migrated to SourceForge. We welcome others contributing additional plugins for more distributed cache/data grid products, as well as more tests and access patterns.

Finally, extensive community feedback over the past few weeks has resulted in lots of bugs being fixed and performance patches applied. Also, we finally have a beta release of JClouds and an all-new CloudCacheStore for folks to play with.

The release is available in its usual place. I look forward to getting feedback on this release, this time truly a release candidate, i.e., one that, unchanged, could very well become the final release.

Your last chance for feedback on this release, people!

Cheers
Manik

Infinispan as a LOCAL cache

While Infinispan has the distributed, in-memory data grid market firmly in its sights, there is another aspect of Infinispan which I feel people would find interesting.

At its heart Infinispan is a highly concurrent, extremely performant data structure that can be distributed, or used in a standalone, local mode - as a cache. But why would people use Infinispan over, say, a ConcurrentHashMap? Here are some reasons.

Feature-rich
  • Eviction. Built-in eviction ensures you don't run out of memory.
  • Write-through and write-behind caching. Going beyond memory and onto disk (or any other pluggable CacheStore) means that your state survives restarts, and preloaded hot caches can be configured.
  • JTA support and XA compliance. Participate in ongoing transactions with any JTA-compliant transaction manager.
  • MVCC-based concurrency. Highly optimized for fast, non-blocking readers.
  • Manageability. Simple JMX, or a rich GUI management console via Jopr - you have a choice.
  • Not just for the JVM. A RESTful API, plus upcoming client/server modules speaking the memcached and Hot Rod protocols, help non-JVM platforms use Infinispan.
  • Cluster-ready. Should the need arise.
Easy to configure, easy to use
The simplest configuration file containing just
<infinispan />
is enough to get you started, with sensible defaults abounding. (More detailed documentation is also available.)

All the features above are exposed via an easy-to-use Cache interface, which extends ConcurrentMap and is compatible with many other cache systems. Infinispan even ships with migration tools to help you move off other cache solutions onto Infinispan, whether you need a cache to store data retrieved remotely or simply a second-level cache for Hibernate.
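
To give a feel for that interface, here is a minimal local-mode sketch using the embedded API; the keys, values and lifespans are purely illustrative.

import java.util.concurrent.TimeUnit;
import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;

public class LocalCacheExample {
   public static void main(String[] args) {
      // No configuration file needed for a sensible local-mode cache
      DefaultCacheManager cm = new DefaultCacheManager();
      Cache<String, String> cache = cm.getCache();

      // Behaves like a ConcurrentMap...
      cache.put("user:42", "Alice");
      assert "Alice".equals(cache.get("user:42"));

      // ...with cache semantics layered on top, such as per-entry lifespan and max idle time
      cache.put("session:42", "token", 30, TimeUnit.MINUTES);
      cache.put("report:42", "cached-result", 1, TimeUnit.HOURS, 10, TimeUnit.MINUTES);

      cm.stop();
   }
}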

Performance
In the process of testing and tuning Infinispan on very large clusters, we have started to put together a benchmarking framework. As part of this framework, we have the ability to measure cache performance in standalone, local mode. So in the context of this blog post, I'd like to share some recent performance numbers for a recent Infinispan snapshot, compared against the latest JBoss Cache release (3.2.2.GA) and EHCache (1.7.2). Some background on the tests:
  • Used the latest snapshot of the CacheBenchFwk
  • Run on a RHEL 5 server with 4 Intel Xeon cores, 4GB of RAM
  • Sun JDK 1.6.0_18, with -Xms1g -Xmx1g
  • Tests run on a single node, with 25 concurrent threads, using randomly generated Strings as keys and values, a 1kb payload for each entry, and an 80/20 read/write ratio.
  • Performance measured in transactions per second (higher = better).

In summary, when run in local mode, Infinispan is a high-performance standalone caching engine which offers a rich set of features while still being trivially simple to configure and use.

Enjoy,
Manik