Thursday 30 July 2009

More JUG talks

Adrian Cole and I recently presented at JUGs in Krakow and Dublin, on Infinispan and JClouds.

Krakow was great - and hot: 39 degree weather! (Don't we just love global warming?) Anna Kolodziejczyk at Proidea (who also organises JDD) organised the event, which attracted an excited and interactive audience of about 40 people. Dublin, beautiful as always, was a polar opposite in terms of weather: a cloudy, windy and wet day, living up to its reputation. Luan O'Carroll of DUBJUG organised the event, at the plush Odessa Club. Again, an inquisitive audience of about 35 people attended, with a lot of questions on the future of data storage in clouds. Thomas Diesler of JBoss OSGi fame made a surprise guest appearance too.

In general, the talks have been very well received and have provoked thought and discussion. As requested by many, I will soon be recording a podcast of this talk and will make it available on this blog.

Apart from a JUG I hope to organise soon in London, the next time I speak about Infinispan publicly will be at JBoss World in September. Hope to see you there!

Cheers
Manik

Monday 27 July 2009

Increase transactional throughput with deadlock detection

Deadlock detection is a new feature in Infinispan. It is about increasing the number of transactions that can be concurrently processed. Let's start with the problem first (the deadlock) then discuss some design details and performance.

So, the by-the-book deadlock example is the following:
  • Transaction one (T1) performs the following sequence of operations: (write key_1, write key_2)
  • Transaction two (T2) performs the following sequence: (write key_2, write key_1).
Now, if T1 and T2 run at the same time and both have executed their first operation, each will wait for the other to release its lock, in principle forever. In the real world, the waiting period is bounded by a lock acquisition timeout (LAT), which defaults to 10 seconds. This allows the system to get out of such a situation and respond to the user one way (success) or the other (failure): after a period of LAT, one (or both) of the transactions will roll back, allowing the other to continue working.
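The scenario above can be reproduced outside Infinispan with plain Java locks. The following is only a sketch under stated assumptions: two ReentrantLock instances stand in for the locks on key_1 and key_2, tryLock with a timeout plays the role of the LAT, and the class and method names (DeadlockTimeoutDemo, transaction, run) are invented for illustration. It is not how Infinispan implements locking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class DeadlockTimeoutDemo {

    // One "transaction": lock the first key, then try to lock the second,
    // giving up once the timeout (our stand-in for the LAT) has elapsed.
    static void transaction(ReentrantLock first, ReentrantLock second,
                            CountDownLatch bothHoldFirst, long timeoutMillis,
                            AtomicInteger committed) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // wait until the other tx holds its first key too
            if (second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                try {
                    committed.incrementAndGet(); // both keys locked: "commit"
                } finally {
                    second.unlock();
                }
            }
            // otherwise the acquisition timed out and the tx "rolls back"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock(); // commit or rollback, the first key is released
        }
    }

    // Runs T1 (key_1 then key_2) and T2 (key_2 then key_1) concurrently and
    // returns how many of them managed to lock both keys.
    static int run(long timeoutMillis) {
        ReentrantLock key1 = new ReentrantLock();
        ReentrantLock key2 = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        AtomicInteger committed = new AtomicInteger();
        Thread t1 = new Thread(() -> transaction(key1, key2, bothHoldFirst, timeoutMillis, committed));
        Thread t2 = new Thread(() -> transaction(key2, key1, bothHoldFirst, timeoutMillis, committed));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return committed.get();
    }

    public static void main(String[] args) {
        System.out.println("committed transactions: " + run(500));
    }
}
```

In this sketch at most one of the two transactions can commit, and often neither does, because both time out at roughly the same moment. In either case the keys stay blocked for about the whole timeout.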

Deadlocks are bad for both the system's throughput and the user experience. Throughput is affected because during the deadlock period (which might extend up to LAT) no other thread can update either key_1 or key_2. Even worse, access to any other keys that were modified by T1 or T2 is similarly restricted. User experience suffers because the call(s) will freeze for the entire deadlock period, and there is also a chance that both T1 and T2 will roll back by timing out.

As a side note, in the previous example, if the code running the transactions can enforce an ordering on the keys accessed within a transaction, the deadlock is avoided. E.g. if the application code orders the operations by the lexicographic order of the keys, both T1 and T2 execute the sequence (write key_1, write key_2), and no deadlock results. This is a best practice and should be followed whenever possible.
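As a rough illustration of this key-ordering practice, the sketch below sorts the keys of a pending batch of updates before writing them. The class OrderedWrites and its methods are hypothetical, and the BiConsumer parameter merely stands in for whatever performs the write (for example a put on a cache inside a transaction).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class OrderedWrites {

    // Returns the keys in lexicographic order, regardless of the order in
    // which the application collected its updates.
    static List<String> writeOrder(Map<String, String> updates) {
        List<String> keys = new ArrayList<>(updates.keySet());
        Collections.sort(keys);
        return keys;
    }

    // Applies all updates in that fixed order. 'put' is a placeholder for
    // the real write operation.
    static void writeAll(Map<String, String> updates, BiConsumer<String, String> put) {
        for (String key : writeOrder(updates)) {
            put.accept(key, updates.get(key));
        }
    }
}
```

With this in place, a transaction that collected its updates as (key_2, key_1) still writes key_1 first, so it locks keys in the same order as any other transaction using the same helper.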
Enough with the theory! The way Infinispan performs deadlock detection is based on an algorithm designed by Jason Greene and Manik Surtani, which is detailed here. The basic idea is to split the LAT into smaller cycles, as follows:

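lock(int lockAcquisitionTimeout) {
    startTime = currentTime;
    // Split the full timeout into short acquisition attempts.
    while (currentTime < startTime + lockAcquisitionTimeout) {
        if (acquire(smallTimeout)) break;
        // Lock not acquired yet: check whether we are part of a deadlock.
        testForDeadlock(globalTransaction, key);
    }
}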

What testForDeadlock(globalTransaction, key) does is check whether there is another transaction that satisfies both conditions:
  1. holds a lock on key and
  2. intends to lock a key that is currently locked by this transaction.
If such a transaction is found then this is a deadlock, and one of the running transactions will be interrupted. Which transaction is interrupted is decided by a coin toss: a random number associated with each transaction. This ensures that only one transaction will roll back, and because the decision is deterministic, nodes and transactions do not need to communicate with each other to determine the outcome.
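The two conditions and the coin toss can be sketched in a few lines of Java. This is an illustration only, not Infinispan's code: the Tx class, its fields, the owners map and the rule that the lower coin toss rolls back are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DeadlockCheckSketch {

    // Hypothetical transaction record; the names are not Infinispan's.
    static final class Tx {
        final String name;
        final long coinToss;        // random number drawn once, when the tx starts
        volatile Object waitingFor; // key this tx is currently trying to lock, or null

        Tx(String name, long coinToss) {
            this.name = name;
            this.coinToss = coinToss;
        }
    }

    // key -> transaction that currently holds the lock on that key
    final Map<Object, Tx> owners = new ConcurrentHashMap<>();

    // Condition 1: another tx holds the lock on 'key'.
    // Condition 2: that tx is waiting for a key that 'me' currently holds.
    boolean isDeadlocked(Tx me, Object key) {
        Tx owner = owners.get(key);
        if (owner == null || owner == me) {
            return false;
        }
        Object wanted = owner.waitingFor;
        return wanted != null && owners.get(wanted) == me;
    }

    // Both sides evaluate the same comparison on the same two numbers, so
    // (assuming the coin tosses differ) exactly one of them backs off without
    // any further communication. Which side loses is an arbitrary choice here.
    static boolean mustRollBack(Tx me, Tx other) {
        return me.coinToss < other.coinToss;
    }
}
```

In a real system the coin toss would be drawn from a random source when the transaction begins. Fixed values make the outcome easy to follow: with T1 at 17 and T2 at 42, T1 is the one that rolls back, whichever side runs the check.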

Deadlock detection in Infinispan comes in two flavors: detecting deadlocks in transactions that spread over several caches, and detecting deadlocks in transactions running on a single (local) cache.

Let's see some performance figures as well. A class for benchmarking performance of deadlock detection functionality was created and can be seen here. Test description (from javadoc):

We use a fixed size pool of keys (KEY_POOL_SIZE) on which each transaction operates. A number of threads (THREAD_COUNT) repeatedly starts transactions and tries to acquire locks on a random subset of this pool, by executing put operations on each key. If all locks were successfully acquired then the tx tries to commit: only if it succeeds this tx is counted as successful. The number of elements in this subset is the transaction size (TX_SIZE). The greater transaction size is, the higher chance for deadlock situation to occur. On each thread these transactions are being repeatedly executed (each time on a different, random key set) for a given time interval (BENCHMARK_DURATION). At the end, the number of successful transactions from each thread is cumulated, and this defines throughput (successful tx) per time unit (by default one minute).
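To make the benchmark description more concrete, here is a heavily simplified sketch of the same loop. It is not the benchmark class referred to above: plain ReentrantLock instances stand in for the keys and for the put operations, acquiring all locks counts as a successful commit, and the parameter names only mirror the constants mentioned in the javadoc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

class ContentionBenchmarkSketch {

    // Returns the number of successful "transactions" summed over all threads.
    static long run(int keyPoolSize, int threadCount, int txSize,
                    long durationMillis, long lockTimeoutMillis) {
        List<ReentrantLock> keyPool = new ArrayList<>();
        for (int i = 0; i < keyPoolSize; i++) {
            keyPool.add(new ReentrantLock());
        }
        AtomicLong successful = new AtomicLong();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);

        Runnable worker = () -> {
            while (deadline - System.nanoTime() > 0) {
                // Random subset of the pool, in random order, so that two
                // threads can lock the same keys in opposite order.
                List<ReentrantLock> subset = new ArrayList<>(keyPool);
                Collections.shuffle(subset, ThreadLocalRandom.current());
                subset = subset.subList(0, txSize);

                List<ReentrantLock> held = new ArrayList<>();
                try {
                    for (ReentrantLock key : subset) {
                        if (!key.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS)) {
                            break; // timed out: this tx "rolls back"
                        }
                        held.add(key);
                    }
                    if (held.size() == txSize) {
                        successful.incrementAndGet(); // all locks acquired: "commit"
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                } finally {
                    for (ReentrantLock key : held) {
                        key.unlock();
                    }
                }
            }
        };

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(worker);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return successful.get();
    }

    public static void main(String[] args) {
        System.out.println("successful tx: " + run(10, 8, 5, 2000, 50));
    }
}
```

Raising txSize relative to keyPoolSize, or adding threads, increases the chance that two transactions lock overlapping keys in opposite order, which is the contention the description above refers to.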

Disclaimer: The following figures are for a scenario especially designed to force very high contention. This is not typical, and you shouldn't expect to see this level of increase in performance for applications with lower contention (which is most likely the case). Please feel free to tune the above benchmark class to fit the contention level of your application; sharing your experience would be very useful!

The following diagram shows the performance degradation resulting from running the deadlock detection code by itself in a scenario where no contention/deadlocks are present.

Some clues on when to enable deadlock detection. A high number of transactions rolling back due to org.infinispan.util.concurrent.TimeoutException is an indicator that this functionality might help. A TimeoutException can have other causes as well, but deadlocks will always result in this exception being thrown. Generally, when you have high contention on a set of keys, deadlock detection may help. But the best way is not to guess the performance improvement but to benchmark and monitor it: statistics (e.g. the number of deadlocks detected) are available through JMX, exposed via the DeadlockDetectingLockManager MBean.

Monday 20 July 2009

Berlin and Stuttgart say hello to Infinispan


Last week I finally put together my presentation on cloud computing and Infinispan. To kick things off, I presented it at two JUG events in Germany.

Berlin's Brandenburg JUG organised an event at the NewThinking Store in Berlin's trendy Mitte district. Thanks to Tobias Hartwig and Ralph Bergmann for organising the event, which drew an audience of about 35 people. Cloud computing was the focus of the evening, and I started the event with my rather lengthy presentation on cloud computing and specific issues around persisting data in a cloud. The bulk of the presentation focused on Infinispan, what it provides as a data grid platform, and what's on the roadmap. After a demo and a short break, Infinispan committer Adrian Cole then spoke about JClouds, demonstrating Infinispan's use of JClouds to back cached state onto Amazon's S3. You can read more about Adrian's presentation on his blog.

Two days later, the Stuttgart JUG arranged for me to speak to their JBoss Special Interest Group on Infinispan. Thanks to Tobias Frech and Heiko Rupp for organising this event, which was held in one of Red Hat's training rooms in Stuttgart. The presentation followed a similar pattern to what was presented in Berlin, to an audience of about 15 people.

In both cases, there was an overwhelming interest in Infinispan as a distributed storage engine. The JPA interface which is on our roadmap generated a lot of interest, as did the query API and to a lesser extent the asynchronous API - which could benefit from a better example in my presentation to demonstrate why this really is a powerful thing.

Overall, it is good to see that folks are interested in and are aware of the challenges involved in data storage on clouds, where traditional database usage is less relevant.

Many people have asked me for downloadable versions of my slides. Rest assured I will put them up - either as PDFs or better still, as a podcast - over the next 2 weeks.

Coming up, I will be in Krakow speaking at their JUG on Thursday the 23rd, and then in Dublin on Tuesday the 29th. Details of these two events are on the Infinispan Talks Calendar. Hope to see you there!

Cheers
Manik

Friday 17 July 2009

First experiences presenting Infinispan

Last week was one of the most exciting weeks for me since joining the Infinispan team because for the very first time, I was going to present Infinispan to the world :)

First, last Tuesday, I introduced Infinispan to Switzerland's Java User Group, where a crowd of around 20 people learned about the usability improvements introduced, the performance and memory consumption enhancements, and forthcoming new features. To finish the presentation, I showed the audience a demo of 3 distributed Infinispan instances connected to an Amazon S3 cache store via JClouds. I received some very positive feedback from the attendees who, in particular, were interested in finding out the differences between grid and cloud computing.

Two days later I went to Brussels to do the same presentation for Belgium's JBoss User Group and the reaction was even better there! A lot of Spring developers attended the presentation, and they were very keen on integrating Infinispan into their own projects.

From here I'd like to thank all the people who attended these two sessions and in particular the organizers, Jakob Magun and Phillip Oser from Switzerland's Java User Group and Joris De Winne from Belgium's JBoss User Group.

Cheers,
Galder

Monday 13 July 2009

4.0.0.ALPHA6 - another alpha for Infinispan.

Yes, we've felt the need for one more Alpha. This alpha contains a number of bug fixes over Alpha5, as well as some new minor features. Please have a look at the release notes for details.

In addition to code changes, Vladimir Blagojevic has contributed a Doclet to generate a configuration reference. Check this out here. While not all config elements are properly annotated in this release - and as such the configuration reference is somewhat sparse - thanks to this tool, a more complete and up-to-date configuration reference is something you can look forward to in future releases.

Further, Alejandro Montenegro has started compiling steps for an interactive tutorial. Making use of a Groovy shell, this tutorial guides readers through most of Infinispan's APIs in an interactive manner that will hopefully make it easy to learn about Infinispan. Please do give this a try and provide Alejandro with feedback!

Please download and try out this release, and feed back with your thoughts and experiences!

Cheers
Manik

Friday 10 July 2009

Infinispan@JBossWorld

I will be presenting on Infinispan, data grids and the data fabric of clouds at JBoss World Chicago, in September 2009. I will cover a brief history of Infinispan and the motivations behind the project, and then talk in a more abstract manner about data grids and the special place they occupy in providing data services for clouds.

In addition, I expect to pontificate on my thoughts on clouds and the future of computing in general to anyone who buys me a coffee/beer! :-)

So go on, convince your boss to let you go, and attend my talk, and hopefully I'll see you there!

Cheers
Manik

Monday 6 July 2009

Upcoming JUG and JBUG talks on Infinispan


I will be speaking at a number of JUGs around Europe this month. In addition, other core Infinispan devs will also be making JUG and conference appearances. I've put together a calendar of events which you can track, or add to your Google calendar to monitor.

http://www.jboss.org/infinispan/talks

This is a great chance for folks to learn about Infinispan, cloud and distributed computing, and where the project is headed. Hope to see you at one or more of these events!

Cheers
Manik