Tuesday, 31 July 2012

C# client for Infinispan - alpha release

A while ago I announced that Sunimal Rathnayake would start work on a C# Hot Rod client for Infinispan as part of the Google Summer of Code. After 2 months of heavy work, Sunimal has delivered an intelligence-one implementation (a basic client, interested in neither cluster nor hash information).
The release distribution can be downloaded from here. Besides the required binaries and doclets, it also contains a sample application showing how the client can be configured and illustrating the basic operations with the server. This and more is described in the readme.txt file in the distribution root.

And there's much more on the way! Sunimal is working on enhancing the client to the intelligence-two level: topology-aware client, interested in cluster information - stay tuned!

Please give it a try and don't hesitate to post your comments to our forums or the mailing list, or contact us directly on IRC for a chat!

Cheers,
Mircea



Wednesday, 25 July 2012

Map/Reduce improvements in Infinispan 5.2.0.ALPHA2

As our MapReduce implementation grew out of the proof of concept phase (and especially after our users had already production tested it), we needed to remove the most prominent impediment to an industrial grade MapReduce solution that we strive for: distributing reduce phase execution.

Prior to the Infinispan 5.2 release, the reduce phase was executed on a single Infinispan master task node. Therefore, the size of the MapReduce problems we could support (data size wise) was effectively limited to the working memory of a single Infinispan node. Starting with the Infinispan 5.2 release, we have removed this limitation, and reduce phase execution is distributed across the cluster as well. Of course, users still have the option to use MapReduceTask the old way, and we even recommend that approach for tasks with smaller inputs. We have achieved distribution of the reduce phase by relying on Infinispan's consistent hashing and DeltaAware cache insertion. Here is how we distributed reduce phase execution:



Map phase


MapReduceTask, as it currently does, will hash task input keys and group them by execution node N they are hashed to*. After key node mapping, MapReduceTask sends map function and input keys to each node N. Map function is invoked using given keys and locally loaded corresponding values.
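
The key grouping step above can be sketched in plain Java. This is only an illustration and not Infinispan code: the class name, the `ownerOf` helper and the modulo hashing (a simple stand-in for Infinispan's consistent hashing) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyGrouping {

    // Hypothetical stand-in for consistent hashing: maps a key to a node index.
    public static int ownerOf(Object key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    // Groups task input keys by the index of the node they hash to.
    public static <K> Map<Integer, List<K>> groupByOwner(Collection<K> keys, int numNodes) {
        Map<Integer, List<K>> keysByNode = new TreeMap<>();
        for (K key : keys) {
            keysByNode.computeIfAbsent(ownerOf(key, numNodes), n -> new ArrayList<>()).add(key);
        }
        return keysByNode;
    }

    public static void main(String[] args) {
        List<String> inputKeys = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        // Each entry is the batch of keys that would travel, with the map function, to one node.
        groupByOwner(inputKeys, 3).forEach((node, keys) ->
                System.out.println("node " + node + " -> " + keys));
    }
}
```

Each batch contains only keys owned by the target node, which is what lets the map function work on locally loaded values.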


Map and Combine phase




Results are collected with an Infinispan supplied Collector, and combine phase is initiated. A Combiner, if specified, takes KOut keys and immediately invokes reduce phase on keys. The result of mapping phase executed on each node is KOut/VOut map. There will be one resulting map per execution node N per launched MapReduceTask.
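
As a rough sketch of what happens on one node, here is a word count written in plain Java, assuming the values are lines of text. The class and method names are hypothetical, and an ordinary map of lists stands in for the Infinispan supplied Collector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapCombineSketch {

    // Map phase on one node: emit (word, 1) for every word in the locally stored values.
    // The returned map plays the role of the Collector's contents.
    public static Map<String, List<Integer>> mapLocal(Map<String, String> localEntries) {
        Map<String, List<Integer>> collected = new HashMap<>();
        for (String text : localEntries.values()) {
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    collected.computeIfAbsent(word, w -> new ArrayList<>()).add(1);
                }
            }
        }
        return collected;
    }

    // Combine phase: reduce each key's values locally, giving one KOut/VOut map per node.
    public static Map<String, Integer> combine(Map<String, List<Integer>> collected) {
        Map<String, Integer> combined = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : collected.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            combined.put(e.getKey(), sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        local.put("k1", "to be or not to be");
        System.out.println(combine(mapLocal(local)));
    }
}
```

Combining locally shrinks the intermediate data before it has to leave the node.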



Intermediate KOut/VOut migration phase


In order to proceed with reduce phase, all intermediate keys and values need to be grouped by intermediate KOut keys. More specifically, as map phases around the cluster can produce identical intermediate keys, all those identical intermediate keys and their values need to be grouped before reduce is executed on any particular intermediate key.



Therefore, at the end of the combine phase, instead of returning a map with intermediate keys and values to the master task node, we hash each intermediate key KOut and migrate it, together with its VOut values, to the Infinispan node that the key hashes to. We achieve this using a temporary DIST cache and the underlying consistent hashing mechanism. Using DeltaAware cache insertion, we effectively collect all VOut values under each KOut for all executed map functions across the cluster.
Intermediate KOut/VOut grouping phase


At this point, the map and combine phases have finished their execution; the list of KOut keys is returned to the master node and its initiating MapReduceTask. We do not return VOut values as we do not need them at the master task node. MapReduceTask is ready to start the reduce phase.
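
The migration and grouping described above might look roughly like this in plain Java. It is a sketch under assumptions: nested maps stand in for the temporary DIST cache, appending to a list stands in for DeltaAware insertion, and `ownerOf` is a hypothetical modulo hash rather than Infinispan's consistent hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class MigrationSketch {

    // Hypothetical stand-in for consistent hashing: maps a key to a node index.
    public static int ownerOf(Object key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    // Moves every intermediate (KOut, VOut) pair to the node owning KOut.
    // The result maps a node index to that node's slice of the temporary cache.
    public static Map<Integer, Map<String, List<Integer>>> migrate(
            List<Map<String, Integer>> perNodeResults, int numNodes) {
        Map<Integer, Map<String, List<Integer>>> tmp = new TreeMap<>();
        for (Map<String, Integer> nodeResult : perNodeResults) {
            for (Map.Entry<String, Integer> e : nodeResult.entrySet()) {
                tmp.computeIfAbsent(ownerOf(e.getKey(), numNodes), n -> new HashMap<>())
                   .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue()); // append, as a DeltaAware insertion would
            }
        }
        return tmp;
    }

    // Only the KOut keys go back to the master node; the VOut values stay where they are.
    public static Set<String> intermediateKeys(Map<Integer, Map<String, List<Integer>>> tmp) {
        Set<String> keys = new TreeSet<>();
        for (Map<String, List<Integer>> slice : tmp.values()) {
            keys.addAll(slice.keySet());
        }
        return keys;
    }
}
```

Identical keys produced on different nodes end up in a single list on a single owner, which is the grouping the reduce phase needs.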


Reduce phase


The reduce phase is easy to accomplish now, as Infinispan's consistent hashing has already done all the heavy lifting for us. To complete the reduce phase, MapReduceTask groups KOut keys by the execution node N they are hashed to. For each node N and its grouped input KOut keys, MapReduceTask sends a reduce command to the node N where those KOut keys are hashed. Once the reduce command arrives on the target execution node, it looks up the temporary cache belonging to the MapReduce task and, for each KOut key, grabs the list of VOut values, wraps it with an Iterator and invokes reduce on it.


Reduce phase


A result of each reduce is a map where each key is KOut and value is VOut. Each Infinispan execution node N returns one map with KOut/VOut result values. As all initiated reduce commands return to a calling node, MapReduceTask simply combines all resulting maps into map M and returns M as a result of MapReduceTask.
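
Here is a plain Java sketch of the reduce step and the final merge, continuing a word count example. All names are hypothetical and the per-node slices are ordinary maps, not an Infinispan cache.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ReduceSketch {

    // Reduce function for word count: sums all values seen for one key.
    public static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // What one node does on receiving a reduce command: for each local KOut,
    // wrap its VOut list in an Iterator and invoke reduce on it.
    public static Map<String, Integer> reduceOnNode(Map<String, List<Integer>> localSlice) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : localSlice.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue().iterator()));
        }
        return result;
    }

    // What the master does: merge the per-node result maps into one map M.
    // Every KOut is owned by exactly one node, so the key sets do not overlap.
    public static Map<String, Integer> mergeResults(Collection<Map<String, Integer>> perNode) {
        Map<String, Integer> m = new HashMap<>();
        for (Map<String, Integer> nodeResult : perNode) {
            m.putAll(nodeResult);
        }
        return m;
    }
}
```

Because each key has a single owner, the merge on the calling node is a plain union and needs no further reduction.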


Distributed reduce phase is turned on by using a MapReduceTask constructor specifying cache to use as input data for the task and boolean parameter distributeReducePhase set to true. Map/Reduce API javadoc and demos are included in distribution.


Moving forward


For the Infinispan 5.2.0 final release we want to make sure that the intermediate key/value migration phase executes as efficiently as possible and is proven to be lock free for large input tasks, as it was in our functional tests. We are also, as always, looking forward to your feedback and suggestions - especially if you have large data input sets ready for our latest MapReduceTask.


Cheers,
Vladimir
  


*If no keys are specified, the entire cache key set will be used as input.

Monday, 23 July 2012

Infinispan 5.2.0.ALPHA2 is here!

Infinispan 5.2.0.ALPHA2 was released last Friday with several additions for those who like to test Infinispan's bleeding edge capabilities. In this case, it's our Map/Reduce functionality that's the star of the show.
Vladimir Blagojevic, one of our Infinispan developers, will be explaining all about these features in a blog post coming right up, so stay tuned! :)

Finally, Adrian Nistor, the latest addition to the Infinispan team, has been working on reducing the size of our distribution files by avoiding duplication of jars.

Cheers,
Galder

Wednesday, 11 July 2012

Infinispan's distributed executors and Map/Reduce in spotlight at JUDCon and JBW


JUDCon and JBoss World 2012 finished just a bit over a week ago in Boston and were a complete blast. Several of my colleagues presented their talks on the JBoss Data Grid and EAP clustering performance. However, JUDCon and JBoss World were particularly appealing to me personally as they made me aware of an increasing demand for large scale computing, and in particular, use of Infinispan's own distributed executors and Map/Reduce.

Anil Saldhana's talk about Big Data and Hadoop at JUDCon investigated the Hadoop setup in the JBoss ecosystem and its use for log analysis. In our discussion, Anil contrasted the cumbersome Hadoop setup and API with our own Infinispan Map/Reduce solution.

Mark Addy's JUDCon "Infinispan from POC to Production" presentation was particularly engaging. Mark and his team at C2B2 developed a search engine for a UK based global on-line travel company using Infinispan as one of the key system components. One of their use cases involved extracting particular pricing info from an Infinispan cluster, where the distributed executors framework was an excellent fit. Long story short, response time improved by an order of magnitude, and parallel execution on the Infinispan cluster using distributed executors saved the day.

Erik Salter's presentation "Infinispan == Profit: A Start-up’s Success with JBoss Community Software" summarized interesting details about the video on demand service that Erik and his team developed for Cisco. Erik used the Infinispan cluster for session setup and management and found distributed executors and Map/Reduce to be a particularly good fit for a range of design trade-offs he and his team faced.

Stay tuned for more good things to come in this area!

Cheers,
Vladimir

Monday, 9 July 2012

JBoss Data Grid lands in Red Hat Summit!

It's just over a week since Red Hat Summit/JBoss World 2012 finished and it was a great pleasure to be part of it. Heiko Rupp and I were speaking about "Effectively Manage & Monitor Red Hat JBoss Data Grid Nodes" where we presented JDG and JON at a high level and then we showed a demo of both products interacting with each other. The presentation's slides are now available for download.

I was not the only one speaking about JBoss Data Grid. Both Manik, the Infinispan project lead, and Alan Santos, the product manager for Infinispan, were also delivering talks on JBoss Data Grid. Although their presentations are not up yet, you'll be able to download them from here.

I also met some Infinispan customers, such as Erik, who's been using Infinispan at a well known telecommunications company, and two of the lead technical guys at a well known Geneva bank. We had some great conversations where we were able to sync up our roadmap with them to make sure any requirements they have are met in the future.

Of the presentations I attended, I was particularly impressed by Pete and Marius' 10 minute demo on going from 0 to a fully fledged mobile application running on top of AS7 in OpenShift in the "What's New in Java Frameworks for Web, Cloud, & Mobile" BOF. I hope it was recorded cos it was very impressive stuff.

It was a great week and once again it was a pleasure to be part of the Red Hat Summit and JBoss World :)

Cheers,
Galder