Manik
Tuesday, 20 July 2010
Infinispan 4.1.0 "Radegast" 2nd release candidate just released!
Manik
Tuesday, 6 July 2010
Infinispan 4.1.0.CR1 is now available!
- A fantastic demo showing how to run Infinispan in EC2. Check Noel O'Connor's blog post from last month for more detailed information.
- Enable Hot Rod servers to run behind a proxy in environments such as EC2, and make TCP buffers and TCP no delay flag configurable for both the server and client.
- Important performance improvements for the Infinispan-based Lucene directory and the Hot Rod client and server.
- To avoid confusion, the single jar distribution has been removed. The two remaining distributions are the bin distribution, containing the Infinispan modules and documentation, and the all distribution, which adds demos on top of that.
Monday, 28 June 2010
JBossWorld and JUDCon post-mortem
The first-ever JUDCon, the developer conference that took place the day before JBoss World and Red Hat Summit, was great and I look forward to future JUDCons around the world. Pics from the first-ever JUDCon are now online, along with some video interviews with Jason Greene and Pete Muir.
Some of the great presentations at JUDCon include Galder Zamarreño's talk on Infinispan's Hot Rod protocol (slides here) and a talk I did with Mircea Markus on the cache benchmarking framework and benchmarking Infinispan (slides here).
JBoss World/Red Hat Summit was also very interesting. There is clearly a lot of excitement around Infinispan, and we heard about interesting deployments and use cases, lots of ideas and thoughts for further improvement from customers, contributors and partners.
From JBoss World, there were three talks on Infinispan, including Storing Data on Cloud Infrastructure in a Scalable, Durable Manner which I presented along with Mircea Markus (slides), Why RESTful Design for the Cloud is Best by Galder Zamarreño (slides) and Using Infinispan for High Availability, Load Balancing, & Extreme Performance which I presented along with Galder Zamarreño (slides).
In addition to the slides, the first talk was even recorded so if you missed it, you can watch it below:
Further, Infinispan was showcased in Red Hat CTO Brian Stevens' keynote speech (about 28:15 into the video), where Brian talks about data grids and their importance, and I demonstrate Infinispan.
We even had an open roadmap and design session for Infinispan 5.0, which included not just core Infinispan engineers, but contributors, end-users and anyone who had any sort of interest. I'll post again later with details of 5.0 and what our plans for it will be.
For those of you who couldn't make it to JUDCon and JBoss World, hope the slides and videos on this post will help give you an idea of what went on.
Cheers
Manik
Wednesday, 23 June 2010
JUDCon and the JBoss Community Awards
The JBoss Community Recognition Award winners were also announced at JUDCon, and I was really surprised to find that 4 of the 5 winners were Infinispan contributors. Sanne Grinovero, Alex Kluge, Phil van Dyck and Amin Abbaspour - thanks for your participation in Infinispan, your peers have recognised your contributions and have voted with mouse clicks! Congrats!
Given how many Infinispan engineers and contributors are at JBoss World this week, we are having an open Infinispan 5.0 planning and roadmap session. So if you are around and would like to join in, this will be at 4:00pm on Thursday, in Campground 1. For those of you not able to make it, discussions will continue via the usual channels of IRC and the developer mailing list.
Now to prepare for my next talk ... :-)
Cheers
Manik
Friday, 28 May 2010
Infinispan 4.1Beta2 released
The second and hopefully last Beta for 4.1 has just been released. Thanks to excellent community feedback, several Hot Rod client/server issues were fixed. Besides this and other bug fixes (check this for the complete list), the following new features were added:
- a key affinity service that generates keys to be distributed to specific nodes
- a RemoteCacheStore that allows an Infinispan cluster to be used as a remote data store
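To give a feel for what a key affinity service does, here is a minimal, self-contained Java sketch. It is not Infinispan's implementation or API: the class name KeyAffinitySketch, the hash-modulo placement rule in ownerOf and the node names are all assumptions made for illustration. It simply keeps generating random keys until one maps to the requested node.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical illustration only; not the Infinispan key affinity API.
final class KeyAffinitySketch {
    private final List<String> nodes;

    KeyAffinitySketch(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Assumed placement rule: hash of the key modulo the number of nodes.
    String ownerOf(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    // Keep generating random keys until one is owned by the target node.
    String keyFor(String targetNode) {
        if (!nodes.contains(targetNode)) {
            throw new IllegalArgumentException("Unknown node: " + targetNode);
        }
        while (true) {
            String candidate = UUID.randomUUID().toString();
            if (ownerOf(candidate).equals(targetNode)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        KeyAffinitySketch sketch =
                new KeyAffinitySketch(List.of("node-a", "node-b", "node-c"));
        String key = sketch.keyFor("node-b");
        System.out.println(key + " is owned by " + sketch.ownerOf(key));
    }
}
```

A real service would use the cluster's actual hashing and would need to react to topology changes; the sketch ignores both.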
Enjoy!
Mircea
Tuesday, 25 May 2010
Infinispan EC2 Demo
Infinispan's distributed mode is well suited to handling large datasets and scaling the clustered cache by adding nodes as required. These days when inexpensive scaling is thought of, cloud computing immediately comes to mind.
One of the largest providers of cloud computing is Amazon with its Amazon Web Services (AWS) offering. AWS provides computing capacity on demand with its EC2 services and storage on demand with its S3 and EBS offerings. EC2 provides just an operating system to run on and it is a relatively straightforward process to get an Infinispan cluster running on EC2. However, there is one gotcha: EC2 does not support UDP multicasting at this time, and this is the default node discovery approach used by Infinispan to detect nodes running in a cluster.
Some background on network communications
Infinispan uses the JGroups library to handle all network communications. JGroups enables cluster node detection, a process called discovery, and reliable data transfer between nodes. JGroups also handles the process of nodes entering and exiting the cluster and master node determination for the cluster.
Configuring JGroups in Infinispan
The JGroups configuration details are passed to Infinispan in the Infinispan configuration file:
<transport clusterName="infinispan-cluster" distributedSyncTimeout="50000"
transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
<properties>
<property name="configurationFile" value="jgroups-s3_ping-aws.xml" />
</properties>
</transport>
Node Discovery
JGroups has three discovery options which can be used for node discovery on EC2.
The first is to statically configure the address of each node in the cluster in each of the node's peers. This simplifies discovery but is not suitable when the IP addresses of the nodes are dynamic or nodes are added and removed on demand.
The second method is to use a Gossip Router. This is an external Java process which runs and waits for connections from potential cluster nodes. Each node in the cluster needs to be configured with the IP address and port that the Gossip Router is listening on. At node initialization, the node connects to the Gossip Router and retrieves the list of other nodes in the cluster.
Example JGroups gossip router configuration
<config>
<TCP bind_port="7800" />
<TCPGOSSIP timeout="3000" initial_hosts="192.168.1.20[12000]"
num_initial_members="3" />
<MERGE2 max_interval="30000" min_interval="10000" />
<FD_SOCK start_port="9777" />
...
</config>
The infinispan-4.1.0-SNAPSHOT/etc/config-samples/ directory has sample configuration files for use with the Gossip Router. The approach works well but the dependency on an external process can be limiting.
The third method is to use the new S3_PING protocol that has been added to JGroups. With this protocol, the user configures an S3 bucket (location) where each node in the cluster stores its connection details, and upon startup each node sees the other nodes in the cluster. This avoids the need for a separate process for node discovery and gets around the static configuration of nodes.
Example JGroups configuration using the S3_PING protocol:
<config>
<TCP bind_port="7800" />
<S3_PING secret_access_key="secretaccess_key" access_key="access_key"
location="s3_bucket_location" />
<MERGE2 max_interval="30000" min_interval="10000" />
<FD_SOCK start_port="9777" />
...
</config>
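The shared-location idea behind S3_PING can be sketched in a few lines of plain Java. This is not JGroups code: an in-memory map stands in for the S3 bucket, and the names SharedLocationDiscovery, join and leave are hypothetical. Each node writes its own connection details to the shared location on startup and reads back everyone else's.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the JGroups S3_PING implementation.
final class SharedLocationDiscovery {
    // Stands in for the S3 bucket: node name -> "host:port" connection details.
    private final Map<String, String> bucket = new ConcurrentHashMap<>();

    // A starting node records its own details, then reads everyone else's.
    Set<String> join(String nodeName, String address) {
        bucket.put(nodeName, address);
        Set<String> peers = new TreeSet<>(bucket.values());
        peers.remove(address);
        return peers;
    }

    // A node that shuts down cleanly removes its entry.
    void leave(String nodeName) {
        bucket.remove(nodeName);
    }

    public static void main(String[] args) {
        SharedLocationDiscovery discovery = new SharedLocationDiscovery();
        discovery.join("node-1", "10.0.0.1:7800");
        discovery.join("node-2", "10.0.0.2:7800");
        // The third node discovers the two nodes that registered before it.
        System.out.println(discovery.join("node-3", "10.0.0.3:7800"));
    }
}
```

The point of the design is that no node needs to know any other node's address in advance; the only shared configuration is the location itself.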
EC2 demo
The purpose of this demo is to show how an Infinispan cache running on EC2 can easily form a cluster and retrieve data seamlessly across the nodes in the cluster. Adding further Infinispan nodes to the cluster automatically redistributes the existing data and offers higher availability in the case of node failure.
To demonstrate Infinispan, data needs to be added to the nodes in the cluster. We will use one of the many public datasets that Amazon hosts on AWS, the influenza virus dataset.
This dataset has a number of components which make it suitable for the demo. First of all, it is not a trivial dataset: there are over 200,000 records. Secondly, there are internal relationships within the data which can be used to demonstrate retrieving data from different cache nodes. The data is made up of viruses, nucleotides and proteins. Each influenza virus has a related nucleotide and each nucleotide has one or more proteins. Each type is stored in its own cache instance.

The caches are populated as follows:
- InfluenzaCache - populated with data read from the Influenza.dat file, approx 82,000 entries
- ProteinCache - populated with data read from the Influenza_aa.dat file, approx 102,000 entries
- NucleotideCache - populated with data read from the Influenza_na.dat file, approx 82,000 entries
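The relationship between the three caches can be pictured with a small Java sketch. Plain maps stand in for the Infinispan caches, and the key layout (virus id to nucleotide id, nucleotide id to protein ids) is an assumption made for illustration, not the demo's actual schema. The values in main are placeholders, not records from the dataset.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain maps stand in for the three caches,
// and the key layout below is an assumption, not the demo's actual schema.
final class InfluenzaLookupSketch {
    // virus id -> id of its related nucleotide
    final Map<String, String> influenzaCache = new HashMap<>();
    // nucleotide id -> nucleotide details
    final Map<String, String> nucleotideCache = new HashMap<>();
    // nucleotide id -> ids of its proteins
    final Map<String, List<String>> proteinCache = new HashMap<>();

    // Follow virus -> nucleotide -> proteins across the caches.
    List<String> proteinsForVirus(String virusId) {
        String nucleotideId = influenzaCache.get(virusId);
        if (nucleotideId == null) {
            return List.of();
        }
        return proteinCache.getOrDefault(nucleotideId, List.of());
    }

    public static void main(String[] args) {
        InfluenzaLookupSketch caches = new InfluenzaLookupSketch();
        // Placeholder values, not real records from the dataset.
        caches.influenzaCache.put("virus-1", "nucleotide-1");
        caches.nucleotideCache.put("nucleotide-1", "placeholder nucleotide details");
        caches.proteinCache.put("nucleotide-1", List.of("protein-1", "protein-2"));
        System.out.println(caches.proteinsForVirus("virus-1"));
    }
}
```

In the clustered demo each lookup may be served by a different node, which is what makes the dataset a useful showcase.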
The demo requires 4 small EC2 instances running Linux, one instance for each cache node and one for the JBoss application server which runs the UI. Each node has to have Sun JDK 1.6 installed in order to run the demos. In order to use the Web-based GUI, JBoss AS 5 should also be installed on one node.
In order for the nodes to communicate with each other the EC2 firewall needs to be modified. Each node should have the following ports open:
- TCP 22 – For SSH access
- TCP 7800 to TCP 7810 – used for JGroups cluster communications
- TCP 8080 – Only required for the node running the AS5 instance in order to access the Web UI.
- TCP 9777 - Required for FD_SOCK, the socket based failure detection module of the JGroups stack.
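If you want to check the firewall rules from another machine, a small Java helper along these lines can try each of the ports listed above. The class and method names (DemoPortCheck, requiredPorts, isReachable) are made up for this sketch, and the one-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the Infinispan distribution.
final class DemoPortCheck {
    // The ports listed above: SSH, the JGroups range, the web UI and FD_SOCK.
    static List<Integer> requiredPorts() {
        List<Integer> ports = new ArrayList<>();
        ports.add(22);
        for (int port = 7800; port <= 7810; port++) {
            ports.add(port);
        }
        ports.add(8080);
        ports.add(9777);
        return ports;
    }

    // True if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (int port : requiredPorts()) {
            String state = isReachable(host, port, 1000) ? "open" : "closed or filtered";
            System.out.println(port + " -> " + state);
        }
    }
}
```

Bear in mind that a port only shows as open when a process is actually listening on it, so run the check after the cache nodes and the application server have started.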
To run the demo, download the Infinispan "all" distribution, (infinispan-xxx-all.zip) to a directory on each node and unzip the archive.
Edit the etc/config-samples/ec2-demo/jgroups-s3_ping-aws.xml file to add the correct AWS S3 security credentials and bucket name.
Start one of the cache instances on each node using one of the following scripts from the bin directory:
- runEC2Demo-influenza.sh
- runEC2Demo-nucleotide.sh
- runEC2Demo-protein.sh
Each script will start up and display the following information:
[tmp\] ./runEC2Demo-nucleotide.sh
CacheBuilder called with /opt/infinispan-4.1.0-SNAPSHOT/etc/config-samples/ec2-demo/infinispan-ec2-config.xml
-------------------------------------------------------------------
GMS: address=redlappie-37477, cluster=infinispan-cluster, physical address=192.168.122.1:7800
-------------------------------------------------------------------
Caches created....
Starting CacheManager
Cache Address=redlappie-57930
Cache Address=redlappie-37477
Cache Address=redlappie-18122
Parsing files....Parsing [/opt/infinispan-4.1.0-SNAPSHOT/etc/Amazon-TestData/influenza_na.dat]
About to load 81904 nucleotide elements into NucleiodCache
Added 5000 Nucleotide records
Added 10000 Nucleotide records
Added 15000 Nucleotide records
Added 20000 Nucleotide records
Added 25000 Nucleotide records
Added 30000 Nucleotide records
Added 35000 Nucleotide records
Added 40000 Nucleotide records
Added 45000 Nucleotide records
Added 50000 Nucleotide records
Added 55000 Nucleotide records
Added 60000 Nucleotide records
Added 65000 Nucleotide records
Added 70000 Nucleotide records
Added 75000 Nucleotide records
Added 80000 Nucleotide records
Loaded 81904 nucleotide elements into NucleotidCache
Parsing files....Done
Protein/Influenza/Nucleotide Cache Size-->9572/10000/81904
Protein/Influenza/Nucleotide Cache Size-->9572/20000/81904
Protein/Influenza/Nucleotide Cache Size-->9572/81904/81904
Protein/Influenza/Nucleotide Cache Size-->9572/81904/81904
Items of interest in the output are the Cache Address lines, which display the addresses of the nodes in the cluster. Also of note is the Protein/Influenza/Nucleotide line, which displays the number of entries in each cache. As other caches start up, these numbers will change as cache entries are dynamically moved around throughout the Infinispan cluster.
To use the web-based UI we first need to let the server know where the Infinispan configuration files are kept. To do this, edit the jboss-5.1.0.GA/bin/run.conf file and add the line
JAVA_OPTS="$JAVA_OPTS -DCFGPath=/opt/infinispan-4.1.0-SNAPSHOT/etc/config-samples/ec2-demo/"
at the bottom. Replace the path as appropriate.
Now start the JBoss application server using the default profile, e.g. run.sh -c default -b xxx.xxx.xxx.xxx, where “xxx.xxx.xxx.xxx” is the public IP address of the node that the AS is running on.
Then drop the infinispan-ec2-demoui.war into the jboss-5.1.0.GA/server/default/deploy directory.
Finally point your web browser to http://public-ip-address:8080/infinispan-ec2-demoui and the following page will appear.
The search criteria are the values in the first column of the /etc/Amazon-TestData/influenza.dat file, e.g. AB000604, AB000612, etc.

Note that this demo will be available in Infinispan 4.1.0.BETA2 onwards. If you are impatient, you can always build it yourself from Infinispan's source code repository.
Enjoy,
Noel
Thursday, 13 May 2010
Client/Server architectures strike back, Infinispan 4.1.0.Beta1 is out!
A detailed change log is available and the release is downloadable from the usual place.
For the rest of the blog post, we’d like to share some of the objectives of Infinispan 4.1 with the community. Here at ‘chez Infinispan’ we’ve been repeating the same story over and over again: ‘Memory is the new Disk, Disk is the new Tape’ and this release is yet another step to educate the community on this fact. Client/Server architectures based around Infinispan data grids are key to enabling this reality, but you might be wondering: why would someone use Infinispan in client/server mode rather than in peer-to-peer (p2p) mode? How does the client/server architecture enable memory to become the new disk?
Broadly speaking, there are three areas where an Infinispan client/server architecture might be chosen over a p2p one:
1. Access to Infinispan from a non-JVM environment
Infinispan’s roots can be traced back to JBoss Cache, a caching library developed to provide J2EE application servers with data replication. As such, the primary way of accessing Infinispan or JBoss Cache has always been via direct calls coming from the same JVM. However, as we have said before, Infinispan’s goal is to provide much more than that: it aims to provide data grid access to any software application that you can think of, and this obviously requires Infinispan to enable access from non-Java environments.
Infinispan comes with a series of server modules that enable precisely that. All you have to do is decide which API suits your environment best. Do you want to enable direct access to Infinispan via HTTP? Just use our REST or WebSocket modules. Or is it the case that you’re looking to expand the capabilities of your Memcached-based applications? Start an Infinispan-backed Memcached server and your existing Memcached clients will be able to talk to it immediately. Or maybe you’re interested in accessing Infinispan via Hot Rod, our new, highly efficient binary protocol which supports smart clients? Then give us a hand developing non-Java clients that can talk the Hot Rod protocol! :)
2. Infinispan as a dedicated data tier
Quite often applications running in a p2p environment have caching requirements larger than the heap size, in which case it makes a lot of sense to move caching into a separate dedicated tier.
It’s also very common to find businesses with workloads that vary over time, where there’s a need to start business processing servers to deal with increased load, or stop them when the workload is reduced to lower power consumption. When Infinispan data grid instances are deployed alongside business processing servers, starting/stopping these can be a slow process due to state transfer, or rehashing, particularly when large data sets are used. Separating Infinispan into a dedicated tier provides faster and more predictable server start/stop procedures – ideal for modern cloud-based deployments where elasticity in your application tier is important.
It’s common knowledge that optimizations for memory-intensive systems are very different from optimizations for CPU-intensive systems. If you mix both your data grid and business logic under the same roof, finding a balanced set of optimizations that keeps both sides happy is difficult. Once again, separating the data and business tiers can alleviate this problem.
You might wonder whether, once Infinispan is moved to a separate tier, the network call now required to access data will hurt your performance in terms of time per call. However, separating tiers gives you a much more scalable architecture and your data is never more than 1 network call away. Even if the dedicated Infinispan data grid is configured with distribution, a Hot Rod smart-client implementation - such as the Java reference implementation shipped with Infinispan 4.1.0.BETA1 - can determine where a particular key is located and hit a server that contains it directly.
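To show how a client can work out which server holds a key without asking the grid first, here is a rough consistent-hashing sketch in Java. It is not the Hot Rod client's actual hashing scheme: the ring layout, the spread function and the class name SmartClientRoutingSketch are assumptions made purely for illustration.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; not the Hot Rod client's actual hashing scheme.
final class SmartClientRoutingSketch {
    // Position on the hash ring -> server name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    SmartClientRoutingSketch(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("At least one server is required");
        }
        for (String server : servers) {
            ring.put(spread(server.hashCode()), server);
        }
    }

    // Simple bit mixing that yields a non-negative ring position (assumed scheme).
    private static int spread(int hash) {
        hash ^= (hash >>> 16);
        return hash & Integer.MAX_VALUE;
    }

    // The owner is the first server at or after the key's position, wrapping around.
    String serverFor(String key) {
        Integer position = ring.ceilingKey(spread(key.hashCode()));
        return ring.get(position != null ? position : ring.firstKey());
    }

    public static void main(String[] args) {
        SmartClientRoutingSketch routing =
                new SmartClientRoutingSketch(List.of("server-a", "server-b", "server-c"));
        System.out.println("AB000604 -> " + routing.serverFor("AB000604"));
    }
}
```

Because every client computes the same mapping locally, a request can go straight to a server that owns the key, which is the "one network call" property described above.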
3. Data-as-a-Service (DaaS)
Increasingly, we see scenarios where environments host a multitude of applications that share the need for data storage, for example in Platform-as-a-Service (PaaS) cloud-style environments (whether public or internal). In such configurations, you don’t want to be launching a data grid per application since it’d be a nightmare to maintain – not to mention resource-wasteful. Instead you want deployments or applications to start processing as soon as possible. In these cases, it’d make a lot of sense to keep a pool of Infinispan data grid nodes acting as a shared storage tier. Isolated cache access could easily be achieved by making sure each application uses a different cache name (e.g. the application name could be used as the cache name). This is straightforward with protocols such as Hot Rod, where each operation requires a cache name to be provided.
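The isolation-by-cache-name idea can be sketched as follows. This is a toy in-memory model, not the Hot Rod client API: the class SharedStorageTierSketch and its put and get methods are hypothetical, and the application names are examples only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the Hot Rod client API.
final class SharedStorageTierSketch {
    // Cache name (for example the application name) -> that application's entries.
    private final Map<String, Map<String, String>> caches = new ConcurrentHashMap<>();

    // Look up the named cache, creating it on first use.
    private Map<String, String> cache(String cacheName) {
        return caches.computeIfAbsent(cacheName, name -> new ConcurrentHashMap<>());
    }

    // Every operation carries a cache name, so applications never see each other's data.
    void put(String cacheName, String key, String value) {
        cache(cacheName).put(key, value);
    }

    String get(String cacheName, String key) {
        return cache(cacheName).get(key);
    }

    public static void main(String[] args) {
        SharedStorageTierSketch tier = new SharedStorageTierSketch();
        tier.put("billing-app", "customer-1", "invoice data");
        tier.put("inventory-app", "customer-1", "stock data");
        System.out.println(tier.get("billing-app", "customer-1"));
        System.out.println(tier.get("inventory-app", "customer-1"));
    }
}
```

The same key can exist in both caches without conflict because the cache name acts as a namespace.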
Regardless of the scenarios explained above, there’re some common benefits to separating an Infinispan data grid from the business logic that accesses it. In fact, these are very similar to the benefits achieved when application servers and databases don’t run under the same roof. By separating the layers, you can manage each one on its own, which means that adding/removing nodes, maintenance, upgrades, etc. can be handled independently. In other words, if you wanna upgrade your application server or servlet container, you don’t need to bring down your data layer.
All of this is available to you now, but the story does not end here. Bearing in mind that these client/server modules are based around reliable TCP/IP, using Netty, the fast and reliable NIO library, they could also in the future form the base of new functionality. For example, client/server modules could be linked together to connect geographically separated Infinispan data grids and enable different disaster recovery strategies.
So, download Infinispan 4.1.0.BETA1 right away, give these new modules a try and let us know your thoughts.
