
Monday, 19 February 2018

Distributed iteration improvements

Infinispan hasn't always provided a way to iterate over the entries in a distributed cache; distributed iteration first appeared in Infinispan 7. Then in Infinispan 8, with the move to Java 8, we fully integrated it into distributed streams, which brought some minor performance improvements to iteration.

We are proud to announce that Infinispan 9.2 brings even more improvements. There are no API changes (although those will surely come in the future); this release is purely about performance and resource utilization.

New implementation details

 

There are a few different aspects that have been changed. A lot of these revolve around the number of entries being retrieved at once, which, if you are familiar with distributed streams, can be configured via the distributedBatchSize method. Note that if this is not specified it defaults to the chunk size used by state transfer.
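
For example, setting the batch size on an iteration looks along these lines (a minimal sketch; the class, method and batch size are illustrative):

    import java.util.Iterator;
    import java.util.Map;
    import org.infinispan.Cache;

    public class BatchSizeExample {
       public static void printAll(Cache<String, String> cache) {
          // distributedBatchSize controls how many entries are retrieved at a
          // time from remote nodes; when unspecified it defaults to the state
          // transfer chunk size.
          Iterator<Map.Entry<String, String>> it =
                cache.entrySet().stream().distributedBatchSize(128).iterator();
          while (it.hasNext()) {
             Map.Entry<String, String> entry = it.next();
             System.out.println(entry.getKey() + " -> " + entry.getValue());
          }
       }
    }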

Entry retrieval is now pull based instead of push

Infinispan core (embedded) has added rxjava2 and reactive streams as dependencies, and all of the old push-style iterator code has been rewritten in a pull style to fully utilize the Publisher and Subscriber interfaces.

With this we only pull up to the batch size in entries at a time from any set of nodes. The old style used push with call-stack blocking, which could return up to two times the number of entries. Also, since we no longer perform call-stack blocking, we don't have to waste threads: the calls to retrieve entries are done asynchronously and finish very quickly, irrespective of user interaction. The old method required multiple threads to be reserved for this purpose.
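
To make the push/pull distinction concrete, here is a tiny standalone RxJava 2 sketch (purely illustrative, not Infinispan's actual internals): with a pull-based Flowable, elements are only produced when the downstream requests them, so no thread has to sit blocked pushing data.

    import io.reactivex.Flowable;

    public class PullStyleExample {
       public static void main(String[] args) {
          // Pull-based source: the generator runs once per downstream request.
          Flowable<Long> source = Flowable.generate(
                () -> 0L,                          // initial state
                (state, emitter) -> {              // invoked on demand only
                   emitter.onNext(state);
                   return state + 1;
                });

          // Consume in small batches; upstream produces nothing beyond them.
          source.take(10)
                .buffer(4)
                .blockingForEach(batch -> System.out.println("batch: " + batch));
       }
    }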

Streamed batches

The responses from a remote node are written directly to the output stream so there are no intermediate collections allocated. This means we only have to iterate upon the data once as we retain the iterator between requests. On the originator we still have to store the batches in a collection to be enqueued for the user to pull.

Rewritten Parallel Distribution

Great care was taken to implement parallel distribution in a way to vastly reduce contention and ensure that we properly follow the batchSize configuration.

When parallel distribution is in use, the new implementation starts 4 remote node requests sharing the batch size (so each one gets 1/4). This way we can guarantee that we only buffer the desired number of entries, irrespective of the number of nodes in the cluster. The old implementation would request batchSize entries from all nodes at the same time; not only did this reserve a thread per node, it could also easily swamp your JVM memory, causing OutOfMemoryErrors (which no one likes). The latter alone made us force the default to be sequential distribution when using an iterator.

The old implementation would write entries from all nodes (including local) to the same shared queue. The new implementation has a different queue for each request, which allows for faster queues with no locking to be used.

Due to these changes and other isolations between threads, we can now make parallel distribution the default setting for the iterator method. And as you will see this has improved performance nicely.

Performance


We have written a JMH test harness specifically for this blog post, testing the 9.1.5.Final build against the latest 9.2.0-SNAPSHOT. The test runs by default with 4GB of heap and 6 nodes in a distributed cache with 2 owners. It has varying entry counts, entry sizes and distributed batch sizes.

Due to the variance in each test, a large number of tests were run, with different permutations, to make sure a large number of cases were covered. The JMH test that was run can be found at github. All the default settings were used for the run, except -t4 (runs with 4 worker threads). This was all run on my measly laptop (i7-4810MQ and 16 GB) - maxing out the CPU was not a hard task.

CAVEAT: The tests don't do anything with the iterated entries and just try to pull them as fast as they can. Obviously, if you do a lot of processing between iterations you will likely not see as large a performance increase.

The entire results can be found here. It shows each permutation and how many operations per second and finds the difference (green shows 5% or more and red shows -5% or less).


Operation                     Average Gain   Code
Specified Distribution Mode   3.5%           .entrySet().stream().sequentialDistribution().iterator()
Default                       11%            .entrySet().iterator()
No Rehash                     14%            .entrySet().stream().disableRehashAware().iterator()

The above 3 rows show a few different ways you could have been invoking the iterator method. The second row is by far the most common case. There you should see around an 11% increase in performance (results will vary). This is due to the new pull-based method as well as parallel distribution becoming the new default running mode. It is unlikely a user was using the other 2 methods, but they are provided for a more complete view.

If you were specifying a distribution mode manually, whether sequential or parallel, you will only see runs a few percent faster (3.5%), but every little bit helps! Also, if you can switch to parallel you may want to think about doing so.

You can also see that if you were running with rehash disabled before, there are even larger gains (14%). That doesn't even include the fact that disabling rehash was already about 28% faster than the default iteration; combined with the new gains, it is now roughly 32% faster in general. So if you can get away with an at-most-once guarantee, disabling rehash will provide the best throughput.

What's next?


As was mentioned this is not exposed to the user directly. You still interact with the iterator as you would normally. We should remedy this at some point.

Expose new method

We would love to eventually expose a method to return a Publisher directly to the user so that they can get the full benefits of having a pull based implementation underneath.

This way any intermediate operations applied to the stream before would be distributed, and anything applied to the Publisher would be done locally. And just like the iterator method, this publisher would be fully rehash-aware if you have it configured to do so, making sure you get all entries delivered in an exactly-once fashion (with rehash disabled the guarantee is at-most-once).

Another side benefit is that the Subscriber methods could be called on different threads so there is no overhead required on the ISPN side for coordinating these into queue(s). Thus the Subscriber should be able to retrieve all entries faster than just doing an iterator.

Java 9 Flow

Also, many of you may be wondering why we aren't using the new Flow API introduced in Java 9. Luckily, the Flow API is a 1:1 mapping of the reactive streams interfaces. So whenever Infinispan starts supporting Java 9 interfaces/classes, we hope to expose these as the JDK classes proper.

Segment Based Iteration 

With Infinispan 9.3, we hope to introduce data container and cache store segment aware iteration. This means when iterating over either we would only have to process entries that map to a given segment. This should reduce the time and processing for iteration substantially, especially for cache stores. Keep your eyes out for a future blog post detailing these as 9.3 development commences.

Give us Feedback

We hope you find a bit more performance when working with your distributed iteration. We also value any feedback on what you want our APIs to look like, as well as reports of any bugs you find. As always, let us know at any of the places listed here.

Monday, 5 December 2016

Infinispan 9.0.0.Beta1 "Ruppaner"


It took us quite a bit to get here, but we're finally ready to announce Infinispan 9.0.0.Beta1, which comes loaded with a ton of goodies.

  • Performance improvements
    • JGroups 4
    • A new algorithm for non-transactional writes (aka the Triangle) which reduces the number of RPCs required when performing writes 
    • A new, faster internal marshaller which produces smaller payloads
    • A new asynchronous interceptor core
  • Off-Heap support
    • Avoid the size of the data in the caches affecting your GC times
  • Caffeine-based bounded data container
    • Superior performance
    • More reliable eviction
  • Ickle, Infinispan's new query language
    • A limited yet powerful subset of JPQL
    • Supports full-text predicates
  • The Server Admin console now supports both Standalone and Domain modes
  • Pluggable marshallers for Kryo and ProtoStuff
  • The LevelDB cache store has been replaced with the better-maintained and faster RocksDB 
  • Spring Session support
  • Upgraded Spring to 4.3.4.RELEASE
We will be blogging about the above in detail over the coming weeks, including benchmarks and tutorials.
The following improvements were also present in our previous Alpha releases:
  • Graceful clustered shutdown / restart with persistent state
  • Support for streaming values over Hot Rod, useful when you are dealing with very large entries
  • Cloud and Containers
    • Out-of-the box support for Kubernetes discovery
  • Cache store improvements
    • The JDBC cache store now uses transactions and upserts, and the internal connection pool is now based on HikariCP

Also, our documentation has received a big overhaul and we believe it is vastly better than before.

There will be one more Beta including further performance improvements as well as additional features, so stay tuned.
Infinispan 9 is codenamed "Ruppaner" in honor of the Konstanz brewery, since many of the improvements in this release have been brewed on the shores of the Bodensee!

Prost!

Friday, 9 January 2015

Infinispan 7.1.0.Beta1

Dear Infinispan community,

We're proud to announce the first Beta release of Infinispan 7.1.0.

Infinispan 7.1.0 brings the following major improvements:
  • Near-Cache support for Remote HotRod caches
  • Annotation-based generation of ProtoBuf serializers which removes the need to write the schema files by hand and greatly improves usability of Remote Queries
  • Cluster Listener Event Batching, which coalesces events for better performance
  • Cluster- and node-wide aggregated statistics
  • Vast improvements to the indexing performance
  • Support for domain mode and the security vault in the server
  • Further improvements to the Partition Handling with many stability fixes and the removal of the Unavailable mode: a cluster can now be either Available or Degraded.
Of course there's also the usual slew of bug fixes, performance and memory usage improvements and documentation cleanups.

Feel free to join us and shape the future releases on our forums, our mailing lists or our #infinispan IRC channel.

For a complete list of features and bug fixes included in this release please refer to the release notes. Visit our downloads section to find the latest release.

Thanks to everyone for their involvement and contribution!

Thursday, 5 June 2014

Map/Reduce Performance improvements between Infinispan 6 and 7


Introduction


There have been a number of recent Infinispan 7.0 Map/Reduce performance related improvements that we were eager to test in our performance lab and subsequently share with you. The results are more than promising. In the word count use case, Map/Reduce task execution speed and throughput improvement is between fourfold and sixfold in certain situations that were tested.

We have achieved these improvements by focusing on:
  • Optimized mapper/reducer parallel execution on all nodes
  • Improving the handling and processing of larger data sets
  • Reducing the amount of memory needed for execution of MapReduceTask

Performance Test Results


The performance tests were run using the following parameters:
  • An Infinispan 7.0.0-SNAPSHOT build created after the last commits from the list were committed to the Infinispan Git repo on May 9th vs. Infinispan 6.0.1.Final
  • OpenJDK version 1.7.0_55 with 4GB of heap and the following JVM options:
    -Xmx4096M -Xms4096M -XX:+UseLargePages -XX:MaxPermSize=512m -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  • Random data filled 30% of the Java heap, and 100 random words were used to create the 8 kilobyte cache values. The cache keys were generated using key affinity, so that the generated data would be distributed evenly in the cache. These values were chosen, so that a comparison to Infinispan 6 could be made. Infinispan 7 can handle a final result map with a much larger set of keys than is possible in Infinispan 6. The actual amount of heap size that is used for data will be larger due to backup copies, since the cluster is running in distributed mode.
  • The MapReduceTask executes a word count against the cache values using mapper, reducer, combiner, and collator implementations. The collator returns the 10 most frequently occurring words in the cache data. The task used a distributed reduce phase and a shared intermediate cache. The MapReduceTask is executed 10 times against the data in the cache and the values are reported as an average of these durations.
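
For reference, a word count along these lines could be expressed with the 7.x Map/Reduce API roughly as follows (a sketch, not the actual test code; the combiner and collator steps are omitted):

    import java.util.Iterator;
    import java.util.Map;
    import org.infinispan.Cache;
    import org.infinispan.distexec.mapreduce.Collector;
    import org.infinispan.distexec.mapreduce.MapReduceTask;
    import org.infinispan.distexec.mapreduce.Mapper;
    import org.infinispan.distexec.mapreduce.Reducer;

    public class WordCount {
       static class WordMapper implements Mapper<String, String, String, Integer> {
          @Override
          public void map(String key, String value, Collector<String, Integer> c) {
             for (String word : value.split("\\s+")) {
                c.emit(word, 1);   // one occurrence of this word
             }
          }
       }

       static class WordReducer implements Reducer<String, Integer> {
          @Override
          public Integer reduce(String word, Iterator<Integer> counts) {
             int sum = 0;
             while (counts.hasNext()) {
                sum += counts.next();
             }
             return sum;
          }
       }

       public static Map<String, Integer> count(Cache<String, String> cache) {
          // distributed reduce phase + shared intermediate cache, as in the tests
          return new MapReduceTask<String, String, String, Integer>(cache, true, true)
                .mappedWith(new WordMapper())
                .reducedWith(new WordReducer())
                .execute();
       }
    }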

    From 1 to 8 nodes using a fixed amount of data and 30% of the heap


    This test executes two word count executions on each cluster with an increasing number of nodes. The first execution uses an increasing amount of data equal to 30% of the total Java heap across the cluster (i.e. With one node, the data consumes 30% of 4 GB. With two nodes, the data consumes 30% of 8 GB, etc.), and the second execution uses a fixed amount of data, (1352 MB which is approximately 30% of 4 GB). Throughput is calculated by dividing the total amount of data processed by the Map/Reduce task by the duration. The following charts show the throughput as nodes are added to the cluster for these two scenarios:

    These charts clearly show the increase in throughput that was achieved in Infinispan 7. The throughput also seems to scale in an almost linear fashion for this word count scenario. With one node, Infinispan 7 processes the 30% of heap data at about 100 MB/sec, two nodes process almost 200 MB/sec, and 8 nodes process over 700 MB/sec.

    From 1 to 8 nodes using different heap size percentages


    This test executes the word count task using different percentages of heap size as nodes are added to the cluster. (5%, 10%, 15%, 20%, 25%, and 30%) Here are the throughput results for this test:

    Once again, these charts show an increase in throughput when performing the same word count task using Infinispan 7. The chart for Infinispan 7 shows more fluctuation in the throughput across the different percentages of heap size. The throughput plotted in the Infinispan 6 chart is more consistent.

    From 1 to 8 nodes using different value sizes


    This test executes the word count task using 30% of the heap size and different cache value sizes as nodes are added to the cluster. (1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, and 2MB) Here are the throughput results for this test:

    These results are more interesting. The throughput in Infinispan 7 is higher for certain cache size values, but closer to Infinispan 6 or even slower for other cache size values. The throughput peaks for 32KB cache values, but can be much lower for larger and smaller values. Smaller values require more overhead, but for larger values this behavior is not expected. This result needs to be investigated more closely.

    Conclusion


    The performance tests show that Infinispan 7 Map/Reduce improvements have increased the throughput and execution speed four to sixfold in some use cases. The changes have also allowed Infinispan 7 to process data sets that include larger intermediate results and produce larger final result maps. There are still areas of the Map/Reduce algorithm that need to be improved:
    • The Map/Reduce algorithm should be self-tuning. The maxCollectorSize parameter controls the number of values that the collector holds in memory, and it is not trivial to determine the optimal value for a given scenario. The value is based on the size of the values in the cache and the size of the intermediate results. A user is likely to know the size of the cache values, but currently Infinispan does not report statistics about the intermediate results to the user. The Map/Reduce algorithm should analyze the environment at runtime and adjust the size of the collector dynamically.
    • The fact that the throughput results vary with different value sizes needs to be investigated more closely. This could be due to the fact that the maxCollectorSize value used for these tests is not ideal for all value sizes, but there might be other causes for this behaviour.

    Wednesday, 16 October 2013

    New book: Performance of Open Source Applications, with a chapter on Infinispan

    From the good folks who brought you the excellent Architecture of Open Source Applications (AOSA), available for free online, as a PDF, for e-book readers or as a good old-fashioned dead tree, we're now treated to a new tome - the Performance of Open Source Applications (POSA).

    POSA follows the same concept as AOSA - a different authoritative figure in the open source community is responsible for each chapter, providing you with excellent insight on how some of the most popular open source applications have been designed and built.  POSA focuses specifically on performance rather than general software architecture, and I've contributed a chapter on the performance related work conducted for Infinispan (see Chapter 7).

    Have a read, I'd love to know what you think.

    Cheers
    Manik

    Monday, 16 September 2013

    New persistence API in Infinispan 6.0.0.Alpha4

    The existing CacheLoader/CacheStore API has been around since Infinispan 4.0. In this release of Infinispan we've taken a major step forward in both simplifying the integration with persistence and opening the door for some pretty significant performance improvements.

    What's new


    So here's what the new persistence integration brings to the table:
    • alignment with JSR-107: we now have CacheWriter and CacheLoader interfaces similar to the loader and writer in JSR-107, which should considerably help with writing portable stores across JCache-compliant vendors
    • simplified transaction integration: all the locking is now handled within the Infinispan layer, so implementors don't have to be concerned with coordinating concurrent access to the store (the old LockSupportCacheStore is dropped for that reason)
    • parallel iteration: it is now possible to iterate over entries in the store with multiple threads in parallel. Map/Reduce tasks immediately benefit from this, as the map/reduce tasks now run in parallel over both the nodes in the cluster and within the same node (multiple threads)
    • reduced serialization (translating into less CPU usage): the new API allows exposing the stored entries in serialized format. If an entry is fetched from persistent storage for the sole purpose of being sent remotely, we no longer need to deserialize it (when reading from the store) and serialize it back (when writing to the wire). Now we can write the serialized format, as read from the storage, to the wire directly

    API


    Now let's take a look at the API in more detail:



      The diagram above shows the main classes in the API:
    • ByteBuffer - abstracts the serialized form of an object
    • MarshalledEntry - abstracts the information held within a persistent store corresponding to a key-value pair added to the cache. Provides methods for reading this information both in serialized (ByteBuffer) and deserialized (Object) format. Normally data read from the store is kept in serialized format and lazily deserialized on demand, within the MarshalledEntry implementation
    • CacheWriter and CacheLoader provide basic methods for writing to and reading from a store
    • AdvancedCacheLoader and AdvancedCacheWriter provide operations to manipulate the underlying storage in bulk: parallel iteration and purging of expired entries, clear and size
    A provider might choose to only implement a subset of these interfaces:
    • Not implementing AdvancedCacheWriter makes the given writer unusable for purging expired entries or for clearing
    • Not implementing AdvancedCacheLoader means the information stored in the given loader will not be used for preloading, nor for the map/reduce iteration
    If you're looking at migrating your existing store to the new API, the SingleFileStore is a great source of inspiration.
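
    To give a feel for the shape of the SPI, here is a rough skeleton of a trivial in-memory store implementing the two basic interfaces (signatures approximated from the description above; consult the actual SPI and SingleFileStore for the authoritative version):

        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.ConcurrentMap;
        import org.infinispan.marshall.core.MarshalledEntry;
        import org.infinispan.persistence.spi.CacheLoader;
        import org.infinispan.persistence.spi.CacheWriter;
        import org.infinispan.persistence.spi.InitializationContext;

        // Trivial in-memory "store", only to illustrate the SPI shape.
        public class MapStore implements CacheLoader<Object, Object>, CacheWriter<Object, Object> {
           private final ConcurrentMap<Object, MarshalledEntry<Object, Object>> data =
                 new ConcurrentHashMap<Object, MarshalledEntry<Object, Object>>();

           @Override public void init(InitializationContext ctx) { /* keep ctx if needed */ }
           @Override public void start() { }
           @Override public void stop() { data.clear(); }

           @Override public MarshalledEntry<Object, Object> load(Object key) { return data.get(key); }
           @Override public boolean contains(Object key) { return data.containsKey(key); }

           @SuppressWarnings("unchecked")
           @Override public void write(MarshalledEntry<? extends Object, ? extends Object> entry) {
              data.put(entry.getKey(), (MarshalledEntry<Object, Object>) entry);
           }
           @Override public boolean delete(Object key) { return data.remove(key) != null; }
        }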

    Configuration


    And finally, the way the stores are configured has changed:
    • the 5.x loaders element is now replaced with persistence
    • both loaders and writers are configured through a single store element (vs. the loader and store elements allowed in 5.x)
    • the preload and shared attributes are configured on each individual store, giving more flexibility when it comes to configuring multiple chained stores
    Cheers,
    Mircea

    Thursday, 18 July 2013

    Faster file cache store (no extra dependencies!) in 6.0.0.Alpha1

    As announced yesterday by Adrian, the brand new Infinispan 6.0.0.Alpha1 release contains a new file-based cache store which needs no extra dependencies. This is essentially a replacement of the existing FileCacheStore which didn't perform as expected, and caused major issues due to the number of files it created.

    The new cache store, contributed by Karsten Blees (who also contributed an improved asynchronous cache store), is called SingleFileCacheStore and it keeps all data in a single file. The way it looks up data is by keeping an in-memory index of keys and the positions of their values in this file. This design outperforms the existing FileCacheStore and even the LevelDB-based JNI cache store.

    The classic case for a file-based cache store is when you want a cache with a locally available cache store which holds data that has overflowed from memory, having exceeded size and/or time restrictions. We ran some performance tests to verify how fast different cache store implementations could deal with reading and writing overflowed data, and these are the results we got (in thousands of operations per second):

    • FileCacheStore: 0.75k reads/s, 0.285k writes/s
    • LevelDB-JNI impl: 46k reads/s, 15.2k writes/s
    • SingleFileCacheStore: 458k reads/s, 137k writes/s
    The difference is quite astonishing but as already hinted, this performance increase comes at a cost. Having to maintain an index of keys and positions in the file in memory has a cost in terms of extra memory required, and potential impact on GC. That's why the SingleFileCacheStore is not recommended for use cases where the keys are too big.

    In order to help tame these memory consumption issues, the size of the cache store can optionally be limited, providing a maximum number of entries to store in it. However, setting this parameter will only work in use cases where Infinispan is used as a cache. When used as a cache, data not present in Infinispan can be recomputed or re-retrieved from the authoritative data store and stored back in the Infinispan cache. The reason for this limitation is that once the maximum number of entries is reached, older data in the cache store is removed, so if Infinispan were used as an authoritative data store, this would lead to data loss, which is not good.
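
    In the 6.0 configuration schema this limit ends up looking roughly like this (attribute names approximated):

        <persistence>
           <!-- maxEntries bounds the store; the oldest entries are evicted from it -->
           <singleFile location="/tmp/myDataStore" maxEntries="5000"/>
        </persistence>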

    Existing FileCacheStore users might wonder: what is going to happen to the existing FileCacheStore? We're not 100% sure yet what we're going to do with it, but we're looking into some ways to migrate data from the FileCacheStore to the SingleFileCacheStore. Some interesting ideas have already been submitted which we'll investigate in the next Infinispan 6.0 pre-releases.

    So, if you're a FileCacheStore user, give the new SingleFileCacheStore a go and let us know how it goes! Switching from one to the other is easy :)

    Cheers,
    Galder

    Tuesday, 2 July 2013

    Lower memory overhead in Infinispan 5.3.0.Final

    Infinispan users worried about memory consumption should upgrade to Infinispan 5.3.0.Final as soon as possible, because as part of the work we've done to support storing byte arrays without wrappers, and the development of the interoperability mode, we've been working to reduce Infinispan's memory overhead.

    To measure overhead, we've used Martin Gencur's excellent memory consumption tests. The results for entries with 512 bytes are:

    Infinispan memory overhead, used in library mode:
    Infinispan 5.2.0.Final: ~151 bytes
    Infinispan 5.3.0.Final: ~135 bytes
    Memory consumption reduction: ~12%

    Infinispan memory overhead, for the Hot Rod server:
    Infinispan 5.2.0.Final: ~174 bytes
    Infinispan 5.3.0.Final: ~151 bytes
    Memory consumption reduction: ~15%

    Infinispan memory overhead, for the REST server:
    Infinispan 5.2.0.Final: ~208 bytes
    Infinispan 5.3.0.Final: ~172 bytes
    Memory consumption reduction: ~21%

    Infinispan memory overhead, for the Memcached server:
    Infinispan 5.2.0.Final: ~184 bytes
    Infinispan 5.3.0.Final: ~180 bytes
    Memory consumption reduction: ~2%

    This is great news for the Infinispan community but our effort doesn't end here. We'll be working on further improvements in next releases to bring down cost even further.

    Cheers,
    Galder

    Friday, 10 May 2013

    Infinispan vs Hazelcast Performance

    Sam Smoot has recently compared the performance of Infinispan and Hazelcast, both with default cache settings, and posted some interesting performance results, with Infinispan coming out on top :)

    @Sam, we hear you and we're working on reducing the number of JARs required for standalone, default use case :)

    Cheers,
    Galder

    Saturday, 12 January 2013

    Infinispan memory overhead

    Have you ever wondered how much Java heap memory is actually consumed when data is stored in Infinispan cache? Let's look at some numbers obtained through real measurement.

    The strategy was the following:

    1) Start Infinispan server in local mode (only one server instance, eviction disabled)
    2) Keep calling full garbage collection (via JMX or directly via System.gc() when Infinispan is deployed as a library) until the difference in consumed memory by the running server gets under 100kB between two consecutive runs of GC
    3) Load the cache with 100MB of data via respective client (or directly store in the cache when Infinispan is deployed as a library)
    4) Keep calling the GC until the used memory is stabilised
    5) Measure the difference between the final values of consumed memory after the first and second cycle of GC runs
    6) Repeat steps 3, 4 and 5 four times to get an average value (first iteration ignored)

    The amount of consumed memory was obtained from a verbose GC log (related JVM options: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log)
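
    The stabilisation loop from steps 2 and 4 can be sketched like this (illustrative only; the real harness parsed the verbose GC log instead):

        public class MemoryProbe {
           // Force GC until used memory changes by less than 100 kB between
           // two consecutive measurements, then return the stabilised value.
           public static long stabilisedUsedMemory() throws InterruptedException {
              Runtime rt = Runtime.getRuntime();
              long previous = Long.MAX_VALUE;
              while (true) {
                 System.gc();
                 Thread.sleep(500); // give the collector a moment to finish
                 long used = rt.totalMemory() - rt.freeMemory();
                 if (Math.abs(previous - used) < 100 * 1024) {
                    return used;
                 }
                 previous = used;
              }
           }
        }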

    The test output looks like this: https://gist.github.com/4512589

    The operating system (Ubuntu) as well as the JVM (Oracle JDK 1.6) were 64-bit, and the tests used Infinispan 5.2.0.Beta6. Keys were kept intentionally small (10-character Strings) with byte arrays as values. The target entry size is the sum of key size and value size.

    Memory overhead of Infinispan accessed through clients


    HotRod client

    entry size -> overall memory
    512B       -> 137144kB
    1kB        -> 120184kB
    10kB       -> 104145kB
    1MB        -> 102424kB

    So how much additional memory is consumed on top of each entry?

    entry size/actual memory per entry -> overhead per entry
    512B/686B                -> ~174B
    1kB(1024B)/1202B         -> ~178B
    10kB(10240B)/10414B      -> ~176B
    1MB(1048576B)/1048821B   -> ~245B

    MemCached client (text protocol, SpyMemcached client) 

    entry size -> overall memory
    512B       -> 139197kB
    1kB        -> 120517kB
    10kB       -> 104226kB
    1MB        -> N/A (SpyMemcached allows max. 20kB per entry)

    entry size/actual memory per entry -> overhead per entry
    512B/696B               -> ~184B
    1kB(1024B)/1205B        -> ~181B
    10kB(10240B)/10422B     -> ~182B

    REST client (Content-Type: application/octet-stream)

    entry size -> overall memory
    512B       -> 143998kB
    1kB        -> 122909kB
    10kB       -> 104466kB
    1MB        -> 102412kB

    entry size/actual memory per entry -> overhead per entry
    512B/720B               -> ~208B
    1kB(1024B)/1229B        -> ~205B
    10kB(10240B)/10446B     -> ~206B
    1MB(1048576B)/1048698B  -> ~123B

    The memory overhead for individual entries seems to be more or less constant across different cache entry sizes.

    Memory overhead of Infinispan deployed as a library


    Infinispan was deployed to JBoss Application Server 7 using Arquillian.

    entry size -> overall memory/overall with storeAsBinary
    512B       -> 132736kB / 132733kB
    1kB        -> 117568kB / 117568kB
    10kB       -> 103953kB / 103950kB
    1MB        -> 102414kB / 102415kB

    There was almost no difference in overall consumed memory when enabling or disabling storeAsBinary.

    entry size/actual memory per entry-> overhead per entry (w/o storeAsBinary)
    512B/663B               -> ~151B
    1kB(1024B)/1175B        -> ~151B
    10kB(10240B)/10395B     -> ~155B
    1MB(1048576B)/1048719B  -> ~143B

    As you can see, the overhead per entry is constant across different entry sizes and is ~151 bytes.

    Conclusion


    The memory overhead is slightly more than 150 bytes per entry when storing data into the cache locally. When accessing the cache via remote clients, the memory overhead is a little bit higher and ranges from ~170 to ~250 bytes, depending on remote client type and cache entry size. If we ignored the statistics for 1MB entries, which could be affected by a small number of entries (100) stored in the cache, the range would have been even narrower.


    Cheers,
    Martin

    Tuesday, 4 September 2012

    Speeding up Cache calls with IGNORE_RETURN_VALUES invocation flag

    Starting with Infinispan 5.2.0.Alpha3, a new Infinispan invocation flag has been added called IGNORE_RETURN_VALUES.

    This flag signals that when a client calls an Infinispan Cache operation that returns a value, e.g. java.util.Map#put(Object, Object) (remember that Infinispan's Cache interface extends java.util.Map), the return value (which in the case of put represents the previous value) will be ignored by the client application. A typical client application that ignores the return value would use code like this:
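
    Something along these lines, using the last-login example below (a sketch; class and key names are illustrative):

        import java.util.Date;
        import org.infinispan.Cache;

        public class LastLogin {
           private final Cache<String, Date> cache;

           public LastLogin(Cache<String, Date> cache) {
              this.cache = cache;
           }

           public void recordLogin(String user, String session) {
              // Each put computes and returns the previous value... which we ignore.
              cache.put("user:" + user, new Date());
              cache.put("session:" + session, new Date());
           }
        }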

    In this example, both cache put calls ignore the return of the put call, which is the previous value. In other words, when we cache the last login date, we don't care what the previous value was, so this is a great opportunity for the client code to be rewritten in this way:
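
    A sketch of that rewrite, passing the flag on each invocation:

        import java.util.Date;
        import org.infinispan.Cache;
        import org.infinispan.context.Flag;

        public class LastLogin {
           private final Cache<String, Date> cache;

           public LastLogin(Cache<String, Date> cache) {
              this.cache = cache;
           }

           public void recordLogin(String user, String session) {
              // Hint, per invocation, that the previous value is not needed.
              cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES)
                   .put("user:" + user, new Date());
              cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES)
                   .put("session:" + session, new Date());
           }
        }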

    Or even better:
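
    That is, decorating the cache a single time and reusing it (a sketch; assumes the flag-decorated cache is kept around for all such calls, which is what this style is for):

        import java.util.Date;
        import org.infinispan.AdvancedCache;
        import org.infinispan.Cache;
        import org.infinispan.context.Flag;

        public class LastLogin {
           private final AdvancedCache<String, Date> ignoreReturnsCache;

           public LastLogin(Cache<String, Date> cache) {
              // Decorate the cache once and reuse it everywhere.
              this.ignoreReturnsCache =
                    cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
           }

           public void recordLogin(String user, String session) {
              ignoreReturnsCache.put("user:" + user, new Date());
              ignoreReturnsCache.put("session:" + session, new Date());
           }
        }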

    Thanks to such hints, Infinispan caches can behave in a more efficient way and can potentially perform operations faster, because the work associated with producing the return value is skipped. Such work can on occasion involve network calls, or access to persistent cache stores, so by avoiding this work the cache calls are effectively faster.

    In previous Infinispan versions, a similar effect could be achieved with narrower-targeted flags which are considered too brittle for end-user consumption, such as SKIP_REMOTE_LOOKUP or SKIP_CACHE_LOAD. So, if you're using either of these flags in your Infinispan client codebase, we highly recommend that from Infinispan 5.2.0.Alpha3 you start using IGNORE_RETURN_VALUES instead.

    Cheers, Galder

    Wednesday, 28 March 2012

    Infinispan 5.1.3.FINAL is here!

    Infinispan 5.1.3.FINAL is out now after having received very positive feedback on 5.1.3.CR1 and fixing some other issues on top of that, such as the file cache store leaving files open, improving standalone Infinispan Memcached implementation performance, and including Infinispan CDI extension jars in our distribution.

    "Release early, release often", that's out motto, so we'll carry on taking feedback onboard and releasing new Infinispan versions where we improve on what we've done in the past apart from coming out with new goodies.

    Thanks to everyone, both users who have been getting in touch to provide their feedback and developers who have been quickly reacting to users fixing issues and implementing requested features.

    Full details of what has been fixed in FINAL (including CR1) can be found here, and if you have feedback, please visit our forums. Finally, as always, you can download the release from here.

    Cheers,
    Galder

    Monday, 12 March 2012

    JDK 8 backported ConcurrentHashMaps in Infinispan

    Doug Lea and the folks on the concurrency-interest group have been hard at work on an update of JSR 166 (concurrency utilities) for Java 8 - called JSR 166e.  These include some pretty impressive changes in a building-block we've all come to rely pretty heavily on, the ConcurrentHashMap.

    One of the big drawbacks of the ConcurrentHashMap, since it was introduced in Java 5, has always been memory footprint.  It is kinda bulky, especially when compared to a regular HashMap - 1.6 kb in memory versus about 100 bytes!  Yes, these are for empty maps.

    In Java 8, one of the improvements in the ConcurrentHashMap has been memory footprint - now closer to a regular HashMap.  In addition to that, the new Java 8 CHM performs better under concurrent load when compared to its original form.  See this discussion and comments in the proposed ConcurrentHashMapV8 sources for more details.

    So, Infinispan makes pretty heavy use of ConcurrentHashMaps internally.  One change in the recently released Infinispan 5.1.2.Final is that these internal CHMs are built using a factory, and we've included a version of the Java 8 CHM in Infinispan.  So by default, Infinispan uses the JDK's provided CHM.  But if you wish to force Infinispan to use the backported Java 8 CHM, all you need to do is include the following JVM parameter when you start Infinispan:

    -Dinfinispan.unsafe.allow_jdk8_chm=true


    We'd love to hear what you have to say about this, in terms of memory footprint, garbage collection and overall performance.  Please use the Infinispan user forums to provide your feedback.

    Thanks
    Manik

    Thursday, 22 December 2011

    Startup performance

    One of the things I've done recently was to benchmark how quickly Infinispan starts up.  Specifically looking at LOCAL mode (where you don't have the delays of opening sockets and discovery protocols you see in clustered mode), I wrote up a very simple test to start up 2000 caches in a loop, using the same cache manager.

    This is a pretty valid use case, since when used as a non-clustered 2nd level cache in Hibernate, a separate cache instance is created per entity type, and in the past this has become somewhat of a bottleneck.
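
    The test boils down to something like this (a simplified sketch, not the actual benchmark referenced below):

        import org.infinispan.manager.DefaultCacheManager;
        import org.infinispan.manager.EmbeddedCacheManager;

        public class StartupBench {
           public static void main(String[] args) {
              EmbeddedCacheManager cm = new DefaultCacheManager();
              long start = System.nanoTime();
              for (int i = 0; i < 2000; i++) {
                 cm.getCache("cache-" + i); // creates a new LOCAL cache per name
              }
              long ms = (System.nanoTime() - start) / 1000000;
              System.out.println("Created 2000 caches in " + ms + " ms");
              cm.stop();
           }
        }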

    In this test, I compared Infinispan 5.0.1.Final, 5.1.0.CR1 and 5.1.0.CR2.  5.1.0 is significantly quicker, but I used this test (and subsequent profiling) to commit a couple of interesting changes in 5.1.0.CR2, which has improved things even more - both in terms of CPU performance as well as memory footprint.

    Essentially, 5.1.0.CR1 made use of Jandex to perform annotation scanning of internal components at build-time, to prevent expensive reflection calls to determine component dependencies and lifecycle at runtime.  5.1.0.CR2 takes this concept a step further - now we don't just cache annotation lookups at build-time, but entire dependency graphs.  And determining and ordering of lifecycle methods are done at build-time too, again making startup times significantly quicker while offering a much tighter memory footprint.

    Enough talk.  Here is the test used, and here are the performance numbers, as per my laptop, a 2010 MacBook Pro with an i5 CPU.


    Multiverse:InfinispanStartupBenchmark manik [master]$ ./bench.sh 
    ---- Starting benchmark ---


      Please standby ... 


    Using Infinispan 5.0.1.FINAL (JMX enabled? false) 
       Created 2000 caches in 10.9 seconds and consumed 172.32 Mb of memory.


    Using Infinispan 5.0.1.FINAL (JMX enabled? true) 
       Created 2000 caches in 56.18 seconds and consumed 315.21 Mb of memory.


    Using Infinispan 5.1.0.CR1 (JMX enabled? false) 
       Created 2000 caches in 7.13 seconds and consumed 157.5 Mb of memory.


    Using Infinispan 5.1.0.CR1 (JMX enabled? true) 
       Created 2000 caches in 34.9 seconds and consumed 243.33 Mb of memory.


    Using Infinispan 5.1.0.CR2(JMX enabled? false) 
       Created 2000 caches in 3.18 seconds and consumed 142.2 Mb of memory.


    Using Infinispan 5.1.0.CR2(JMX enabled? true) 
       Created 2000 caches in 17.62 seconds and consumed 176.13 Mb of memory.


    A whopping 3.5 times faster, and significantly more memory-efficient especially when enabling JMX reporting.  :-)


    Enjoy!
    Manik

    Wednesday, 21 December 2011

    Infinispan 5.1.0.CR2 is out in time for Xmas!

    Infinispan 'Brahma' 5.1.0.CR2 is out now with a load of fixes and a few internal changes, such as the move to a StAX-based XML parser (as opposed to relying on JAXB), which did not make it into CR1. The new parser is a lot faster, has less overhead and does not require any changes from a user perspective.

    We've also worked on improving startup time by indexing annotation metadata at build time and reading it at runtime. From an Infinispan user perspective, there have been some changes to how Infinispan is extended, in particular related to custom command implementations, where we now use the JDK's ServiceLoader to load them.

    As per usual, downloads are in the usual place, use the forums to provide feedback and report any issues.

    Cheers, Merry Christmas and a Happy New Year to all the Infinispan community! :)
    Galder

    Wednesday, 23 November 2011

    More on transaction performance: use1PcForAutoCommitTransactions

    What's use1PcForAutoCommitTransactions all about?



    Don't be scared by the name: use1PcForAutoCommitTransactions is a new feature (5.1.CR1) that does quite a cool thing: it increases your transactions' performance.
    Let me explain.
    Before Infinispan 5.1 you could access the cache both transactionally and non-transactionally. Naturally, non-transactional access is faster and offers fewer consistency guarantees. But we don't support mixed access in Infinispan 5.1, so what's to be done when you need the speed of non-transactional access and you are ready to trade some consistency guarantees for it?
    Well, here is where use1PcForAutoCommitTransactions helps you. What it does is force an induced (autoCommit=true) transaction to commit in a single phase: so only 1 RPC instead of the 2 RPCs needed for a full two-phase commit (2PC).

    At what cost?


    You might end up with inconsistent data if multiple transactions modify the same key concurrently. But if you know that's not the case, or you can live with it, then use1PcForAutoCommitTransactions will help your performance considerably.

    An example


    Let's say you do a simple put operation outside the scope of a transaction:
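
    Something like this (illustrative):

        import org.infinispan.Cache;

        public class AutoCommitExample {
           static void update(Cache<String, String> cache) {
              // No tm.begin()/tm.commit() here: with autoCommit enabled,
              // Infinispan wraps this put in an induced transaction.
              cache.put("k", "v");
           }
        }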


    Now let's see how this would behave if the cache has several different transaction configurations:

    Not using 1PC...
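
    With a configuration along these lines (attribute names approximated from the 5.1 schema):

        <transaction transactionMode="TRANSACTIONAL"
                     use1PcForAutoCommitTransactions="false"/>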



    The put will happen in two RPCs/steps: a prepare message is sent around and then a commit.

    Using 1PC...
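
    That is, roughly (attribute name approximated):

        <transaction transactionMode="TRANSACTIONAL"
                     use1PcForAutoCommitTransactions="true"/>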



    The put happens in one RPC as the prepare and the commit are merged. Better performance.

    Not using autoCommit
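
    For example (approximated):

        <transaction transactionMode="TRANSACTIONAL" autoCommit="false"/>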



    An exception is thrown, as this is a transactional cache and invocations must happen within the scope of a transaction.

    Enjoy!
    Mircea

    Monday, 3 October 2011

    Transaction remake in Infinispan 5.1

    If you ever used Infinispan in a transactional way you might be very interested in this article as it describes some very significant improvements in version 5.1 "Brahma" (released with 5.1.Beta1):
    • starting with this release an Infinispan cache can be accessed either transactionally or non-transactionally. The mixed access mode is no longer supported (backward compatibility is still maintained, see below). There are several reasons for going down this path, but one of the most important results of this decision is a cleaner semantic of how concurrency is managed between multiple requestors for the same cache entry.

    • starting with 5.1 the supported transaction models are optimistic and pessimistic. The optimistic model improves on the previous default transaction model by completely deferring lock acquisition to transaction prepare time. That reduces lock acquisition duration and increases throughput; it also avoids deadlocks. With the pessimistic model, cluster-wide locks are acquired on each write and only released after the transaction completes (see below).


    Transactional or non transactional cache?


    It's up to you as a user to decide whether you want to define a cache as transactional or not. By default, Infinispan caches are non-transactional. A cache can be made transactional by changing the transactionMode attribute:
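
    Roughly like this (a sketch based on the 5.1 schema; the enclosing elements are elided):

        <namedCache name="myTxCache">
           <transaction transactionMode="TRANSACTIONAL"/>
        </namedCache>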

    transactionMode can only take two values: TRANSACTIONAL and NON_TRANSACTIONAL. The same thing can also be achieved programmatically:
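
    A sketch using the 5.1 fluent configuration API:

        import org.infinispan.configuration.cache.Configuration;
        import org.infinispan.configuration.cache.ConfigurationBuilder;
        import org.infinispan.transaction.TransactionMode;

        public class TxConfig {
           static Configuration transactional() {
              ConfigurationBuilder builder = new ConfigurationBuilder();
              builder.transaction().transactionMode(TransactionMode.TRANSACTIONAL);
              return builder.build();
           }
        }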

    Important: for transactional caches it is required to configure a TransactionManagerLookup.

    Backward compatibility


    The autoCommit attribute was added in order to ensure backward compatibility. If a cache is transactional and autoCommit is enabled (it defaults to true) then any call that is performed outside of a transaction's scope is transparently wrapped within a transaction. In other words, Infinispan adds the logic for starting a transaction before the call and committing it after the call.

    So if your code accesses a cache both transactionally and non-transactionally, all you have to do when migrating to Infinispan 5.1 is mark the cache as transactional and enable autoCommit (that's actually enabled by default, so just don't disable it :)

    The autoCommit feature can be managed through configuration:
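
    For example (attribute name as described above):

        <transaction transactionMode="TRANSACTIONAL" autoCommit="true"/>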

    or programmatically:
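
    A sketch, disabling autoCommit via the fluent API:

        import org.infinispan.configuration.cache.Configuration;
        import org.infinispan.configuration.cache.ConfigurationBuilder;
        import org.infinispan.transaction.TransactionMode;

        public class AutoCommitConfig {
           static Configuration noAutoCommit() {
              ConfigurationBuilder builder = new ConfigurationBuilder();
              builder.transaction()
                     .transactionMode(TransactionMode.TRANSACTIONAL)
                     .autoCommit(false);
              return builder.build();
           }
        }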


    Optimistic Transactions


    With optimistic transactions, locks are acquired at transaction prepare time and are only held up to the point the transaction commits (or rolls back). This is different from the 5.0 default locking model, where local locks are acquired on writes and cluster locks are acquired during prepare time.

    Optimistic transactions can be enabled in the configuration file:
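
    E.g. (attribute names approximated from the 5.1 schema):

        <transaction transactionMode="TRANSACTIONAL" lockingMode="OPTIMISTIC"/>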

    or programmatically:
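
    Sketch:

        import org.infinispan.configuration.cache.Configuration;
        import org.infinispan.configuration.cache.ConfigurationBuilder;
        import org.infinispan.transaction.LockingMode;
        import org.infinispan.transaction.TransactionMode;

        public class OptimisticConfig {
           static Configuration optimistic() {
              ConfigurationBuilder builder = new ConfigurationBuilder();
              builder.transaction()
                     .transactionMode(TransactionMode.TRANSACTIONAL)
                     .lockingMode(LockingMode.OPTIMISTIC);
              return builder.build();
           }
        }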

    By default, a transactional cache is optimistic.

    Pessimistic Transactions


    From a lock acquisition perspective, pessimistic transactions obtain locks on keys at the time the key is written. E.g.
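
    A sketch of such a transaction (the TransactionManager comes from the cache's configured lookup):

        import javax.transaction.TransactionManager;
        import org.infinispan.Cache;

        public class PessimisticExample {
           static void writeLocked(Cache<String, String> cache) throws Exception {
              TransactionManager tm = cache.getAdvancedCache().getTransactionManager();
              tm.begin();
              boolean success = false;
              try {
                 cache.put("k1", "v1"); // k1 is now locked cluster-wide
                 // ... more work while holding the lock ...
                 success = true;
              } finally {
                 if (success) tm.commit(); // lock on k1 released on completion
                 else tm.rollback();
              }
           }
        }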

    When cache.put(k1,v1) returns, k1 is locked and no other transaction running anywhere in the cluster can write to it. Reading k1 is still possible. The lock on k1 is released when the transaction completes (commits or rolls back).

    Pessimistic transactions can be enabled in the configuration file:
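
    E.g. (approximated):

        <transaction transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>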

    or programmatically:
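
    A sketch mirroring the optimistic example:

        import org.infinispan.configuration.cache.Configuration;
        import org.infinispan.configuration.cache.ConfigurationBuilder;
        import org.infinispan.transaction.LockingMode;
        import org.infinispan.transaction.TransactionMode;

        public class PessimisticConfig {
           static Configuration pessimistic() {
              ConfigurationBuilder builder = new ConfigurationBuilder();
              builder.transaction()
                     .transactionMode(TransactionMode.TRANSACTIONAL)
                     .lockingMode(LockingMode.PESSIMISTIC);
              return builder.build();
           }
        }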


    What do I need - pessimistic or optimistic transactions?


    From a use case perspective, optimistic transactions should be used when there is not a lot of contention between multiple transactions running at the same time. That is because optimistic transactions roll back if data has changed between the time it was read and the time it was committed (writeSkewCheck).

    On the other hand, pessimistic transactions might be a better fit when there is high contention on the keys and transaction rollbacks are less desirable. Pessimistic transactions are more costly by their nature: each write operation potentially involves an RPC for lock acquisition.

    The path ahead


    This major transaction rework has opened the way for several other transaction related improvements:

    • Single node locking model is a major step forward in avoiding deadlocks and increasing throughput, by acquiring locks on only a single node in the cluster, regardless of the number of redundant copies (numOwners) on which data is replicated

    • Lock acquisition reordering is a deadlock avoidance technique that will be used for optimistic transactions

    • Incremental locking is another technique for minimising deadlocks.




    Stay tuned!
    Mircea