Before the end of the year I wrote a blog post detailing some of the more recent changes that Infinispan introduced with the in memory data container. As was mentioned in the previous post we would be detailing some other new changes. If you poked around in our new schema after Beta 1 you may have spoiled the surprise for yourself.
With the upcoming 9.0 Beta 2, I am excited to announce that Infinispan will have support for entries being stored off heap, that is, outside of the JVM heap. This has some interesting benefits and drawbacks, but we hope you can agree that the benefits in many cases far outweigh the drawbacks. But before we get into that, let's first see how you can configure your cache to utilize off heap.
New Configuration
The off heap configuration is another option under the new memory element that was discussed in the previous post. It is used in the same way that either OBJECT or BINARY is used. You can use either the COUNT or the MEMORY eviction type; the example below shows the latter.
XML
As you can see the configuration is almost identical to the other types of storage. The only real difference is the new address pointer argument, which will be explained below.
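A programmatic equivalent might look like the sketch below. It assumes the Infinispan 9.0 `ConfigurationBuilder` API (`memory()`, `storageType`, `evictionType`, `size`, `addressCount`); the class name and both numeric values are illustrative only, so check the method names against the release you are running.

```java
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.StorageType;
import org.infinispan.eviction.EvictionType;

// Hypothetical helper class; requires infinispan-core on the classpath.
class OffHeapConfigSketch {
    static Configuration offHeapConfiguration() {
        return new ConfigurationBuilder()
            .memory()
                .storageType(StorageType.OFF_HEAP)   // store entries outside the JVM heap
                .evictionType(EvictionType.MEMORY)   // bound by bytes rather than by entry count
                .size(100_000_000L)                  // illustrative bound, in bytes
                .addressCount(1 << 20)               // illustrative number of address pointers
            .build();
    }
}
```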
Requirements
Our off heap implementation supports all existing features of Infinispan. There are some limitations and drawbacks of using the feature. This section will describe these in further detail.
Serialization
Off heap runs in essentially BINARY mode, which requires entries to be serialized into their byte[] forms. Thus all keys and values must be Serializable or have Infinispan Externalizers provided for them.
Size
Currently a key and a value must each be able to be stored in a byte[]. Therefore a key or value in serialized form cannot be larger than just over 2 Gigabytes. This could possibly be enhanced at a later point, if the need arises. I hope you aren't transferring this over your network though!
Implementation Details
Our off heap implementation uses the Java Unsafe to allocate memory outside of the Java heap. The data is stored as an array of buckets, each holding a pointer to a linked list, just like a standard Java HashMap. When an entry is added, the key's serialized byte[] is hashed and the appropriate offset into the bucket array is found. The entry then becomes the first element of that bucket or, if one or more entries are already present, it is added to the rear of the linked list.
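The bucket layout can be illustrated with an ordinary on heap sketch. This is not the Unsafe-based implementation: `BucketChainSketch`, its `Node` class and the hash spreading step are all hypothetical, and a plain Java array stands in for the off heap block of address pointers.

```java
import java.util.Arrays;

class BucketChainSketch {
    /** One link in a bucket's singly linked list. */
    static final class Node {
        final byte[] key;
        byte[] value;
        Node next;
        Node(byte[] key, byte[] value) { this.key = key; this.value = value; }
    }

    private final Node[] buckets;   // stands in for the off heap address pointers

    BucketChainSketch(int bucketCount) {
        if (bucketCount <= 0 || Integer.bitCount(bucketCount) != 1) {
            throw new IllegalArgumentException("bucketCount must be a power of two");
        }
        buckets = new Node[bucketCount];
    }

    /** Hashes the serialized key and masks it down to a bucket offset. */
    int bucketOf(byte[] key) {
        int h = Arrays.hashCode(key);
        h ^= (h >>> 16);                     // spread the high bits, as HashMap does
        return h & (buckets.length - 1);
    }

    void put(byte[] key, byte[] value) {
        int b = bucketOf(key);
        Node n = buckets[b];
        if (n == null) {                     // empty bucket: entry becomes the first element
            buckets[b] = new Node(key, value);
            return;
        }
        while (true) {
            if (Arrays.equals(n.key, key)) { n.value = value; return; }           // overwrite
            if (n.next == null) { n.next = new Node(key, value); return; }        // append at rear
            n = n.next;
        }
    }

    byte[] get(byte[] key) {
        for (Node n = buckets[bucketOf(key)]; n != null; n = n.next) {
            if (Arrays.equals(n.key, key)) return n.value;
        }
        return null;
    }

    /** Number of entries chained in one bucket; longer chains mean slower lookups. */
    int chainLength(int bucket) {
        int len = 0;
        for (Node n = buckets[bucket]; n != null; n = n.next) len++;
        return len;
    }
}
```

Constructing the sketch with a single bucket forces every key to collide, which makes the chaining behaviour easy to observe.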
All of this data is protected by an array of ReadWriteLock instances. The number of address pointers is evenly divisible by the number of lock instances. The number of lock instances is the number of cores on your machine, doubled and rounded to the nearest power of two. Thus each lock protects an equal number of address pointers. This provides good lock granularity: reads will not block each other, but unfortunately writes will wait for, and block, all reads under the same lock.
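A minimal sketch of this kind of lock striping follows. The class and method names are hypothetical, and two details are assumptions rather than facts about the implementation: the lock count is rounded up to a power of two, and each lock guards a contiguous range of buckets.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class StripedLocksSketch {
    private final ReadWriteLock[] locks;
    private final int bucketsPerLock;

    StripedLocksSketch(int cores, int bucketCount) {
        if (bucketCount <= 0 || Integer.bitCount(bucketCount) != 1) {
            throw new IllegalArgumentException("bucketCount must be a power of two");
        }
        // Twice the core count, rounded up to a power of two (assumption: round up),
        // capped so that every lock guards at least one bucket.
        int lockCount = Math.min(roundUpToPowerOfTwo(cores * 2), bucketCount);
        locks = new ReadWriteLock[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new ReentrantReadWriteLock();
        }
        // Both values are powers of two, so the division is exact.
        bucketsPerLock = bucketCount / lockCount;
    }

    static int roundUpToPowerOfTwo(int n) {
        if (n <= 1) return 1;
        return Integer.highestOneBit(n - 1) << 1;
    }

    int lockCount() { return locks.length; }

    /** Each lock covers a contiguous, equally sized range of buckets. */
    int lockIndexFor(int bucket) { return bucket / bucketsPerLock; }

    /** Readers take readLock() and can proceed together; a writer takes writeLock() exclusively. */
    ReadWriteLock lockFor(int bucket) { return locks[lockIndexFor(bucket)]; }
}
```

In real use the first argument would come from `Runtime.getRuntime().availableProcessors()`.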
If you are using a bounded off heap container, either by count or by memory, a backing LRU doubly linked list is created to keep track of which elements were accessed most recently. The least recently accessed element is removed when there are too many present in the cache.
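The same policy, bounded by count, can be sketched on heap with the JDK's `LinkedHashMap`, which keeps its entries in a doubly linked list and can order them by access. The class name `LruBoundSketch` is hypothetical, and this only illustrates the eviction behaviour, not the off heap data structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruBoundSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruBoundSketch(int maxEntries) {
        super(16, 0.75f, true);   // true = access order: a get() moves the entry to the tail
        this.maxEntries = maxEntries;
    }

    /** Called after every insertion; the eldest entry is the least recently accessed one. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a bound of two, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was touched more recently.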
Memory Overhead
As with all cache implementations, there is overhead required to store these entries. We have a fixed overhead and a variable overhead, the latter scaling with the number of entries. I will detail these and briefly mention what they are used for.
Fixed overhead
As was mentioned, there is a new address count parameter when configuring off heap. This value determines how many linked list pointers are available. Normally you want to have more pointers than you have entries in the cache, since then chances are you have at most one element in each linked list. This is very similar to the int argument constructor for HashMap, and the value is likewise rounded up to the nearest power of two. The big difference is that this off heap implementation will not resize, so your read/write times will be slower if you have a lot of collisions. The overhead of a pointer is 8 bytes, so approximately one million pointers take up 8 Megabytes of off heap memory.
Bounded off heap requires very little fixed memory: just 32 bytes for the head/tail pointers and a counter, plus an additional Java lock object.
Variable overhead
Unfortunately, to store your entries we may need to wrap them with some data. Thus for every entry you add to the cache we store an additional 25 bytes. This data is used for header information and for our linked list forward pointer.
Bounded off heap requires additional housekeeping for its LRU list nodes. Thus each entry adds a further 36 bytes on top of the 25 bytes above. It is larger because it requires a doubly linked list and pointers to and from the entry and its eviction node.
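Putting the figures from this section together gives a rough estimate of the overhead. The sketch below only does the arithmetic with the numbers quoted above (8 bytes per pointer, 25 bytes per entry, and 32 fixed plus 36 per-entry bytes when bounded); the class and method names are hypothetical, and the result excludes the serialized keys and values themselves and the Java lock objects.

```java
class OffHeapOverheadSketch {
    static final long POINTER_BYTES = 8;                  // per address pointer
    static final long ENTRY_OVERHEAD_BYTES = 25;          // header + forward pointer per entry
    static final long BOUNDED_FIXED_BYTES = 32;           // head/tail pointers and counter
    static final long BOUNDED_ENTRY_OVERHEAD_BYTES = 36;  // LRU node per entry

    static long roundUpToPowerOfTwo(long n) {
        if (n <= 1) return 1;
        return Long.highestOneBit(n - 1) << 1;
    }

    /** Fixed overhead: the address pointers, plus the LRU bookkeeping when bounded. */
    static long fixedOverhead(long addressCount, boolean bounded) {
        long bytes = roundUpToPowerOfTwo(addressCount) * POINTER_BYTES;
        return bounded ? bytes + BOUNDED_FIXED_BYTES : bytes;
    }

    /** Variable overhead: grows linearly with the number of entries. */
    static long variableOverhead(long entries, boolean bounded) {
        long perEntry = ENTRY_OVERHEAD_BYTES + (bounded ? BOUNDED_ENTRY_OVERHEAD_BYTES : 0);
        return entries * perEntry;
    }
}
```

For an address count of 1,000,000 the pointer count rounds up to 1,048,576, so the fixed overhead is 8,388,608 bytes (8 Megabytes); one million entries in a bounded container add 61,000,000 bytes of variable overhead.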
Performance
The off heap container was designed with the intent that key lookups are quite fast. In general these should perform about the same as with on heap storage. However, local reads and stream operations can be a little slower, as an additional deserialization phase is required.
Summary
We hope you all try out our new off heap feature! Please make sure to contact us if you have any feedback, find any bugs or have any questions! You can get in contact with us on our forum, issue tracker, or directly on the #infinispan IRC channel on Freenode.
Previously, when configuring a JDBC store, it was only possible for a user to specify the vendor of the underlying DB. Consequently, it was not possible for Infinispan to utilise more recent features of a DB, as the SQL utilised by our JDBC stores had to satisfy the capabilities of the oldest supported DB version.
In Infinispan 9 we have completely refactored the code responsible for generating SQL queries, enabling our JDBC stores to take greater advantage of optimisations and features applicable to a given database vendor and version. See the below gist for examples of how to specify the major and minor versions of your database.
Programmatic config:
Note: If no version information is provided, then we attempt to retrieve version data via the JDBC driver. This is not always possible and in such cases we default to SQL queries which are compatible with the lowest supported version of the specified DB dialect.
Upsert Support
As a consequence of the refactoring mentioned above, writes to the JDBC stores finally utilise upserts. Previously, the JDBC stores had to first select an entry, before inserting or updating a DB row depending on whether the entry previously existed. Now, in supported DBs, store writes are performed atomically via a single SQL statement.
In some cases it may be desirable to keep the previous store behaviour. To do so, pass the property `infinispan.jdbc.upsert.disabled` to your store's configuration and set it to true.
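To make the difference concrete, here is a sketch of how a store could pick its write statements from the database version. It is not Infinispan's generated SQL: the class, the table name `cache_table` and the columns `id`, `datum` and `ts` are hypothetical, the `upsertDisabled` flag merely mirrors the property described above, and PostgreSQL is used as the example because its `INSERT ... ON CONFLICT` clause arrived in version 9.5.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UpsertPlanSketch {
    /** An unknown version is treated like the oldest supported one, so no upsert. */
    static boolean postgresHasUpsert(Integer major, Integer minor) {
        if (major == null) return false;
        if (major > 9) return true;
        return major == 9 && minor != null && minor >= 5;
    }

    /** Returns the SQL statements a single store write may need. */
    static List<String> writePlan(Integer major, Integer minor, boolean upsertDisabled) {
        if (!upsertDisabled && postgresHasUpsert(major, minor)) {
            // One atomic statement: insert, or update if the key already exists.
            return Collections.singletonList(
                "INSERT INTO cache_table (id, datum, ts) VALUES (?, ?, ?) "
                + "ON CONFLICT (id) DO UPDATE SET datum = EXCLUDED.datum, ts = EXCLUDED.ts");
        }
        // Previous behaviour: select first, then run either the update or the insert.
        return Arrays.asList(
            "SELECT id FROM cache_table WHERE id = ?",
            "UPDATE cache_table SET datum = ?, ts = ? WHERE id = ?",
            "INSERT INTO cache_table (id, datum, ts) VALUES (?, ?, ?)");
    }
}
```

The fallback branch also covers the case where no version information is available, which matches the lowest-supported-version default described earlier.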
Timestamp Indexing
By default an index is now created on the `timestamp-column` of a JDBC store when the "create-on-start" option is set to true for a store's table. The advantage of this index is that it prevents the DB from having to perform full table scans when purging a table of expired cache entries. Similar to upsert support, this index is optional and can be disabled by setting the property `infinispan.jdbc.indexing.disabled` to true.
Hello HikariCP
In Infinispan 9 we welcome HikariCP as the new default implementation for the JDBC PooledConnectionFactory. HikariCP provides superior performance to c3p0 (the previous default), whilst also providing a much smaller footprint. The PooledConnectionFactoryConfiguration remains the same as before, except that we now include the ability to explicitly define a properties file where additional configuration parameters can be specified for the underlying HikariCP. For a full list of the available HikariCP configuration properties, please see the official documentation.
Note: Support for c3p0 has been deprecated and will be removed in a future release. However, users can force c3p0 to be utilised as before by providing the system property `-Dinfinispan.jdbc.c3p0.force=true`.
Summary
We have introduced the above new features to the JDBC stores in order to improve performance and to enable us to further the store's capabilities in the future. If you're a user of the JDBC stores and have any feedback on the latest changes, or would like to request some new features/optimisations, let us know via the forum, issue tracker or the #infinispan channel on Freenode.
As mentioned in our previous post about the new C++/C# release 8.1.0.Beta1, clients are now equipped with near cache support.
The near cache is an additional cache level that keeps the most recently used cache entries in an "in memory" data structure. Near cached objects are synchronized with the remote server value in the background and can be retrieved as fast as a map[] operation.
So, your client tends to periodically focus its operations on a subset of your entries? This feature could be of help: it's easy to use, just enable it and you'll have a near cache working seamlessly under the hood.
A C++ example of a cache with near cache configuration
The last line does the magic: INVALIDATED is the active mode for the near cache (the default mode is DISABLED, which means no near cache; see the Java docs), and maxEntries is the maximum number of entries that can be stored in the near cache. If the near cache is full, the oldest entry will be evicted. Set maxEntries=0 for an unbounded cache (do you have enough memory?).
Now a full example of an application that just does some gets and puts and counts how many of them are served remotely and how many are served from the near cache. As you can see, the cache object is an instance of the "well known" RemoteCache class.
Entry values in the near cache are kept aligned with the remote cache state via the events subsystem: if something changes on the server, an update event (modified, expired, removed) is sent to the client, which updates the near cache accordingly.
By the way: did you know that C++/C# clients can subscribe listeners to events? In the next "native" post we will see how.
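The mechanism can be sketched independently of the client API. The Java class below is a hypothetical illustration, not the C++/C# client: a plain `Map` stands in for the remote server, `onRemoteEvent` stands in for the event listener, and an invalidated entry is simply dropped so that the next get goes back to the server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NearCacheSketch {
    private final Map<String, String> remote;           // stands in for the remote server
    private final LinkedHashMap<String, String> near;   // insertion order: oldest entry first
    int nearHits;
    int remoteHits;

    NearCacheSketch(Map<String, String> remote, final int maxEntries) {
        this.remote = remote;
        this.near = new LinkedHashMap<String, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // maxEntries == 0 means unbounded; otherwise evict the oldest entry when full.
                return maxEntries > 0 && size() > maxEntries;
            }
        };
    }

    String get(String key) {
        String value = near.get(key);
        if (value != null) {
            nearHits++;                 // served locally, no round trip
            return value;
        }
        remoteHits++;                   // served by the remote cache
        value = remote.get(key);
        if (value != null) near.put(key, value);
        return value;
    }

    void put(String key, String value) {
        remote.put(key, value);
        near.remove(key);               // the local copy is stale now
    }

    /** Simulates a modified/expired/removed event arriving from the server. */
    void onRemoteEvent(String key) {
        near.remove(key);
    }
}
```

Counting `nearHits` and `remoteHits` over a workload that keeps returning to a small subset of keys shows how many round trips the near cache saves.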