June 22, 2016 at 10:22 pm #18245
Can you please let me know your thoughts on using MemCacheDB or PostgreSQL as KV stores for Kamanja use cases that have very high message volume? I understand we will probably need to build the storage adapters, but I'd like to get a view from a performance standpoint.
June 23, 2016 at 11:59 am #18262
As of version 1.4, we have tenant-specific storage configuration in the Cluster Configuration. This tenant-specific storage will hold all message history (if messages are marked as persist) and containers associated with that tenant.
What doesn’t exist is (1) the ability to separate message history and KV stores into different data stores through additional storage configuration, and (2) support for MemCacheDB and PostgreSQL.
As for MemCacheDB and PostgreSQL specifically for this purpose, I believe there are probably better options out there. If you’re looking for a cached (in-memory) solution, note that MemCacheDB, despite its name, isn’t a cache (see memcachedb.org, where they say so in their own words); it’s a persistent KV database, much like HBase or Cassandra. If it’s just a faster database you’re looking for, I don’t know how its performance compares to those similar databases.
I think PostgreSQL would be too slow altogether. It’s a relational database and you’re looking for a high-volume key-value solution, so it doesn’t make sense for this particular use case. It might be a better fit if you had static lookup tables that rarely or never changed.
Enhancements could be made to allow more granular configuration of tenant storage, including a dedicated store for KV stores. However, if you’re looking for the fastest solution, I think some kind of distributed cache would be more appropriate, combined with periodic writes to a persistent datastore in case an entire cluster goes down.
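The "distributed cache with periodic updates to a persistent datastore" pattern described above is usually called a write-behind cache. Below is a minimal single-process sketch in Python, not a distributed implementation: `persist` is a hypothetical callback standing in for whatever datastore writer you would plug in, and `WriteBehindCache` is an illustrative name, not a Kamanja or cache-product API.

```python
import time
from typing import Callable, Dict, Optional, Set


class WriteBehindCache:
    """In-memory KV map that periodically pushes changed entries to a
    persistent store via a hypothetical `persist` callback."""

    def __init__(self, persist: Callable[[Dict[str, bytes]], None],
                 flush_interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, bytes] = {}
        self._dirty: Set[str] = set()  # keys changed since the last flush
        self._persist = persist
        self._interval = flush_interval_s
        self._clock = clock
        self._last_flush = clock()

    def get(self, key: str) -> Optional[bytes]:
        # Reads are served from memory only.
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        # Writes hit memory immediately; persistence is deferred.
        self._data[key] = value
        self._dirty.add(key)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        # Checked on each write here; a real system would also use a timer.
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        # Send only the entries that changed since the previous flush.
        if self._dirty:
            self._persist({k: self._data[k] for k in self._dirty})
            self._dirty.clear()
        self._last_flush = self._clock()
```

The trade-off is that anything written after the last flush is lost if the whole cluster goes down, so the flush interval bounds how much data is at risk.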
June 29, 2016 at 8:38 am #18515
The primary intent behind the ask is to get a KV store that is available as a standalone installation; for us, HBase comes bundled with the Cloudera installation. This is specific to our organization. Cassandra is not approved within our organization, whereas MemCache and PostgreSQL are.
We are not storing all input messages in persistent storage, only the fraction that serves as reference data for the rest of the messages, which means we are looking at about 100k records in persistent storage. Also, the models use the entire reference data set for each message, so we may not gain any advantage from a columnar KV store. Given this, please let me know your thoughts on using any other data store.
Regarding an in-memory store: do you think there is any advantage in using an in-memory data grid like Apache Ignite?
June 29, 2016 at 4:18 pm #18592
I can see value in something like Apache Ignite, which would allow quick access to your KV stores. The number of records in persistent storage isn’t large, of course, but querying that much data on every execution would be quite heavy. It’s obviously faster to have everything in memory, but keeping 100k records in memory could be difficult depending on their size.
I would say that if the set of records is fairly static (i.e. more of a lookup table than constantly changing data), it’d be fine to keep it in something like Apache Ignite (assuming those 100k records aren’t particularly large or you have a beast of a server).
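To put "depending on their size" in rough numbers, here is a back-of-the-envelope Python helper. The default `overhead_factor` of 2.0 is an assumption standing in for per-entry overhead (keys, object headers, indexes); the real figure depends on the store and the serialization format you use.

```python
def estimate_memory_mb(num_records: int, avg_record_bytes: int,
                       overhead_factor: float = 2.0) -> float:
    """Rough in-memory footprint in MB (10**6 bytes).

    overhead_factor is an assumed multiplier for per-entry overhead,
    not a measured value for any particular cache or data grid.
    """
    return num_records * avg_record_bytes * overhead_factor / 1_000_000


for size in (1_000, 10_000, 100_000):
    print(f"{size:>7} bytes/record -> ~{estimate_memory_mb(100_000, size):,.0f} MB")
```

Under that assumed 2x overhead, 100k records averaging 1,000, 10,000 and 100,000 bytes come to roughly 200 MB, 2 GB and 20 GB respectively, which is the difference between "trivial" and "needs a beast of a server".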
Ideally, a frequently used subset of records would be held in memory while less frequently used data is fetched directly from persistent storage, but we might not have much control over which records are retained beyond setting a maximum memory size or a maximum number of records.
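The "hot subset in memory, the rest from persistent storage, capped at N records" behavior described above can be sketched as a read-through cache with least-recently-used (LRU) eviction. LRU is a choice made here for illustration; the actual policy would be whatever the chosen cache or grid offers. `load` is a hypothetical function that reads one record from persistent storage.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class ReadThroughLRU:
    """Keeps at most `max_records` entries in memory; misses fall through
    to a hypothetical `load` function backed by persistent storage."""

    def __init__(self, load: Callable[[Hashable], Optional[Any]],
                 max_records: int) -> None:
        if max_records <= 0:
            raise ValueError("max_records must be positive")
        self._load = load
        self._max = max_records
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._load(key)  # cache miss: read from persistent storage
        if value is not None:
            self._entries[key] = value
            if len(self._entries) > self._max:
                self._entries.popitem(last=False)  # evict least recently used
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._entries
```

With `max_records=2`, reading keys `a`, `b`, `a`, `c` evicts `b`, since it is the least recently used entry when `c` arrives.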
Regardless of which storage you ultimately choose and its individual advantages, I think supporting a standalone KV store would require changes that allow configuring a separate store exclusively for KV data (or metadata, or whatever else you might want).
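One way to picture the configuration change described above is a per-purpose override that falls back to the tenant's default store. This is a hypothetical Python sketch only: `TenantStorageConfig`, `store_for`, the purpose names and the store labels are illustrative and are NOT Kamanja's actual cluster-configuration schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TenantStorageConfig:
    """Hypothetical shape, not Kamanja's real cluster-config format."""
    default_store: str
    # Optional per-purpose overrides, e.g. {"kv": ..., "metadata": ...}.
    dedicated: Dict[str, str] = field(default_factory=dict)

    def store_for(self, purpose: str) -> str:
        # Use a dedicated store when one is configured for this purpose,
        # otherwise fall back to the tenant's default store.
        return self.dedicated.get(purpose, self.default_store)


cfg = TenantStorageConfig(default_store="hbase-tenant-a",
                          dedicated={"kv": "ignite-kv"})
```

Here KV lookups would resolve to `ignite-kv` while message history keeps using `hbase-tenant-a`; both store labels are placeholders.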
I’m not sure if that’s quite what you’re looking for, but I can talk to Pokuri and Krishna about this.