Storing 20 Million rows in Container

This topic contains 2 replies, has 1 voice, and was last updated by  JP Creedon 1 year, 5 months ago.

  • #13794 Reply

    Navjot

    Problem statement: We have a use case where we need to analyze customer transactions. The customer list will be our driver (i.e. the model should be triggered only for the customers in focus), and customer details (demographic data) and previous transactions will be the lookup data.
    We’re planning to load the customer data and previous transactions into containers, but the data volumes are very high.
    Rough estimates: 20 million rows for customer details
    60 million transactions (we plan to partition these by month and create a separate container for each month)

    Please suggest the best way to achieve this, as I am worried that this much data could slow the containers down.

  • #14093 Reply

    JP Creedon
    Member

    Hi Navjot,

    We are putting together a response to your question and will post it when it is ready.

    Regards,
    JP

    • This reply was modified 1 year, 5 months ago by  JP Creedon.
    • #14150 Reply

      JP Creedon
      Member

      Thanks for your patience Navjot.

      The key to scaling is to define proper partition keys on containers and messages. In this case the key could be the customer ID, the account number, or anything else that uniquely identifies a customer. In addition, transactions should be defined as messages and partitioned by both time and customer ID. In Kamanja version 1.3.3, containers and messages must be marked for caching in order to be kept in the in-memory cache. The current limitation is that once a container is marked for caching, the entire container is cached in memory, so for very large containers (such as transactions) memory size could be a problem. In the upcoming release 1.4.0 (coming soon), the cache is handled in the usual fashion: unused or least-used entries are evicted to keep memory utilization within the configured cache limits.
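
To make the two ideas above concrete, here is a minimal Python sketch. It is not Kamanja code and does not use any Kamanja API. The names `partition_for`, `transaction_key`, and `LRUCache`, the partition count, and the `YYYY-MM-DD` date format are all assumptions made for illustration. It shows a stable partition derived from the customer ID, a composite (month, customer ID) key for transactions, and a size-bounded cache that evicts the least recently used entry.

```python
import hashlib
from collections import OrderedDict


def partition_for(customer_id: str, num_partitions: int) -> int:
    """Map a customer id to a stable partition number (hypothetical helper)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then fold into range.
    return int.from_bytes(digest[:8], "big") % num_partitions


def transaction_key(customer_id: str, txn_date: str) -> tuple:
    """Composite key: month bucket plus customer id.

    Assumes txn_date is an ISO string such as "2016-03-15".
    """
    return (txn_date[:7], customer_id)


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Reading an entry marks it as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Drop the oldest entries until the cache is back within its limit.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    cache = LRUCache(max_entries=2)
    cache.put(transaction_key("C001", "2016-03-15"), {"amount": 10})
    cache.put(transaction_key("C002", "2016-03-16"), {"amount": 20})
    cache.get(transaction_key("C001", "2016-03-15"))  # C001 is now most recent
    cache.put(transaction_key("C003", "2016-04-01"), {"amount": 30})
    print(("2016-03", "C002") in cache)  # C002 was evicted as least recently used
    print(partition_for("C001", 8))
```

With a bounded cache like this, memory use depends on the configured limit rather than on the total size of the container, which is the difference between the 1.3.3 and 1.4.0 behavior described above.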

      • This reply was modified 1 year, 5 months ago by  JP Creedon.