This topic contains 5 replies, has 3 voices, and was last updated by Sreenivasulu Pokuri 1 year, 11 months ago.
June 3, 2016 at 6:02 am #17372
I had a discussion with Krishna a few weeks back about the YARN integration of Kamanja. Here is a summary of our discussion – please review and let me know your thoughts and suggestions.
1. For YARN integration, a specific application master can be written that will coordinate with the YARN resource manager and node managers. This application master will launch each Kamanja node in a separate container. It needs to be proven that these containers will allow appropriate intercommunication between the Kamanja nodes and the Kamanja leader.
2. Two approaches to manage resource scaling – horizontally (by increasing/reducing the number of Kamanja nodes) or vertically (by adjusting the memory/CPU footprint of each node).
3. The preferable approach is vertical management, as it is potentially cleaner than the horizontal one.
a. CPUs can be freed/consumed by reducing/increasing the number of threads on demand. This approach will need some changes in the way Kamanja threads are created – currently thread creation is tightly coupled with the Kafka topic partitions, and it will have to be changed to logical partitioning based on configuration. Saving thread context will also need to be managed. This functionality is on the Kamanja roadmap.
b. For memory – we can free up cache objects and signal the JVM to reduce memory, but it needs to be investigated how quickly the JVM honors the request. [Afterthought: if the cache objects are removed, will the engine still be able to perform properly?]
4. Horizontal management is slightly easier to implement.
a. The Kamanja nodes that are spawned in YARN containers can be shut down or added according to availability/demand from the resource manager.
b. This process is more disruptive, as the entire Kamanja node in the container is shut down – all the in-flight messages will have to be either completely processed first or handed off to the Kamanja node that takes over the work. There will also be uptime/downtime involved with each scaling operation.
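The logical-partitioning idea in point 3a – decoupling the thread count from the Kafka topic partition count so CPU can be scaled on demand – could be sketched roughly as follows. This is a hypothetical illustration, not Kamanja code; the function name `assign_partitions` and the round-robin scheme are assumptions.

```python
def assign_partitions(num_partitions, num_threads):
    """Map Kafka topic partitions onto a configurable number of
    logical worker threads (round-robin), so CPU usage can be scaled
    by changing num_threads without changing the partition count."""
    assignment = {t: [] for t in range(num_threads)}
    for p in range(num_partitions):
        assignment[p % num_threads].append(p)
    return assignment

# Scaling down from 8 worker threads to 2 re-maps the same 8 partitions:
print(assign_partitions(8, 8))  # one partition per thread
print(assign_partitions(8, 2))  # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
```

With a scheme like this, freeing CPUs (point 3a) becomes a re-mapping step rather than a change to the partition layout; the open question from the post – saving the thread context across the re-mapping – remains.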
June 7, 2016 at 9:10 am #17416
Thanks for your patience, Koushik – the response required some debate and deep thought. A response will be posted soon.
June 8, 2016 at 1:04 am #17466
# We also need to consider the following:
1. How often do we create/destroy YARN containers? This has two issues.
a. If we create and destroy containers too often, it impacts Kamanja performance, because Kamanja redistributes its workload whenever a node changes.
b. If we do not destroy a container for a very long time, it impacts other applications/tenants.
# This also gives us the following abilities:
1. To control resources per tenant within Kamanja.
2. To monitor resources with standard Hadoop tools.
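The create/destroy balance in point 1 above could be expressed as a simple policy with two knobs: a cooldown between scaling changes (to limit workload redistribution) and an idle timeout before a container is released (to avoid starving other tenants). A minimal sketch, assuming these two parameters – the class and its thresholds are hypothetical, not part of Kamanja or YARN:

```python
class ContainerScalingPolicy:
    """Hypothetical policy balancing the two issues above: destroying
    containers too often (workload redistribution cost in Kamanja) vs.
    holding idle containers too long (starving other tenants)."""

    def __init__(self, cooldown_secs=3600, idle_timeout_secs=4 * 3600):
        self.cooldown_secs = cooldown_secs        # min gap between node changes
        self.idle_timeout_secs = idle_timeout_secs  # idle time before release
        self.last_change = 0.0                    # timestamp of last scaling change

    def may_scale(self, now):
        # Kamanja redistributes work on every node change, so enforce
        # a minimum interval between container create/destroy operations.
        return now - self.last_change >= self.cooldown_secs

    def should_release(self, now, idle_since):
        # Release a container only if scaling is allowed AND it has been
        # idle long enough that holding it wastes resources other tenants need.
        return self.may_scale(now) and (now - idle_since) >= self.idle_timeout_secs
```

For example, with `cooldown_secs=10` and `idle_timeout_secs=20`, a container idle since time 0 is released at time 25 but not at time 5 (still inside the cooldown). Tuning the two values is exactly the balancing act described above.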
I will think some more and post a bit later.
June 10, 2016 at 3:36 am #17520
The scaling might not be very frequent – we can probably assume once a day, to take advantage of business hours/off hours. This is primarily for use cases which do not require quick turnaround – use cases requiring quick turnaround may be set up on a dedicated cluster.
If the containers are not destroyed for a long time, wouldn't they keep performing business as usual within their container boundaries and not impact other applications?
June 16, 2016 at 3:02 am #17745
One problem with the above approach is that Kamanja needs predefined node information in ClusterConfig.json – if the physical nodes are replaced by YARN containers, can we really define the node information in advance? Will this require any changes to the Kamanja engine?
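One conceivable way around the predefined-node problem (a hypothetical sketch, not an existing Kamanja feature) is to let each container register its own node information at startup instead of listing nodes in ClusterConfig.json in advance. In practice such a registry would likely live in ZooKeeper; a plain dict stands in for it here, and all names and fields below are assumptions:

```python
class DynamicNodeRegistry:
    """Hypothetical registry: containers register themselves when YARN
    launches them, so node info need not be predefined in ClusterConfig.json.
    A real implementation would persist this in ZooKeeper, not a dict."""

    def __init__(self):
        self.nodes = {}

    def register(self, node_id, host, port):
        # Called by a Kamanja node inside a freshly launched container.
        self.nodes[node_id] = {"host": host, "port": port}

    def deregister(self, node_id):
        # Called when the container is destroyed by the resource manager.
        self.nodes.pop(node_id, None)

    def live_nodes(self):
        return sorted(self.nodes)

registry = DynamicNodeRegistry()
registry.register("container_01", "worker-a", 9100)  # hypothetical IDs/ports
registry.register("container_02", "worker-b", 9100)
print(registry.live_nodes())  # ['container_01', 'container_02']
```

Whether the Kamanja engine can consume node information that arrives this way, rather than from ClusterConfig.json, is exactly the open question raised above.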
June 19, 2016 at 2:58 pm #17927
That is why we need to find a balance in destroying YARN containers. If we destroy them after every operation, we waste time recreating them. If we do not destroy them for a long time, that impacts other applications (and wastes resources). So we need to see how we can balance this.
And once we move to YARN containers, we may no longer have any physical nodes. But we preferably need at least one leader at all times. So we need to treat all YARN containers as nodes, and ensure at least one YARN container is available to act as leader for work distribution. Or we may need to change the workload distribution approach for YARN containers instead of using the current one.
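The "at least one leader" requirement above can be illustrated with the common ZooKeeper-style convention of electing the live node with the lowest ID. This is a toy sketch of the idea, not Kamanja's actual election code; the container IDs are made up:

```python
def elect_leader(live_nodes):
    """Pick a leader among the currently live container nodes.
    Sketch of the lowest-ID convention used in ZooKeeper-style
    elections; an assumption, not Kamanja's implementation."""
    if not live_nodes:
        return None  # no container alive: no leader, work distribution stalls
    return min(live_nodes)

nodes = ["container_07", "container_03", "container_12"]
print(elect_leader(nodes))  # container_03

# If the leader's container is destroyed, a new leader is elected
# from the survivors:
nodes.remove("container_03")
print(elect_leader(nodes))  # container_07
```

The empty-list case is the situation the post warns about: if every YARN container can be destroyed, there may be moments with no node eligible to lead, so at least one container must always be kept alive (or the work-distribution scheme changed to tolerate leaderless periods).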