Kamanja makes it easy to create, run, and continuously enhance models, potentially thousands of them, that are applied against every new data event within an enterprise. Models can range from simple rules-based decision trees written in Java to sophisticated non-linear classifiers implemented in PMML. Models can continuously leverage the most recent and all past data to make arbitrarily complex decisions at any given moment. This process is called continuous decisioning.
With its modular design and easy integration with other open-source and proprietary data technologies, Kamanja solves challenging problems in finance, medicine, insurance, security, communications, and any other area where vast amounts of data must be processed robustly in real time. Simply by adding nodes, Kamanja scales to meet virtually any volume of data or number and complexity of models. Because Kamanja is open-source, support and contributions are crowd-sourced by an ever-growing, global community of developers and users.
The Kamanja framework is open-source and offers many features that support these use cases:
- Built from the ground up in Scala, a concise and powerful language.
- Compiles predictive models saved as PMML into Scala and JAR files, then saves them into the data store. Shortens the time needed to move complex analytic models into production from weeks to days.
- Performs real-time message processing (for example, from Kafka) and real-time scoring with PMML models, and feeds outputs to application or messaging systems. It provides the benefits of both Apache Storm’s stateless event streaming and Apache Spark™ Streaming’s stateful, distributed processing.
- Delivers performance and scale. Delivers 100% more message throughput than Storm with DAG execution of PMML models on a single node. Runs hundreds of models concurrently on the same node.
- Processes each event only once by enforcing concurrency and frequency limits. Eliminates errors and false alerts caused by running the same transaction concurrently or repeatedly.
- Provides fault tolerance. Uses Kafka for messaging and Apache ZooKeeper™ to provide fault tolerance, processor isolation, cluster coordination service, and resource management.
- Handles complex record routing in large parallelized implementations.
- Helps DevOps teams find and fix problems fast. Service logs can be viewed for troubleshooting.
- Provides pluggable APIs that enable Kamanja to run with other messaging systems and execution environments.
- Provides analytics. Kamanja exposes services and data for analytic, reporting, and scheduling tools.
- Supports other high-level languages and abstractions for implementation (PMML, Cassandra, HBase, DSL).
- Quickly deploys on multi-node clusters. Builds, ships, and runs applications with minimal downtime.
- Supports a vibrant community. Kamanja is easy to deploy and comes with test samples and developer guides. Community forums answer questions and allow users to make contributions.
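The once-only event processing mentioned in the list above can be pictured with a small sketch. The Java example below is illustrative only and is not Kamanja's implementation: the class `OnceOnlyProcessor` and the idea of keying events by a transaction ID are assumptions made for this example. It shows how recording each ID atomically keeps the same transaction from being handled twice, even when duplicates arrive at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical helper, not part of the Kamanja API.
class OnceOnlyProcessor {
    // Thread-safe set of transaction IDs that have already been handled.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler;

    OnceOnlyProcessor(Consumer<String> handler) {
        this.handler = handler;
    }

    // Runs the handler only the first time a given ID is offered.
    // add() on this set is atomic, so concurrent duplicates cannot both pass.
    boolean process(String transactionId) {
        if (!seen.add(transactionId)) {
            return false; // duplicate: skip to avoid a repeated alert
        }
        handler.accept(transactionId);
        return true;
    }
}

class OnceOnlyDemo {
    public static void main(String[] args) {
        List<String> handled = new ArrayList<>();
        OnceOnlyProcessor processor = new OnceOnlyProcessor(handled::add);
        processor.process("tx-1");
        processor.process("tx-1"); // ignored, already handled
        processor.process("tx-2");
        System.out.println(handled);
    }
}
```

This sketch keeps every ID in memory indefinitely. A clustered deployment would need to persist and expire that state, which is outside the scope of the example.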
The diagram above illustrates the basic elements of Kamanja, including its inputs and outputs.
Models – represent the business logic, the “rules” that make the decisions within the Kamanja engine. Models can be implemented in Java, Scala, or PMML. Upcoming versions of Kamanja will allow models to be generated with R, Python, and other data analysis tools. Shared models and standard templates allow models to be rapidly created, and they are easily deployed and managed as metadata within Kamanja.
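To make the idea of a rules-based model concrete, here is a minimal Java sketch. It does not use Kamanja's actual model API: the `RuleModel` interface, the `LargeWithdrawalRule` class, and the message field names (`type`, `amount`) are hypothetical, chosen only to illustrate how a model maps an incoming message to a decision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model contract; Kamanja's real interfaces may differ.
interface RuleModel {
    // Returns an alert text when the rule fires, or empty otherwise.
    Optional<String> evaluate(Map<String, Object> message);
}

// A one-rule decision: flag withdrawals above a fixed limit.
class LargeWithdrawalRule implements RuleModel {
    private final double limit;

    LargeWithdrawalRule(double limit) {
        this.limit = limit;
    }

    @Override
    public Optional<String> evaluate(Map<String, Object> message) {
        Object type = message.get("type");
        Object amount = message.get("amount");
        // Fire only for withdrawals that carry a numeric amount over the limit.
        if ("withdrawal".equals(type) && amount instanceof Number
                && ((Number) amount).doubleValue() > limit) {
            return Optional.of("Large withdrawal: " + amount);
        }
        return Optional.empty();
    }
}

class RuleModelDemo {
    public static void main(String[] args) {
        Map<String, Object> message = new HashMap<>();
        message.put("type", "withdrawal");
        message.put("amount", 12000.0);
        RuleModel model = new LargeWithdrawalRule(10000.0);
        model.evaluate(message).ifPresent(System.out::println);
    }
}
```

Returning an empty `Optional` when no rule fires lets the caller treat "no decision" as the common case without null checks.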
Metadata – maintained and managed within Kamanja. Metadata consists of all the information that the engine needs to process incoming messages, including the models, information about message types, and configuration information.
Input/Sources – a variety of sources can serve as input to the Kamanja engine. The most common source today is Kafka, an open-source messaging system. Kafka takes incoming messages and distributes them on publish-subscribe queues, from which Kamanja fetches the data and feeds it into the engine. Other sources of data, including databases and Hadoop clusters, are supported through simple input adapters.
Input Adapter – a Java class that converts input data in some form into a serial stream that can be used by the Kamanja engine. Kamanja can recognize incoming messages of a variety of types, including JSON, XML, MQ, and more. Messages are passed from the input adapter into the Kamanja engine, where deployed models are applied.
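The conversion step an input adapter performs can be sketched as follows. This is not Kamanja's adapter interface: the class `KeyValueInputAdapter` and the `key=value;key=value` line format are assumptions made for illustration. The sketch turns raw text lines into an ordered list of field maps that an engine could consume one message at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical adapter; Kamanja's real adapter interface is not shown here.
class KeyValueInputAdapter {
    // Parses one raw line such as "type=withdrawal;amount=250" into a message.
    static Map<String, String> parse(String line) {
        Map<String, String> message = new LinkedHashMap<>();
        for (String pair : line.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq > 0) { // skip fragments with no key or no '='
                message.put(trimmed.substring(0, eq).trim(),
                            trimmed.substring(eq + 1).trim());
            }
        }
        return message;
    }

    // Converts a batch of raw lines into messages, preserving arrival order
    // and dropping lines that contain no usable fields.
    static List<Map<String, String>> toMessages(List<String> rawLines) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String line : rawLines) {
            Map<String, String> message = parse(line);
            if (!message.isEmpty()) {
                out.add(message);
            }
        }
        return out;
    }
}
```

An adapter for JSON or XML input would follow the same shape, with `parse` swapped for a parser suited to that format.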
Data Stores – models often require previous data in order to process new messages. Kamanja supports access to several data stores, including HBase (Hadoop) and Cassandra, which may be deployed as part of or external to the Kamanja cluster itself.
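A short sketch shows why a model may need previous data. Everything here is hypothetical: `HistoryStore` is an in-memory stand-in for an external store such as HBase or Cassandra, and `CumulativeSpendRule` is an invented rule. The point is only that the decision depends on the new event combined with what was stored earlier.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for an external data store, keyed by account ID.
class HistoryStore {
    private final Map<String, Double> totals = new HashMap<>();

    // Returns the stored running total, or 0.0 for an unseen account.
    double totalFor(String accountId) {
        return totals.getOrDefault(accountId, 0.0);
    }

    // Adds an amount to the account's running total.
    void add(String accountId, double amount) {
        totals.merge(accountId, amount, Double::sum);
    }
}

// Flags an account once its cumulative spend exceeds a limit.
class CumulativeSpendRule {
    private final HistoryStore store;
    private final double limit;

    CumulativeSpendRule(HistoryStore store, double limit) {
        this.store = store;
        this.limit = limit;
    }

    // Combines the new event with stored history, then updates the history.
    boolean exceedsLimit(String accountId, double amount) {
        double newTotal = store.totalFor(accountId) + amount;
        store.add(accountId, amount);
        return newTotal > limit;
    }
}
```

With a limit of 100, two events of 60 on the same account trip the rule on the second event, while a single event of 60 on another account does not. No single message would reveal this on its own.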
Output Adapter – takes results from a model, then formats and distributes them as messages that may be processed by a consumer. In most implementations today, output messages are distributed through Kafka queues, but this flexible design allows Kamanja to integrate with virtually any downstream application or service.
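The format-then-distribute role of an output adapter can be illustrated with a brief sketch. The class `AlertOutputAdapter`, its JSON layout, and the use of a `Consumer<String>` as the destination are all assumptions for this example and are not Kamanja's API. In a real deployment the sink would typically be a message queue producer.

```java
import java.util.function.Consumer;

// Hypothetical adapter: formats a model result and hands it to a sink.
class AlertOutputAdapter {
    private final Consumer<String> sink;

    AlertOutputAdapter(Consumer<String> sink) {
        this.sink = sink;
    }

    // Escapes backslashes and double quotes only; control characters
    // are not handled in this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a small JSON object naming the model and its alert text.
    static String format(String modelName, String alert) {
        return "{\"model\":\"" + escape(modelName)
                + "\",\"alert\":\"" + escape(alert) + "\"}";
    }

    // Formats the result and passes it to the configured destination.
    void publish(String modelName, String alert) {
        sink.accept(format(modelName, alert));
    }
}
```

Keeping the destination behind a plain `Consumer<String>` means the same adapter could write to a queue, a file, or a test list without changes to the formatting code.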
Output/Consumers – take any number of actions based on messages received from Kamanja, including presenting them on a dashboard, generating alerts or triggers, passing them on to other applications for further processing, or simply storing them in a database for further review or analysis.