Migrate PMML models to JAVA/SCALA models

This topic contains 10 replies, has 7 voices, and was last updated by  Archived_User28 2 years, 2 months ago.

  • #13311 Reply


    Hi Team,

    With the introduction of a fully fledged Java/Scala API in Kamanja release 1.1, can you recommend whether we should plan to migrate our existing PMML models to this API?

    Also, a couple of related questions:

      Is this migration process going to be essentially rewriting the PMML model in Scala/Java?


      Do we have any guidelines on this migration?


  • #13312 Reply


    I would definitely recommend migrating all PMML models to Java/Scala models. They are so much easier to maintain, as well as to enhance.

    Here is the documentation on the same – http://kamanja.org/advanced-developers-guide-v1-1/#Writing_a_Customer_Model_in_Java_and_Scala

    Please review and let me know if you have any questions. Will be happy to help.
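    To give a concrete flavor of what a migrated model might look like, here is a minimal sketch. Note that the trait and message type below are illustrative stand-ins, not the actual Kamanja model API – see the guide above for the real interfaces.

    ```scala
    // NOTE: hypothetical sketch only. `CustomModel` and `TxnMsg` are
    // illustrative stand-ins, NOT the actual Kamanja model interfaces.
    case class TxnMsg(bank: String, amount: Double)

    trait CustomModel {
      def isValidMessage(msg: TxnMsg): Boolean
      def execute(msg: TxnMsg): Option[String]
    }

    // A trivial scoring model: flags high-value transactions.
    object HighValueModel extends CustomModel {
      def isValidMessage(msg: TxnMsg): Boolean = msg.amount > 0
      def execute(msg: TxnMsg): Option[String] =
        if (msg.amount > 10000.0) Some(s"ALERT:${msg.bank}") else None
    }
    ```

    Because the logic is plain Scala, you can unit-test and refactor it like any other code, which is a big part of the maintenance win over PMML.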

  • #13313 Reply



    I don’t think there is a need for migration as the new version supports Java/Scala along with PMML.

    We just need to consider which code would be easy to maintain and support for the new development.


  • #13314 Reply


    Thank you for your quick replies. There are two schools of thought here, so will await further responses from others to hear their opinions.

    • #13315 Reply


      I highly recommend switching to Scala/Java (Scala would be a little more compact than Java) to reduce maintenance and to write better-performing models, thanks to better programming constructs and the ability to use functions from many other libraries.


  • #13316 Reply


    The benefit of PMML comes when it is used in conjunction with tools that generate PMML. This is good for those who are uncomfortable diving into code. However, if a person or organization is familiar with Java/Scala, I would recommend, as the others have suggested, that you, if not migrate, at least consider doing future model development in Java/Scala.

    As to the questions you asked:

    Is this migration process going to be essentially rewriting the PMML model in Scala/Java?

    Essentially, yes, this would mean rewriting the model in Java or Scala. It is possible to take the Scala code that was generated when you submitted the PMML model definition to the metadata, but I wouldn’t recommend it, as that code wouldn’t be nearly as efficient as code you write yourself.

    Do we have any guidelines on this migration?

    Other than general coding guidelines for best practices, I wouldn’t say there are any particular guidelines regarding migration (as we don’t actually have a migration process). Perhaps Krishna or Pokuri could answer that better.

    • #13317 Reply


      I am slightly confused here – I get the benefits of Scala/Java over PMML, but are we saying that we HAVE to migrate PMML models to Scala/Java? I was under the impression that the new release is backward compatible and should support existing PMML models – is that not true?

      • #13318 Reply


        We still fully support PMML models. I’m sorry for any confusion but don’t worry. We didn’t cut any support at all. We’ve only enhanced.

      • #13319 Reply


        Nope – you don’t HAVE to migrate.

        All that we are saying is that it is RECOMMENDED that you migrate – you can choose not to, and keep the PMML model itself.

        • #13320 Reply


          When you train a model in R, KNIME, or SAS Enterprise Miner, any “model maintenance” or updating is done with the respective data mining package. This is done by extracting fresher data records, retraining various models, and then picking the best one. After the new model is picked, the PMML is generated for that model. There is no PMML editing in this process, only replacement. This is analogous to not editing a JAR file, but editing the Java source code, regenerating the JAR, and replacing the JAR file.

          Having run a prior analytics department with 30+ models in PMML, the only PMML I (or anybody) edited directly was the simple logic around routing records to the correct model. When I previously used the Zementis Adapa PMML server, it allowed PMML files to be included, so I would only edit the top-level PMML transaction-routing file that includes all the other models. I strongly recommend that we get to a stage where we can include other PMML files for a similar production scenario, to limit QA regression or integration testing to just the files edited or regenerated.

          Alternatively, the record-routing logic could be in Java or Scala, as long as the PMML models can be called. It is something that would be edited frequently to support frequent model updates, so we may want an abstraction to an external text parameter file indicating the condition, any random-sampling rollout, and the model names used. Then, through a GUI or by editing the text file, changes could be made more easily.

          IF (transaction is to or from CITIBANK) THEN
            send the transaction to be scored to the CITIBANK model

          IF (transaction is to or from HSBC) THEN {
            IF (rand() < 0.99) THEN send the transaction to be scored to the HSBC model
            ELSE send the transaction to be scored to the HSBC_REPLACEMENT model
          }
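          As a sketch of the routing abstraction suggested above (an external table of condition, rollout fraction, and model names), here is what the lookup logic might look like in Scala. The `Route` structure and all names are hypothetical, not part of any Kamanja API; the random draw is injectable so the rollout split can be tested deterministically.

          ```scala
          import scala.util.Random

          // Hypothetical routing table: each rule has a predicate on the
          // transaction, a rollout fraction, and primary/replacement models.
          case class Txn(counterparty: String)
          case class Route(matches: Txn => Boolean, rollout: Double,
                           primary: String, replacement: String)

          object Router {
            val rules = Seq(
              Route(_.counterparty == "CITIBANK", 1.0, "CITIBANK", "CITIBANK"),
              Route(_.counterparty == "HSBC", 0.99, "HSBC", "HSBC_REPLACEMENT")
            )

            // Pick the model for a transaction; the random draw is a parameter
            // so tests (and gradual rollouts) can control it.
            def modelFor(txn: Txn,
                         draw: () => Double = () => Random.nextDouble()): Option[String] =
              rules.find(_.matches(txn)).map { r =>
                if (draw() < r.rollout) r.primary else r.replacement
              }
          }
          ```

          In practice the `rules` sequence would be loaded from the external text parameter file described above rather than hard-coded.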

          I support Heman’s point. If manually generated PMML (NOT from R, SAS EM, KNIME, or another tool) is to be maintained, and it is not short, simple control logic, a reasonable case can be made, for software lifecycle maintenance, to maintain it as custom code in Java or Scala. However, if human-generated PMML code is working today, there is a choice about how soon it gets converted.

        • #13321 Reply


          As they suggest, the principal use of PMML is when the model or models (a so-called ensemble) are created by an authoring tool like R/Rattle or SAS Enterprise Miner, etc. It has also been used as a target language for a DSL developed by one of our customers. My take is that you should do whatever makes sense to support your customer’s objective.

          As to performance: as the models become more complex and as the cluster becomes loaded with hundreds or thousands of models, there are a couple of things that we will want to do for the models, regardless of their type (custom Scala/Java, PMML, JPMML). A couple of points should be kept in mind for any migration or new development effort:

          1) Model “idempotency” – make sure that your model, when instantiated by the engine in response to some incoming message, can be reused for subsequent messages of the same type. This is particularly important for models that retrieve significant information from the kv store (or any external storage, for that matter) in order to filter/interpret the incoming message. It is relatively expensive to marshal information to/from disk. That information should be cached as part of the steady state the model has when receiving a message, keeping it as close to the message processing as possible.
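          The caching pattern in point 1 might be sketched like this in plain Scala. The `KvStore` class below is a stand-in for whatever external store the model reads from, not the real Kamanja storage API:

          ```scala
          import scala.collection.mutable

          // Stand-in for an external kv store; `reads` counts the
          // expensive fetches so the cache's effect is observable.
          class KvStore {
            var reads = 0
            def get(key: String): String = { reads += 1; s"value-of-$key" }
          }

          // A reusable model instance caches lookups so that subsequent
          // messages of the same type don't pay the marshalling cost again.
          class CachingModel(store: KvStore) {
            private val cache = mutable.Map.empty[String, String]
            def lookup(key: String): String =
              cache.getOrElseUpdate(key, store.get(key))
          }
          ```

          After the first message touches a key, every later lookup for that key is served from memory; only one read ever hits the store.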

          2) Managing model complexity using ensembles. Significant models can/should/will be managed with a group of cooperating models aka an ensemble. Kamanja will soon base its execution on a DAG (directed acyclic graph) that allows the models to compute values that are consumed by other models. Values so created are called derived concepts. These concepts will be available by model namespace/model name/concept name (the key) from the kv store available to all models. Models that need some derivation based upon an incoming message will wait for their partner model to complete the derivation of that concept before executing. Multiple derived concepts could be required by any given model. The partner models deriving these values, whenever they DO NOT depend upon each other, can and will be scheduled to run concurrently by the Kamanja cluster’s engines.
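          As a plain-Scala illustration of the concurrency described in point 2 – independent partner models deriving concepts concurrently, with a downstream model waiting on both – using standard `Future`s rather than any actual Kamanja scheduling API (the keys and values are invented for the example):

          ```scala
          import scala.concurrent.{Await, Future}
          import scala.concurrent.duration._
          import scala.concurrent.ExecutionContext.Implicits.global

          object DagSketch {
            // Pretend each partner model derives a concept published under a
            // namespace/model/concept key; here we just return the value.
            def derive(key: String, value: Double): Future[Double] = Future(value)

            def run(): Double = {
              // The two partner models do not depend on each other, so they
              // can run concurrently; the downstream model waits for both
              // derived concepts before computing its own result.
              val risk = derive("ns.riskModel.riskScore", 0.8)
              val vol  = derive("ns.volumeModel.txnVolume", 120.0)
              val downstream = for (r <- risk; v <- vol) yield r * v
              Await.result(downstream, 5.seconds)
            }
          }
          ```

          The DAG execution described above would generalize this: the engine, not the model author, decides which non-dependent derivations run in parallel.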

          The key to building the ensemble-style model is the use of the input and output dependency specifications in the metadata. The PMML-based models generate this information for you; for custom models, it will, for the time being at least, have to be specified in a file. More details about this will come out in one of the sprints in the near future.

          One final thing. As envisioned, ensemble models could be comprised of a mixture of different model types (Scala, Java, PMML, JPMML). For example, a Random Forest-style model might be comprised of a set of “weak learner” models developed with SAS EM, plus a custom Scala model that consumes their respective results using a highly customized classifier or regression model that serves as the “strong learner” for the ensemble. When the ensemble has hundreds of models in it, it is highly likely that many of them will not be written with custom code. The custom code would be used strategically for the interpretation of the others’ results.
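          As a purely illustrative sketch of that strong learner/weak learner wiring (hypothetical function signatures, not Kamanja code): each weak learner emits a score, and the custom strong learner combines them – here, by averaging and thresholding.

          ```scala
          // Illustrative ensemble wiring. In the scenario above the weak
          // learners might be PMML models and only the combiner custom Scala.
          object Ensemble {
            type WeakLearner = Double => Double // feature -> score in [0, 1]

            // Simple strong learner: average the weak scores and
            // classify positive when the average reaches 0.5.
            def strongLearner(learners: Seq[WeakLearner])(x: Double): Boolean = {
              val avg = learners.map(f => f(x)).sum / learners.size
              avg >= 0.5
            }
          }
          ```

          A real strong learner would of course be a trained classifier or regression model rather than a fixed average, but the shape – many scores in, one decision out – is the same.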
