R PMML generation currently has a bug for the SVM algorithm

This topic contains 0 replies, has 1 voice, and was last updated by  Greg Makowski 1 year, 2 months ago.

    Greg Makowski
    Moderator

    rpmmlsupport@zementis.net

    Hello,

    I wanted to let the Kamanja community know: if you are stuck trying to generate PMML files from R’s Support Vector Machine (SVM) package, it is not just you. I have also reported the bug.

     

    SUMMARY

    Across many different data sets and many different SVM kernels, I keep getting this error message when trying to save an SVM as PMML.

    > model_file <- paste( model_path, "hmeq_R_svm_m1.xml", sep="")

    > saveXML( pmml(crs$ksvm), model_file )

    Error in pmml.ksvm(crs$ksvm) : Specified dataset not a legitimate object.

    >

    FULL DETAILS

    * Get the abalone data from the UC Irvine ML data repository.

    * Load it into Rattle, which generates the following log:

    # Rattle is Copyright (c) 2006-2015 Togaware Pty Ltd.

    #============================================================

    # Rattle timestamp: 2016-06-30 16:55:11 x86_64-apple-darwin13.4.0

    # Rattle version 4.1.0 user ‘gregmakowski’

    # This log file captures all Rattle interactions as R commands.

    # Export this log to a file using the Export button or the Tools

    # menu to save a log of all your activity. This facilitates repeatability. For example, exporting

    # to a file called 'myrf01.R' will allow you to type in the R Console

    # the command source('myrf01.R') and so repeat all actions automatically.

    # Generally, you will want to edit the file to suit your needs. You can also directly

    # edit this current log in place to record additional information before exporting.

     

    # Saving and loading projects also retains this log.

    # We begin by loading the required libraries.

    library(rattle)   # To access the weather dataset and utility commands.

    library(magrittr) # For the %>% and %<>% operators.

    # This log generally records the process of building a model. However, with very

    # little effort the log can be used to score a new dataset. The logical variable

    # ‘building’ is used to toggle between generating transformations, as when building

    # a model, and simply using the transformations, as when scoring a dataset.

    building <- TRUE

    scoring  <- ! building

    # A pre-defined value is used to reset the random seed so that results are repeatable.

    crv$seed <- 42

    #============================================================

    # Rattle timestamp: 2016-06-30 16:55:39 x86_64-apple-darwin13.4.0

    # Load the data.

    crs$dataset <- read.csv("file:///Users/gregmakowski/Documents/KamanjaDemo/data_abalone/abalone.csv", na.strings=c(".", "NA", "", "?"), strip.white=TRUE, encoding="UTF-8")

    #============================================================

    # Rattle timestamp: 2016-06-30 16:55:39 x86_64-apple-darwin13.4.0

    # Note the user selections.

    # Build the training/validate/test datasets.

    set.seed(crv$seed)

    crs$nobs <- nrow(crs$dataset) # 4177 observations

    crs$sample <- crs$train <- sample(nrow(crs$dataset), 0.7*crs$nobs) # 2923 observations

    crs$validate <- sample(setdiff(seq_len(nrow(crs$dataset)), crs$train), 0.15*crs$nobs) # 626 observations

    crs$test <- setdiff(setdiff(seq_len(nrow(crs$dataset)), crs$train), crs$validate) # 628 observations

    # The following variable selections have been noted.

    crs$input <- c("sex_Male", "length", "diameter", "height",

         "wholeWeight", "ShuckedWeight", "VisceraWeight", "ShellWeight",

         "Rinks")

    crs$numeric <- c("sex_Male", "length", "diameter", "height",

         "wholeWeight", "ShuckedWeight", "VisceraWeight", "ShellWeight",

         "Rinks")

    crs$categoric <- NULL

    crs$target  <- "sex"

    crs$risk    <- NULL

    crs$ident   <- NULL

    crs$ignore  <- NULL

    crs$weights <- NULL

    #============================================================

    # Rattle timestamp: 2016-06-30 16:56:04 x86_64-apple-darwin13.4.0

    # Note the user selections.

    # Build the training/validate/test datasets.

    set.seed(crv$seed)

    crs$nobs <- nrow(crs$dataset) # 4177 observations

    crs$sample <- crs$train <- sample(nrow(crs$dataset), 0.7*crs$nobs) # 2923 observations

    crs$validate <- sample(setdiff(seq_len(nrow(crs$dataset)), crs$train), 0.15*crs$nobs) # 626 observations

    crs$test <- setdiff(setdiff(seq_len(nrow(crs$dataset)), crs$train), crs$validate) # 628 observations

    # The following variable selections have been noted.

    crs$input <- c("length", "diameter", "height", "wholeWeight",

         "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rinks")

    crs$numeric <- c("length", "diameter", "height", "wholeWeight",

         "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rinks")

    crs$categoric <- NULL

    crs$target  <- "sex_Male"

    crs$risk    <- NULL

    crs$ident   <- "sex"

    crs$ignore  <- NULL

    crs$weights <- NULL

    #============================================================

    # Rattle timestamp: 2016-06-30 16:56:07 x86_64-apple-darwin13.4.0

    # Decision Tree

    # The ‘rpart’ package provides the ‘rpart’ function.

    library(rpart, quietly=TRUE)

    # Reset the random number seed to obtain the same results each time.

    set.seed(crv$seed)

    # Build the Decision Tree model.

    crs$rpart <- rpart(sex_Male ~ .,

        data=crs$dataset[crs$train, c(crs$input, crs$target)],

        method="class",

        parms=list(split="information"),

        control=rpart.control(usesurrogate=0,

            maxsurrogate=0))

    # Generate a textual view of the Decision Tree model.

    print(crs$rpart)

    printcp(crs$rpart)

    cat("\n")

    # Time taken: 0.08 secs

    #============================================================

    # Rattle timestamp: 2016-06-30 16:56:13 x86_64-apple-darwin13.4.0

    # Support vector machine.

    # The ‘kernlab’ package provides the ‘ksvm’ function.

    library(kernlab, quietly=TRUE)

    # Build a Support Vector Machine model.

    set.seed(crv$seed)

    crs$ksvm <- ksvm(as.factor(sex_Male) ~ .,

          data=crs$dataset[crs$train,c(crs$input, crs$target)],

          kernel="rbfdot",

          prob.model=TRUE)

    # Generate a textual view of the SVM model.

    crs$ksvm

    # Time taken: 1.24 secs
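
    The log ends once the model is built; the step that fails is the export shown in the summary. Below is a minimal sketch for narrowing the problem down, not a confirmed fix. It assumes the kernlab, pmml and XML packages are installed, uses the built-in iris data as a stand-in for the abalone file, and relies on the `dataset` argument that the pmml package documents for ksvm models. Whether passing it avoids the error in the package version used above is an assumption. The names `train`, `without_dataset`, `with_dataset` and the output file name are made up for illustration.

```r
# Minimal sketch; assumes the kernlab, pmml and XML packages are installed.
library(kernlab)  # provides ksvm()
library(pmml)     # provides pmml()
library(XML)      # provides saveXML()

set.seed(42)

# Stand-in training data (built-in iris), NOT the abalone file from the post.
train <- iris
model <- ksvm(Species ~ ., data = train, kernel = "rbfdot")

# Call 1: same shape as the failing call in the post (no dataset argument).
# tryCatch keeps the error message as a string instead of aborting the script.
without_dataset <- tryCatch(
  pmml(model),
  error = function(e) conditionMessage(e)
)

# Call 2: assumption to test, pass the training data through `dataset`.
with_dataset <- tryCatch(
  pmml(model, dataset = train),
  error = function(e) conditionMessage(e)
)

if (is.character(with_dataset)) {
  # The export failed again; report the message for the bug report.
  message("Export still failed: ", with_dataset)
} else {
  # The export produced an XML node; write it to a temporary file.
  saveXML(with_dataset, file = file.path(tempdir(), "iris_R_svm.xml"))
}
```

    With the Rattle objects from the log, the equivalent second call would be pmml(crs$ksvm, dataset=crs$dataset[crs$train, c(crs$input, crs$target)]), again as something to try rather than a known fix.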
