Complex Data Structures as input for Kamanja

This topic contains 5 replies, has 6 voices, and was last updated by Archived_User57 1 year, 8 months ago.

  • #13282

    Archived_User7
    Participant

    Hi,

    We need to ingest complex JSON data structures like the following as input for a specific model:

    {
      "indicators": [
        {
          "properties": [
            { "key": "something", "value": "something" },
            { "key": "something", "value": "something" }
          ]
        },
        {
          "properties": [
            { "key": "something", "value": "something" },
            { "key": "something", "value": "something" }
          ]
        }
      ]
    }

    Can we support such a scenario?
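    To make the nesting concrete, here is a minimal Scala sketch of the shape of this input: a top-level object holding an array of indicators, each holding its own array of key/value properties. The names Property, Indicator, IndicatorInput, flatten, and asMap are hypothetical and are not the classes or functions Kamanja's message compiler generates; JSON parsing is left out.

```scala
// Hypothetical plain-Scala shapes mirroring the JSON sample above.
// These are NOT the classes Kamanja's message compiler generates; they only
// illustrate the nesting: input -> indicators[] -> properties[].
// JSON parsing itself is omitted.
object NestedInputSketch {
  final case class Property(key: String, value: String)
  final case class Indicator(properties: Seq[Property])
  final case class IndicatorInput(indicators: Seq[Indicator])

  // Flatten into (indicatorIndex, key, value) triples. Each indicator may hold
  // a different number of properties, i.e. the records are variable-length.
  def flatten(input: IndicatorInput): Seq[(Int, String, String)] =
    input.indicators.zipWithIndex.flatMap { case (indicator, i) =>
      indicator.properties.map(p => (i, p.key, p.value))
    }

  // Turn one indicator's properties into a lookup map
  // (if a key repeats, the last value wins).
  def asMap(indicator: Indicator): Map[String, String] =
    indicator.properties.map(p => p.key -> p.value).toMap
}
```

    For example, an input whose first indicator has two properties and whose second has one flattens to three triples, which is the variable-length case raised in the next reply.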

  • #13283

    Archived_User64
    Participant

    This would be especially useful if we could support variable-length records in the JSON record structure.

  • #13285

    Archived_User13
    Participant

    This may be achievable with nested containers. However, you would need an ArrayOfArrayOfContainer type referenced in the Indicator container, and I don't know whether the metadata definition supports that. Tulasi may know more in that regard.

    If it does, your containers would look like this:

    Property Container:
    {
      "Container": {
        "NameSpace": "System",
        "Name": "Property",
        "Version": "00.01.00",
        "Description": "A Property",
        "Fixed": "false",
        "Elements": [
          {
            "Field": {
              "Name": "Key",
              "Type": "System.String"
            }
          },
          {
            "Field": {
              "Name": "Value",
              "Type": "System.String"
            }
          }
        ]
      }
    }

    Indicator Container:
    {
      "Container": {
        "NameSpace": "System",
        "Name": "Indicator",
        "Version": "00.01.00",
        "Description": "An indicator",
        "Fixed": "false",
        "Elements": [
          {
            "Container": {
              "Name": "Properties",
              "Type": "System.ArrayOfArrayOfProperty"
            }
          }
        ]
      }
    }

    Message:
    {
      "Message": {
        "NameSpace": "System",
        "Name": "MessageWithContainers",
        "Version": "00.01.00",
        "Description": "Message Containing Indicators containing properties",
        "Fixed": "false",
        "Persist": "true",
        "Elements": [
          {
            "Field": {
              "Name": "ID",
              "Type": "System.String"
            }
          },
          {
            "Container": {
              "Name": "Indicators",
              "Type": "System.ArrayOfIndicators"
            }
          }
        ],
        "PartitionKey": [
          "ID"
        ],
        "PrimaryKey": [
          "ID"
        ]
      }
    }
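    As a rough illustration of the object graph these three definitions describe, here is a Scala sketch that assumes each container becomes a class with one field per element and each ArrayOf... type becomes a Scala Array. The class names, the propertyCount helper, and the key accessors are hypothetical; the classes the message compiler actually emits will differ.

```scala
// Hypothetical Scala mirror of the three JSON definitions above.
// Field types follow the declarations literally; these are NOT the
// classes Kamanja's message compiler actually emits.
object ContainerSketch {
  final case class Property(key: String, value: String)

  // The "Properties" element is declared as System.ArrayOfArrayOfProperty,
  // so it is modeled here as an array of arrays.
  final case class Indicator(properties: Array[Array[Property]])

  // "ID" is both the PartitionKey and the PrimaryKey of the message.
  final case class MessageWithContainers(id: String, indicators: Array[Indicator]) {
    def partitionKey: Seq[String] = Seq(id)
    def primaryKey: Seq[String] = Seq(id)
  }

  // Count every Property reachable from a message, across all nesting levels.
  def propertyCount(msg: MessageWithContainers): Int =
    msg.indicators.iterator.map(_.properties.iterator.map(_.length).sum).sum
}
```

    Arrays are used only to echo the ArrayOf naming; an idiomatic hand-written model would more likely use immutable Seq, since case-class equality on Array fields compares references.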

  • #13288

    Archived_User19
    Participant

    We really need to support more complex structures, given that the message compiler simply translates them into underlying Scala/Java structures. Some of the limitations we imposed were due to PMML's inability to express them, but that should not stop us from supporting complex types used by Scala/Java models.

    Thanks

    -Krishna

  • #13289

    Archived_User79
    Participant

    At this time, we support what William described.

  • #13291

    Archived_User57
    Participant

    1. When containers are added to the metadata, the metadata API automatically creates types such as ArrayOfSomeContainer, ArrayBufferOfSomeContainer, SetOfSomeContainer, and QueueOfSomeContainer. I don't think we currently generate ArrayOfArrayOfSomeContainer and the like, however. That is a work item for the MetadataAPI (or perhaps the message compiler). This is not a big issue, though, because types can be added to the system.

    2. As for PMML support for collections: arrays, sets, queues, and even maps have been in use for a long time. The medical COPD example written in Kamanja PMML produces and emits a map of costs by date (a Map[Int, Double], as I recall) for inpatient and outpatient medical records. This works because the engine takes the result and forms a JSON string from it. I suspect we don't yet support the Map in the native form that would be needed to use it in an output message. I think this kind of thing will ultimately be essential for any aggregated output from the RDD's groupBy. An aggregation that used two group-by keys would be represented as Map[K, Map[L, V]], for example.
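    Both points can be sketched in Scala. The first object assumes that each generated type name corresponds to the Scala collection it is named after (Array, ArrayBuffer, Set, Queue) for a hypothetical container called Property; the real generated code may differ. The second object shows a one-key aggregation yielding a Map[Int, Double] and a two-key aggregation yielding a Map[K, Map[L, V]], using a hypothetical Claim record; its naive toJson only stands in for the engine's own map-to-JSON step and is not Kamanja's serializer.

```scala
import scala.collection.immutable.Queue
import scala.collection.mutable.ArrayBuffer

// Point 1: assumed correspondence between generated type names and Scala
// collections, for a hypothetical container named Property.
object CollectionTypesSketch {
  final case class Property(key: String, value: String)

  type ArrayOfProperty        = Array[Property]
  type ArrayBufferOfProperty  = ArrayBuffer[Property]
  type SetOfProperty          = Set[Property]
  type QueueOfProperty        = Queue[Property]
  // Not generated automatically (per the post); would have to be added.
  type ArrayOfArrayOfProperty = Array[Array[Property]]

  val sample: Seq[Property] =
    Seq(Property("k", "v1"), Property("k", "v1"), Property("k", "v2"))

  val asArray: ArrayOfProperty        = sample.toArray
  val asBuffer: ArrayBufferOfProperty = ArrayBuffer(sample: _*)
  val asSet: SetOfProperty            = sample.toSet // equal case-class values collapse
  val asQueue: QueueOfProperty        = Queue(sample: _*)
  val nested: ArrayOfArrayOfProperty  = Array(asArray, asArray.take(1))
}

// Point 2: one vs. two group-by keys. Claim and its fields are hypothetical.
object GroupBySketch {
  final case class Claim(kind: String, date: Int, cost: Double)

  // One key: total cost by date, a Map[Int, Double].
  def costByDate(claims: Seq[Claim]): Map[Int, Double] =
    claims.groupBy(_.date).map { case (date, cs) => date -> cs.map(_.cost).sum }

  // Two keys: Map[K, Map[L, V]], here kind -> date -> total cost.
  def costByKindAndDate(claims: Seq[Claim]): Map[String, Map[Int, Double]] =
    claims.groupBy(_.kind).map { case (kind, cs) => kind -> costByDate(cs) }

  // Naive JSON rendering with sorted keys and no string escaping.
  def toJson(m: Map[String, Map[Int, Double]]): String =
    m.toSeq.sortBy(_._1).map { case (kind, byDate) =>
      val body = byDate.toSeq.sortBy(_._1)
        .map { case (date, total) => s""""$date": $total""" }
        .mkString(", ")
      s""""$kind": {$body}"""
    }.mkString("{", ", ", "}")
}
```

    With two inpatient claims of 100.0 and 50.0 on date 20150101 and one outpatient claim of 25.0 on 20150102, costByKindAndDate returns Map("inpatient" -> Map(20150101 -> 150.0), "outpatient" -> Map(20150102 -> 25.0)). The inner map of each entry is exactly the single-key result, which is why the two-key aggregation nests naturally as Map[K, Map[L, V]].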
