Question about rddObject.getRDD();

This topic contains 6 replies, has 3 voices, and was last updated by  Archived_User57 1 year, 8 months ago.

  • #13130 Reply

    Archived_User57
    Participant

    Hi,

    I am trying to use rddObject.getRDD() in a Java model to get all the records in the container below. When I check the size of the RDD object, it is always 1, and if I print the zipCode field I get the value from the last record.

    How can I get all the records?

    JavaRDD<UserLocaton> userLocation = UserLocatonFactory.rddObject.getRDD();

    System.out.println("userLocation.count()------>" + userLocation.count());
    for (Iterator<UserLocaton> userloc = userLocation.iterator(); userloc.hasNext();) {
        UserLocaton ul = userloc.next();
        appRareness = ul.rareness();
        System.out.println(ul.rareness() + "--------------Rareness-----------------");
        System.out.println(ul.userid() + "--------------userId-----------------");
        System.out.println(ul.zipcode() + "--------------zip-----------------");
    }

    Output:

    userLocation.count()------>1
    0.66--------------Rareness-----------------
    1--------------userId-----------------
    241--------------zip-----------------

    This is the CSV file:

    userId,zipcode,frequency,normal,rareness
    1,61492,0.01,0.03,0.97
    1,00005,0.03,0.10,0.90
    1,00010,0.09,0.31,0.69
    1,00023,0.29,1.00,0.00
    1,00025,0.10,0.34,0.66
    1,00040,0.05,0.17,0.83
    1,00141,0.08,0.28,0.72
    1,00145,0.10,0.34,0.66
    1,00166,0.15,0.52,0.48
    1,00241,0.10,0.34,0.66

    UserLocation.json:

    {
      "Container": {
        "NameSpace": "System",
        "Name": "UserLocaton",
        "Version": "1.00",
        "Description": "UserLocation",
        "Fixed": "true",
        "Elements": [
          {
            "Field": {
              "NameSpace": "System",
              "Name": "userId",
              "Type": "System.string"
            }
          },
          {
            "Field": {
              "NameSpace": "System",
              "Name": "zipCode",
              "Type": "System.int"
            }
          },
          {
            "Field": {
              "NameSpace": "System",
              "Name": "frequency",
              "Type": "System.double"
            }
          },
          {
            "Field": {
              "NameSpace": "System",
              "Name": "normal",
              "Type": "System.double"
            }
          },
          {
            "Field": {
              "NameSpace": "System",
              "Name": "rareness",
              "Type": "System.double"
            }
          }
        ],
        "PartitionKey": [
          "userId"
        ]
      }
    }

  • #13131 Reply

    Archived_User7
    Participant

    It looks like the partition key is acting as a primary key, which allows only one record per key. Check with Pokuri to see what the issue is, since the container is expected to hold multiple records per partition key.
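
    To illustrate the symptom described here, the following is a minimal plain-Java sketch, not Kamanja code. It assumes a store that keeps one slot per key (primary-key behaviour) versus one that keeps a bucket of rows per key (partition-key behaviour). The class and method names (PartitionKeyDemo, loadAsPrimaryKey, loadAsPartitionKey) are hypothetical, and the rows are a subset of the CSV from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Kamanja APIs.
final class PartitionKeyDemo {

    // A few rows from the CSV in the question:
    // userId,zipcode,frequency,normal,rareness
    static final String[] ROWS = {
        "1,61492,0.01,0.03,0.97",
        "1,00005,0.03,0.10,0.90",
        "1,00010,0.09,0.31,0.69",
        "1,00241,0.10,0.34,0.66"
    };

    // Primary-key semantics: one slot per key, so each put
    // overwrites the row stored earlier for the same userId.
    static Map<String, String> loadAsPrimaryKey(String[] rows) {
        Map<String, String> store = new LinkedHashMap<>();
        for (String row : rows) {
            String userId = row.split(",")[0];
            store.put(userId, row);
        }
        return store;
    }

    // Partition-key semantics: the key selects a bucket,
    // and every row for that userId is appended to it.
    static Map<String, List<String>> loadAsPartitionKey(String[] rows) {
        Map<String, List<String>> store = new LinkedHashMap<>();
        for (String row : rows) {
            String userId = row.split(",")[0];
            store.computeIfAbsent(userId, k -> new ArrayList<>()).add(row);
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, String> single = loadAsPrimaryKey(ROWS);
        Map<String, List<String>> multi = loadAsPartitionKey(ROWS);
        System.out.println("primary-key store size: " + single.size());
        System.out.println("surviving row: " + single.get("1"));
        System.out.println("partition-key rows for user 1: " + multi.get("1").size());
    }
}
```

    With the first loader only the last row for user 1 survives, which matches the count of 1 and the zip value from the last CSV record reported in the question. With the second loader all rows for the user are retained.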

  • #13133 Reply

    Archived_User28
    Participant

    Hi Ahmad,

    I have two points here.

    1. You are not persisting the data, so at most you will only ever get the current key. If you want to get all the keys, mark the message/container with the persists flag.

    2. If you want all users' data, you can simply use getRDD. But if you want all records for this user, you should pass the current user's key: use getRDDForCurrKey, or getRDD with the current message/container partition key. You may also need to provide other information, such as a date range.
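
    The distinction in point 2 can be sketched in plain Java. This is only an analogy under stated assumptions: InMemoryContainer, save, all, and forKey are hypothetical names, not the Kamanja API, and the actual signatures of getRDD and getRDDForCurrKey are not shown in this thread. The row for user 2 is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a persisted container; not the Kamanja API.
final class InMemoryContainer {

    // Rows grouped by partition key (userId in the question's container).
    private final Map<String, List<String>> byPartitionKey = new HashMap<>();

    // Append a row to the bucket for its partition key.
    void save(String partitionKey, String row) {
        byPartitionKey.computeIfAbsent(partitionKey, k -> new ArrayList<>()).add(row);
    }

    // Analogous in spirit to asking for every user's data.
    List<String> all() {
        List<String> out = new ArrayList<>();
        for (List<String> rows : byPartitionKey.values()) {
            out.addAll(rows);
        }
        return out;
    }

    // Analogous in spirit to restricting the lookup to one partition key.
    List<String> forKey(String partitionKey) {
        return byPartitionKey.getOrDefault(partitionKey, Collections.emptyList());
    }

    public static void main(String[] args) {
        InMemoryContainer container = new InMemoryContainer();
        container.save("1", "1,00025,0.10,0.34,0.66");
        container.save("1", "1,00241,0.10,0.34,0.66");
        container.save("2", "2,00010,0.09,0.31,0.69"); // made-up second user
        System.out.println("all records: " + container.all().size());
        System.out.println("records for user 1: " + container.forKey("1").size());
    }
}
```

    The keyed lookup returns only the rows stored under the given partition key, while the unkeyed lookup walks every bucket.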

    Thanks

    Pokuri

    From: Krishna Uppala [mailto:kri…@ligadata.com]
    Sent: Saturday, August 29, 2015 12:33 PM
    To: Ahmad Radwan
    Cc: kamanja
    Subject: Re: question about rddObject.getRDD();

  • #13135 Reply

    Archived_User57
    Participant

    Pokuri,

    I tried setting the persists flag on the container, but I am still getting the same result.

    I added one more record to the CSV file with a new userId (partition key), and the count for the RDD object became 2, so it seems that only one record is saved per user.

    Thanks,
    Ahmed

  • #13136 Reply

    Archived_User28
    Participant

    Hi Ahmad,

    Are you loading this from KVInit? If so, I don't think KVInit handles storing multiple values for the same partition key. I was assuming you were sending the records as messages and saving them as a container. I will open an issue for this.

    Thanks

    Pokuri

  • #13138 Reply

    Archived_User57
    Participant

    Thanks, Pokuri.

  • #13139 Reply

    Archived_User57
    Participant

    This is the link for tracking the issue: https://github.com/ligaDATA/Kamanja/issues/697
