Parse multiple JSON objects from an RDD in Spark Streaming using Scala

ibh Source

I have a scenario where we receive JSON objects on a Kafka topic. The window duration is 30 seconds. During each 30-second window at least 3 objects flow into the topic and end up in a single RDD. Each JSON object has 3 strings. The output should look like this:

id  time         json             count
1   10-JUL-2018  1st_json_object  3
2   11-JUL-2018  2nd_json_object  3
3   12-JUL-2018  3rd_json_object  3

I am not able to identify the individual JSON objects. As a result, the json column for each id shows all 3 JSON objects, and the count shows 9 instead of 3.

Is there a way to identify each individual JSON object so that the json and count columns come out correctly?
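
For reference, here is a minimal sketch of one way the desired rows might be produced. It rests on assumptions that the post does not confirm: that each element of the stream is exactly one JSON object (for example, the value of a single Kafka record), and that the time column can be taken from the batch time rather than from a field inside the JSON. The names `PerObjectRows`, `OutputRow`, and `printRows` are hypothetical. The count is computed once per RDD and the id comes from `zipWithIndex`, so neither the json nor the count column is multiplied by the number of objects.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object PerObjectRows {
  // Hypothetical row type mirroring the desired output columns.
  final case class OutputRow(id: Long, time: String, json: String, count: Long)

  // Assumption: every element of jsonStream is a single JSON object.
  def printRows(jsonStream: DStream[String]): Unit =
    jsonStream.foreachRDD { (rdd: RDD[String], batchTime: Time) =>
      // Number of JSON objects in this batch, computed once per RDD.
      val total = rdd.count()

      // zipWithIndex gives each object its own 0-based index,
      // so every output row carries exactly one JSON object.
      val rows = rdd.zipWithIndex().map { case (json, idx) =>
        OutputRow(idx + 1, batchTime.toString, json, total)
      }

      // Collecting to the driver is only reasonable for a few objects per window.
      rows.collect().foreach(println)
    }
}
```

If the records instead arrive as one string containing several concatenated objects, they would first need to be split into separate elements (for example with `flatMap`) before this step. How to split them depends on the actual payload format.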

Sample JSON that I see in an RDD:


