Parse multiple JSON objects from an RDD in Spark Streaming using Scala

ibh Source

I have a scenario where JSON objects arrive on a Kafka topic. The window duration is 30 seconds. During each 30-second window, at least 3 objects flow into the topic and are collected into an RDD. Each JSON object has 3 strings. The output should look like this:

id  time         json             count
1   10-JUL-2018  1st_json_object  3
2   11-JUL-2018  2nd_json_object  3
3   12-JUL-2018  3rd_json_object  3

I am not able to identify the individual JSON objects. As a result, the json column for each id shows all 3 JSON objects, and the count shows 9 instead of 3.

Is there a way to identify each individual JSON object and fix the json and count columns?

Sample JSON that I see in an RDD:

[{"header":[{"v":"120.6.3.5","n":"host"},{"v":"123","n":"id"},{"v":"2016-08-24","n":"est"}]}]
[{"header":[{"v":"120.3.4.2","n":"host"},{"v":"456","n":"id"},{"v":"2016-08-24","n":"est"}]}]
json scala apache-spark

Answers
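
One possible approach, offered as a minimal sketch rather than a definitive fix: treat every record in the batch as its own JSON object, parse each one separately, and compute the count once per batch instead of once per parsed field. The names `Event`, `PerRecordParsing`, `headerFields` and `toRows` are hypothetical. The sketch assumes that the id and time columns come from the "id" and "est" header entries, that records have the flat shape shown in the sample, and that values contain no escaped quotes. A regex stands in for a real JSON parser to keep the example dependency-free, and a plain Seq[String] stands in for the contents of one RDD.

```scala
// Hypothetical output row: one per JSON object in the batch.
final case class Event(id: String, time: String, json: String, count: Long)

object PerRecordParsing {
  // Matches one {"v":"...","n":"..."} pair.
  // Assumption: flat shape as in the sample, no escaped quotes inside values.
  private val Pair = """\{"v":"([^"]*)","n":"([^"]*)"\}""".r

  // Turns the header array of a single record into a name -> value map.
  def headerFields(record: String): Map[String, String] =
    Pair.findAllMatchIn(record).map(m => m.group(2) -> m.group(1)).toMap

  // `batch` stands in for the records of one windowed RDD.
  def toRows(batch: Seq[String]): Seq[Event] = {
    val records = batch.map(_.trim).filter(_.nonEmpty)
    // Count the records once per batch, not once per extracted field.
    val total = records.size.toLong
    records.map { record =>
      val fields = headerFields(record)
      Event(
        id = fields.getOrElse("id", ""),    // assumption: id comes from the "id" entry
        time = fields.getOrElse("est", ""), // assumption: time comes from the "est" entry
        json = record,                      // only this record's own JSON
        count = total
      )
    }
  }
}
```

In a streaming job the same idea would presumably be applied inside `foreachRDD`, taking the count from `rdd.count()` and mapping over the RDD one record at a time. A count of 9 for 3 objects suggests that the three header entries of each object are being counted or joined as separate rows, though that cannot be confirmed without the original code.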
