How can I add the values in key-value pairs generated in Spark Scala?


How can I sum the keys and the values separately across the key-value pairs generated in Spark Scala?

Given the following input

(5,1),(6,1),(8,1)

I'd like to get to the following output

(19,3)

This is what I've tried so far:

val spark = SparkSession.builder.appName("myapp").getOrCreate()
val data = spark.read.textFile(args(0)).rdd
val result = data.map { line =>
  val tokens = line.split("\t")
  (Float.parseFloat(tokens(4)), 1)
}.reduceByKey(_ + _)
Tags: scala, apache-spark, bigdata

Answers

answered 2 months ago Vinod Chandak #1

reduceByKey won't serve your purpose here. Please use foldLeft.

Refer to Scala: How to sum a list of tuples for a solution to your problem.
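A minimal sketch of the foldLeft approach on a plain Scala collection, using the sample pairs from the question (the same idea carries over to an RDD via fold or aggregate):

```scala
object FoldLeftSketch extends App {
  // Sample pairs taken from the question.
  val pairs = List((5, 1), (6, 1), (8, 1))

  // Start from the neutral element (0, 0) and add keys and values component-wise.
  val summed = pairs.foldLeft((0, 0)) { (acc, p) =>
    (acc._1 + p._1, acc._2 + p._2)
  }

  println(summed) // (19,3)
}
```

foldLeft takes an explicit zero value, so it also behaves sensibly on an empty collection, returning (0, 0) instead of failing.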

answered 2 months ago Travis Hegner #2

val spark = SparkSession.builder.appName("myapp").getOrCreate()
val data = spark.read.textFile(args(0)).rdd
val result = data.map { line =>
  val tokens = line.split("\t")
  // Convert the fifth field to Int so the pairs can be summed numerically.
  (tokens(4).toInt, 1)
}.reduce((l, r) => (l._1 + r._1, l._2 + r._2))

It's possible that a foldLeft (as suggested by Vinod Chandak) is more appropriate, but I tend to use reduce as I have more experience with it.
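On the question's sample pairs, the same reduce can be checked on a plain Scala collection (the RDD version combines elements the same way, just distributed across partitions):

```scala
object ReduceSketch extends App {
  // Sample pairs taken from the question.
  val pairs = List((5, 1), (6, 1), (8, 1))

  // Combine pairs component-wise: sum the keys and sum the values.
  val total = pairs.reduce((l, r) => (l._1 + r._1, l._2 + r._2))

  println(total) // (19,3)
}
```

Note that reduce requires a non-empty collection; on an empty RDD or list it throws, which is one reason to prefer fold when the input might be empty.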

answered 2 months ago Shankar Koirala #3

You can use reduce or fold to get the result. You also need to convert the tokens(4) value to Int, or to whichever numeric type you need.

val result = data.map { line =>
  val tokens = line.split("\t")
  (tokens(4).toInt, 1)
}

Using fold

result.fold((0,0)) { (acc, x) => (acc._1 + x._1, acc._2 + x._2)}

Using reduce

result.reduce((x,y) => (x._1 + y._1, x._2 + y._2)) 
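The practical difference between the two shows up on an empty input. A small sketch on plain Scala collections (RDD fold and reduce behave analogously):

```scala
object FoldVsReduce extends App {
  val empty = List.empty[(Int, Int)]

  // fold takes an explicit zero value, so an empty collection
  // simply returns that zero.
  val viaFold = empty.fold((0, 0)) { (a, b) => (a._1 + b._1, a._2 + b._2) }
  println(viaFold) // (0,0)

  // reduce has no zero value, so on an empty collection it throws
  // (UnsupportedOperationException for Scala collections).
  val viaReduce = scala.util.Try(empty.reduce((a, b) => (a._1 + b._1, a._2 + b._2)))
  println(viaReduce.isFailure) // true
}
```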

Hope this helps!
