How can I add values in key-value pairs generated in Scala


How can I sum the keys and the values separately from the key-value pairs generated in Spark Scala? For example, (5,1),(6,1),(8,1) should become (19,3).

    val spark = SparkSession.builder.appName("myapp").getOrCreate()
    val data = spark.read.textFile(args(0)).rdd
    val result = data.map { line =>
      val tokens = line.split("\t")
      (Float.parseFloat(tokens(4)), 1)
    }
    .reduceByKey(_ + _)
Tags: scala, apache-spark, bigdata

Answers

answered 5 days ago Vinod Chandak #1

reduceByKey won't serve your purpose here: it sums the values for each distinct key separately, whereas you want one total across all pairs. Please use foldLeft.

Refer to Scala: How to sum a list of tuples for solving your problem.
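A minimal sketch of the foldLeft approach on a plain Scala list (the sample pairs below mirror the example in the question; on an RDD you would first bring the pairs to the driver, e.g. with collect):

```scala
// Sum keys and values separately across (key, count) pairs,
// using foldLeft with a (0, 0) starting accumulator.
val pairs = List((5, 1), (6, 1), (8, 1))

val summed = pairs.foldLeft((0, 0)) { case ((kSum, vSum), (k, v)) =>
  (kSum + k, vSum + v)
}
// summed is (19, 3)
```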

answered 5 days ago Travis Hegner #2

    val spark = SparkSession.builder.appName("myapp").getOrCreate()
    val data = spark.read.textFile(args(0)).rdd
    val result = data.map { line =>
      val tokens = line.split("\t")
      (tokens(4).toInt, 1)
    }
    .reduce((l, r) => (l._1 + r._1, l._2 + r._2))

It's possible that a foldLeft (as suggested by Vinod Chandak) is more appropriate, but I tend to use reduce as I have more experience with it.
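The tuple-wise reduce can be checked locally, since a Scala Seq's reduce takes a binary operation of the same shape as RDD.reduce (the sample pairs below are assumed from the question):

```scala
// The same binary operation applied with a local Seq's reduce:
// each step adds the two keys and the two counts element-wise.
val pairs = Seq((5, 1), (6, 1), (8, 1))

val total = pairs.reduce((l, r) => (l._1 + r._1, l._2 + r._2))
// total is (19, 3)
```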

answered 5 days ago Shankar Koirala #3

You can use reduce or fold to get the result. You also need to convert the tokens(4) value to Int, or any other numeric type you need.

    val result = data.map { line =>
      val tokens = line.split("\t")
      (tokens(4).toInt, 1)
    }

Using fold

result.fold((0,0)) { (acc, x) => (acc._1 + x._1, acc._2 + x._2)}

Using reduce

result.reduce((x,y) => (x._1 + y._1, x._2 + y._2)) 

Hope this helps!
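One practical difference between the two: fold takes a zero value, so it also works on an empty collection, while reduce throws on one. A small sketch using plain Scala collections for illustration (RDD.fold and RDD.reduce behave the same way in this respect):

```scala
val empty = Seq.empty[(Int, Int)]

// fold simply returns the zero value for an empty input
val folded = empty.fold((0, 0)) { (acc, x) => (acc._1 + x._1, acc._2 + x._2) }
// folded is (0, 0)

// reduce on an empty collection throws UnsupportedOperationException
val reduced = scala.util.Try(empty.reduce((x, y) => (x._1 + y._1, x._2 + y._2)))
// reduced.isFailure is true
```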
