In Apache Spark Scala, how to fill Vectors.dense in DataFrame from CSV?

asked by user3676943

Hello,

I am new to Spark.

I noticed this online example:

http://spark.apache.org/docs/latest/ml-pipeline.html

I am curious about this syntax:

// Prepare training data from a list of (label, features) tuples.
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

Is it possible to replace the above call with code that reads the values from a CSV file?

I want something comparable to the pandas read_csv() method in Python.

scala · csv · apache-spark

Answers

answered 2 years ago Raphael Roth #1

Yes, it is possible.

If the CSV is on HDFS, you can use spark-csv to read it (see example); or, if it's on the normal filesystem, you can just read it with plain Scala (see example).
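For instance, here is a minimal sketch using Spark's built-in CSV reader (Spark 2.x+) together with VectorAssembler to pack the feature columns into a single vector column, reproducing the (label, features) shape from the question. The file name "data.csv" and the column names label, f1, f2, f3 are assumptions for illustration:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-to-vectors").getOrCreate()

// Read the CSV; inferSchema parses the numeric columns as doubles.
// Assumes a header row with hypothetical columns: label,f1,f2,f3
val raw = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data.csv")

// Assemble the feature columns into one Vector column named "features",
// matching the DataFrame shape expected by the ML pipeline example.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))
  .setOutputCol("features")

val training = assembler.transform(raw).select("label", "features")
```

After this, `training` can be used the same way as the hand-built DataFrame in the question, e.g. passed to `LogisticRegression.fit()`.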
