hadoop - Difference between spark Vectors and scala immutable Vector?

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:17:11+0000

org.apache.spark.mllib.linalg.Vector is designed for linear algebra applications. mllib provides two implementations: DenseVector and SparseVector. While you have access to useful methods like Vectors.norm or Vectors.sqdist, the API is rather limited otherwise.

As all data structures from org.apache.spark.mllib.linalg it can store only 64-bit floating point numbers (scala.Double).
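A minimal sketch of the points above, assuming the spark-mllib artifact is on the classpath (local vectors do not need a SparkContext). The object name MllibVectorSketch and the sample values are made up for illustration.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object MllibVectorSketch {
  // Dense vector: every entry is stored.
  val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)

  // Sparse vector of size 3: only the entries at indices 0 and 2 are stored.
  val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

  // norm and sqdist live on the Vectors companion object.
  val l2: Double = Vectors.norm(dense, 2.0)          // Euclidean norm
  val dist: Double = Vectors.sqdist(dense, sparse)   // squared Euclidean distance

  // Elements are always Double; toArray gives a plain array for other processing.
  val asArray: Array[Double] = sparse.toArray

  def main(args: Array[String]): Unit = {
    println(s"l2 = $l2, sqdist = $dist, array = ${asArray.mkString(",")}")
  }
}
```

Both vectors here hold the same values, so the squared distance between them is zero even though one is dense and the other sparse.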

If you plan to use mllib, then org.apache.spark.mllib.linalg.Vector is pretty much your only option. All the remaining data structures from mllib, both local and distributed, are built on top of org.apache.spark.mllib.linalg.Vector.

Otherwise, scala.immutable.Vector is probably a much better choice. It is a general-purpose, dense data structure.

It can store objects of any type, so you can have Vector[String] for example.

Since it is Traversable you have access to all expected methods like map, flatMap, reduce, fold, filter, etc.
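For comparison, a short sketch of scala.immutable.Vector using only the standard library. The object name ScalaVectorSketch and the sample strings are made up for illustration.

```scala
object ScalaVectorSketch {
  // Any element type is allowed, not just Double.
  val words: Vector[String] = Vector("spark", "scala", "breeze")

  // The usual collection methods are available.
  val lengths: Vector[Int] = words.map(_.length)
  val withS: Vector[String] = words.filter(_.startsWith("s"))
  val chars: Vector[Char] = words.flatMap(_.toVector)
  val totalLength: Int = lengths.foldLeft(0)(_ + _)
  val longest: String =
    words.reduce((a, b) => if (a.length >= b.length) a else b)

  // Updates return a new vector; the original is left unchanged.
  val appended: Vector[String] = words :+ "mllib"

  def main(args: Array[String]): Unit = {
    println(s"lengths = $lengths, total = $totalLength, longest = $longest")
  }
}
```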

Edit: If you need algebraic operations and don't use any of the data structures from org.apache.spark.mllib.linalg.distributed, you may prefer breeze.linalg.Vector over org.apache.spark.mllib.linalg.Vector. It supports a larger set of algebraic methods, including the dot product, and provides the typical collection API.
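A rough sketch of the Breeze alternative, assuming the breeze library is on the classpath. It uses breeze.linalg.DenseVector, one of the implementations of breeze.linalg.Vector; the object name BreezeVectorSketch and the values are made up for illustration.

```scala
import breeze.linalg.DenseVector

object BreezeVectorSketch {
  val a: DenseVector[Double] = DenseVector(1.0, 2.0, 3.0)
  val b: DenseVector[Double] = DenseVector(4.0, 5.0, 6.0)

  // Algebraic operations.
  val dot: Double = a dot b                    // 1*4 + 2*5 + 3*6
  val sum: DenseVector[Double] = a + b         // element-wise addition
  val scaled: DenseVector[Double] = a * 2.0    // scalar multiplication

  // Collection-style API on the same type.
  val squared: DenseVector[Double] = a.map(x => x * x)

  def main(args: Array[String]): Unit = {
    println(s"dot = $dot, sum = $sum, scaled = $scaled, squared = $squared")
  }
}
```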
