org.apache.spark.mllib.linalg.Vector is designed for linear algebra applications. mllib provides two implementations, DenseVector and SparseVector. While you have access to useful helpers such as Vectors.norm and Vectors.sqdist, the API is rather limited otherwise.
Like all data structures in org.apache.spark.mllib.linalg, it can store only 64-bit floating point numbers (scala.Double).
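As a rough illustration of the points above, here is a minimal sketch that builds the same vector in dense and sparse form and calls the two helpers mentioned. It assumes the spark-mllib artifact is on the classpath (no SparkContext is needed for local vectors); the object name MllibVectorSketch is made up for this example.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical object name, used only for illustration.
object MllibVectorSketch {
  // The same logical vector (1.0, 0.0, 3.0) in both representations.
  // Elements are always Double; there is no Vector[String] here.
  val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)
  val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

  // L2 norm of the dense vector.
  val l2: Double = Vectors.norm(dense, 2.0)

  // Squared Euclidean distance; works across representations.
  val dist: Double = Vectors.sqdist(dense, sparse)

  def main(args: Array[String]): Unit = {
    println(s"norm = $l2, sqdist = $dist")
    println(sparse.toArray.mkString(", "))
  }
}
```

Note that norm and sqdist live on the Vectors factory object rather than on the vector instances themselves.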
If you plan to use mllib, then org.apache.spark.mllib.linalg.Vector is pretty much your only option. All the remaining data structures from mllib, both local and distributed, are built on top of org.apache.spark.mllib.linalg.Vector.
Otherwise, scala.collection.immutable.Vector is probably a much better choice. It is a general-purpose, dense data structure.
It can store objects of any type, so you can have a Vector[String], for example.
Since it is Traversable, you have access to all the expected methods like map, flatMap, reduce, fold, filter, etc.
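A short sketch of that collection API, using only the Scala standard library. The object name ScalaVectorSketch and the sample values are invented for illustration.

```scala
// Hypothetical object name, used only for illustration.
object ScalaVectorSketch {
  // The standard-library Vector is generic, so non-numeric elements are fine.
  val words: Vector[String] = Vector("spark", "mllib", "breeze")

  // map: one output element per input element.
  val lengths: Vector[Int] = words.map(_.length)

  // flatMap: each word expands to its characters, flattened into one Vector.
  val chars: Vector[Char] = words.flatMap(_.toVector)

  // filter: keep only the words longer than five characters.
  val longWords: Vector[String] = words.filter(_.length > 5)

  // fold with an explicit zero, reduce without one.
  val totalLength: Int = lengths.fold(0)(_ + _)
  val maxLength: Int = lengths.reduce(_ max _)

  def main(args: Array[String]): Unit = {
    println(lengths)
    println(longWords)
    println(s"total = $totalLength, max = $maxLength")
  }
}
```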
Edit: If you need algebraic operations and don't use any of the data structures from org.apache.spark.mllib.linalg.distributed, you may prefer breeze.linalg.Vector over org.apache.spark.mllib.linalg.Vector. It supports a larger set of algebraic methods, including the dot product, and provides a typical collection API.
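To give a feel for the Breeze side, here is a minimal sketch mixing algebraic operations with a collection-style map. It assumes the Breeze library is on the classpath; the object name BreezeVectorSketch is made up for this example.

```scala
import breeze.linalg.{DenseVector, norm}

// Hypothetical object name, used only for illustration.
object BreezeVectorSketch {
  val a: DenseVector[Double] = DenseVector(1.0, 2.0, 3.0)
  val b: DenseVector[Double] = DenseVector(4.0, 5.0, 6.0)

  // Algebraic operations: dot product, element-wise sum, L2 norm.
  val dotProduct: Double = a dot b
  val sum: DenseVector[Double] = a + b
  val l2: Double = norm(a)

  // Collection-style API: map over the elements.
  val doubled: DenseVector[Double] = a.map(_ * 2.0)

  def main(args: Array[String]): Unit = {
    println(s"dot = $dotProduct, norm = $l2")
    println(sum)
    println(doubled)
  }
}
```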