Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
336 views
in Technique[技术] by (71.8m points)

What's the difference between Spark ML and MLLIB packages

I noticed there are two LinearRegressionModel classes in SparkML, one in ML package (spark.ml) and another one in MLLib (spark.mllib) package.

These two are implemented quite differently - e.g. the one from MLLib implements Serializable, while the other one does not.

By the way, the same is true about RandomForestModel or Word2Vec.

Why are there two classes? Which is the "right" one? And is there a way to convert one into another?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

o.a.s.mllib contains old RDD-based API while o.a.s.ml contains new API build around Dataset and ML Pipelines. ml and mllib reached feature parity in 2.0.0 and mllib is slowly being deprecated (this already happened in case of linear regression) and most likely will be removed in the next major release.

So unless your goal is backward compatibility then the "right choice" is o.a.s.ml.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...