
Scala KMeansModel Class Code Examples


This article collects typical usage examples of the org.apache.spark.mllib.clustering.KMeansModel class in Scala. If you are wondering what KMeansModel does, how to use it, or want to see it in context, the curated class examples below may help.



Three code examples of the KMeansModel class are shown below, sorted by popularity by default.
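Before the full examples, here is a minimal, self-contained sketch of the core KMeansModel workflow (train a model, inspect its centers, predict cluster assignments, compute the cost). The input path, master setting, and parameter values are illustrative assumptions, not taken from the examples below.

// Minimal KMeansModel quick-start sketch (paths and parameters are assumed for illustration)
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object KMeansQuickStart {
  def main(args: Array[String]): Unit = {
    // Local SparkContext for illustration only
    val sc = new SparkContext(new SparkConf().setAppName("KMeansQuickStart").setMaster("local[*]"))

    // Hypothetical input: one comma-separated numeric row per line
    val points = sc.textFile("data/points.csv")
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
      .cache()

    // Train a KMeansModel with k = 3 and at most 20 iterations
    val model: KMeansModel = KMeans.train(points, 3, 20)

    // Inspect the learned centers, assign each point to a cluster, and report the WSSSE
    model.clusterCenters.foreach(println)
    model.predict(points).take(5).foreach(println)
    println("Within Set Sum of Squared Errors = " + model.computeCost(points))

    sc.stop()
  }
}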

Example 1: KmeansExample

// Package declaration and imported dependencies
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

import scala.util.Try



object KmeansExample {

  import ApplicationContext._
  def main(args: Array[String]) {

    // Load the raw pressure and vibration readings; the pressure RDD is parsed but not used further here
    val pressureRead = sc.textFile("src/main/resources/PamarcoPressure.txt")
    val vibrationText = sc.textFile("src/main/resources/PamarcoVibration.txt")

    val pressureRDD = pressureRead.map(_.split(","))
    val vibrationRDD = vibrationText.map(_.split("\t")).persist()

    // Parse columns 1-3 of each vibration row into a dense vector, silently dropping malformed rows
    val vibrationVector = vibrationRDD.map { row =>
      Try(Vectors.dense(row(1).toDouble, row(2).toDouble, row(3).toDouble)).toOption
    }.filter(_.isDefined).map(_.get)

    // Split the vectors into a 60/40 train/test set
    val splittedRDD = vibrationVector.randomSplit(Array(0.6, 0.4))
    val trainRDD = splittedRDD(0)
    val testRDD = splittedRDD(1)
    // Cluster the data into two classes using KMeans
    val numClusters = 2
    val numIterations = 20
    val clusters = KMeans.train(trainRDD, numClusters, numIterations)

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(vibrationVector)
    println("Within Set Sum of Squared Errors = " + WSSSE)

    // Save and load model
    clusters.save(sc, "target/org/apache/spark/KMeansExample/KMeansModel")
    val sameModel = KMeansModel.load(sc, "target/org/apache/spark/KMeansExample/KMeansModel")

    import spark.implicits._
    // Predict cluster assignments for the held-out test set and display them as a DataFrame
    val predictions = sameModel.predict(testRDD)
    predictions.toDF.show


    sc.stop()
  }
} 
Developer: shiv4nsh | Project: spark-kmeans-timeseries-data-example | Lines: 46 | Source: KmeansExample.scala
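Example 1 imports ApplicationContext._ without showing that object. Below is a minimal sketch of what it might provide, assuming it only exposes the spark session and sc context the example relies on; the app name and master setting are placeholder assumptions.

// Hypothetical ApplicationContext for Example 1: a single place that builds the Spark handles
// (spark and sc) that KmeansExample imports. Names and settings are assumptions, not the original code.
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object ApplicationContext {
  val spark: SparkSession = SparkSession.builder()
    .appName("KmeansExample")
    .master("local[*]")
    .getOrCreate()

  val sc: SparkContext = spark.sparkContext
}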


Example 2: Predict

// Package declaration and imported dependencies
package com.databricks.apps.twitterClassifier

import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Predict extends App {
  import SparkSetup._

  val options = PredictOptions.parse(args)
  val ssc = new StreamingContext(sc, Seconds(options.intervalInSecs))
  Predictor.doIt(options, sc, ssc)
}


object Predictor {
  def doIt(options: PredictOptions, sc: SparkContext, ssc: StreamingContext) {
    println("Initializing the the KMeans model...")
    val model: KMeansModel = new KMeansModel(sc.objectFile[Vector](options.modelDirectory.getCanonicalPath).collect)

    println("Materializing Twitter stream...")
    TwitterUtils.createStream(ssc, maybeTwitterAuth)
      .map(_.getText)
      .foreachRDD { rdd =>
        rdd.filter(t => model.predict(featurize(t)) == options.clusterNumber)
           .foreach(print)  // register DStream as an output stream and materialize it
      }
    println("Initialization complete, starting streaming computation.")
    ssc.start()
    ssc.awaitTermination()
  }
} 
Developer: krish121 | Project: Spark-reference-applications | Lines: 35 | Source: Predict.scala
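Example 2 calls a featurize helper that is not shown here; it must turn a tweet's text into a Vector that the KMeansModel can score. One plausible implementation is sketched below, assuming tweets are hashed into a fixed-width term-frequency vector with MLlib's HashingTF; the vector width and character-bigram tokenization are assumptions.

// Hypothetical featurization helper for Example 2: hash character bigrams of the tweet text
// into a fixed-width term-frequency vector so it can be passed to KMeansModel.predict.
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector

object Featurizer {
  private val numFeatures = 1000            // assumed vector width
  private val tf = new HashingTF(numFeatures)

  def featurize(text: String): Vector =
    tf.transform(text.toLowerCase.sliding(2).toSeq)
}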


Example 3: KMeansCases

// Package declaration and imported dependencies
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.log4j.{Level, Logger}

class KMeansCases(sc: SparkContext, dataFile: String, numOfCenters: Int, maxIterations:Int) {
  //hide logger from console
  Logger.getLogger("org").setLevel(Level.OFF)
  Logger.getLogger("akka").setLevel(Level.OFF)

  val data = sc.textFile(dataFile)
  val parsedData = data.map(s => Vectors.dense(s.split('\t').map(_.toDouble))).cache()

  def KMeansInitialCenters() = {
    val initStartTime = System.nanoTime()
    val centers = new KMeansInitialization().run(sc, dataFile, numOfCenters)
    val initTimeInSeconds = (System.nanoTime() - initStartTime) / 1e9
    println(s"Initialization to find centers took " + "%.3f".format(initTimeInSeconds) + " seconds.")

    val initStartTime1 = System.nanoTime()
    val model = new KMeansModel(centers)
    val clusterModel = new KMeans().setK(numOfCenters).setMaxIterations(maxIterations).setInitialModel(model).run(parsedData)
    val initTimeInSeconds1 = (System.nanoTime() - initStartTime1) / 1e9
    println(s"Initialization with custom took " + "%.3f".format(initTimeInSeconds1) + " seconds.")

    println("\nnumber of points per cluster")
    clusterModel.predict(parsedData).map(x=>(x,1)).reduceByKey((a,b)=>a+b).foreach(x=>println(x._2))

  }

  def KMeansParallel() = {
    val initStartTime = System.nanoTime()

    val clusterModel = KMeans.train(parsedData, numOfCenters, maxIterations, 1, KMeans.K_MEANS_PARALLEL)
    val initTimeInSeconds = (System.nanoTime() - initStartTime) / 1e9
    println(s"Initialization with KMeansParaller took " + "%.3f".format(initTimeInSeconds) + " seconds.")
    println("number of points per cluster")
    clusterModel.predict(parsedData).map(x=>(x,1)).reduceByKey((a,b)=>a+b).foreach(x=>println(x._2))
  }

  def KMeansRandom() = {
    val initStartTime = System.nanoTime()

    val clusterModel = KMeans.train(parsedData, numOfCenters, maxIterations, 1, KMeans.RANDOM)
    val initTimeInSeconds = (System.nanoTime() - initStartTime) / 1e9
    println(s"Initialization with KMeasRandom took " + "%.3f".format(initTimeInSeconds) + " seconds.")
    println("number of points per cluster")

    clusterModel.predict(parsedData).map(x=>(x,1)).reduceByKey((a,b)=>a+b).foreach(x=>println(x._2))
  }
} 
Developer: AndyFou | Project: kmeans_contributions | Lines: 52 | Source: KMeansCases.scala
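Example 3 depends on a KMeansInitialization class that is not shown; its run method has to return the initial centers that new KMeansModel(centers) accepts, i.e. an Array[Vector]. A minimal sketch under that assumption is below, simply sampling numOfCenters points from the data file as the starting centers; the sampling strategy and seed are illustrative, not the original implementation.

// Hypothetical stand-in for the KMeansInitialization class used in Example 3:
// parse the tab-separated data file and sample k points to use as initial centers.
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}

class KMeansInitialization {
  def run(sc: SparkContext, dataFile: String, k: Int): Array[Vector] = {
    val points = sc.textFile(dataFile)
      .map(line => Vectors.dense(line.split('\t').map(_.toDouble)))
    // takeSample without replacement returns k distinct rows to seed the model
    points.takeSample(withReplacement = false, num = k, seed = 42L)
  }
}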



Note: The org.apache.spark.mllib.clustering.KMeansModel class examples in this article were collected from source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their developers; copyright remains with the original authors, and any use or redistribution should follow the corresponding project's license. Please do not repost without permission.

