This article collects typical usage examples of the Scala class org.apache.spark.rdd.HadoopRDD. If you are wondering what HadoopRDD is for, how to use it, or what working code looks like, the curated class examples here may help.
Two code examples of the HadoopRDD class are shown below, ordered by popularity.
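Both examples share the same core pattern: the RDD returned by SparkContext.hadoopFile is cast to HadoopRDD so that mapPartitionsWithInputSplit can be used to recover which file each record was read from. The explicit asInstanceOf cast is needed because hadoopFile is declared to return a plain RDD[(K, V)], while mapPartitionsWithInputSplit is defined only on HadoopRDD. The following minimal sketch shows just that pattern; it is not part of either example, and the input path /tmp/input is a hypothetical placeholder.

// Minimal sketch (not from the original examples): tag every line with the
// name of the file it was read from via HadoopRDD.mapPartitionsWithInputSplit.
// "/tmp/input" is a hypothetical path used purely for illustration.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, TextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.HadoopRDD

object HadoopRDDSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("HadoopRDDSketch"))
    val linesWithFile = sc
      .hadoopFile("/tmp/input", classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
      .asInstanceOf[HadoopRDD[LongWritable, Text]] // expose the HadoopRDD-specific API
      .mapPartitionsWithInputSplit((split, iter) => // the split identifies the underlying file
        iter.map { case (_, line) =>
          (split.asInstanceOf[FileSplit].getPath.getName, line.toString)
        })
    linesWithFile.take(5).foreach(println)
    sc.stop()
  }
}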
Example 1: FPMiningPreprocessingApp
// Set the package name and import the dependent classes
package org.apress.prospark

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.FileSplit
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.HadoopRDD
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
import com.google.common.io.Files

object FPMiningPreprocessingApp {

  def main(args: Array[String]) {
    if (args.length != 3) {
      System.err.println(
        "Usage: FPMiningPreprocessingApp <appname> <inputpath> <outputpath>")
      System.exit(1)
    }
    val Seq(appName, iPath, oPath) = args.toSeq

    val conf = new SparkConf()
      .setAppName(appName)
      .setJars(SparkContext.jarOfClass(this.getClass).toSeq)
    val delim = " "
    val sc = new SparkContext(conf)

    sc.hadoopFile(iPath, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], sc.defaultMinPartitions)
      // Cast to HadoopRDD to gain access to mapPartitionsWithInputSplit, which
      // exposes the input split (and therefore the source file) of every partition
      .asInstanceOf[HadoopRDD[LongWritable, Text]]
      .mapPartitionsWithInputSplit((iSplit, iter) =>
        // Pair the source file name (without extension) with the second token of each line
        iter.map(splitAndLine => (Files.getNameWithoutExtension(iSplit.asInstanceOf[FileSplit].getPath.toString),
          splitAndLine._2.toString.split(" ")(1))))
      // Drop records whose value is "0"
      .filter(r => r._2 != "0")
      .map(r => (r._1, r._2))
      // Keep each (file, item) pair once and collect all items observed per file
      .distinct()
      .groupByKey()
      // Emit one space-delimited transaction line per file
      .map(r => r._2.mkString(" "))
      // Sample 70% of the transactions without replacement and write a single output file
      .sample(false, 0.7)
      .coalesce(1)
      .saveAsTextFile(oPath)
  }
}
Author: ZubairNabi; Project: prosparkstreaming; Lines: 45; Source file: L9-13FPMiningPreprocessing.scala
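The transaction file produced by this preprocessing step is not consumed within the example. Purely as an illustration, not part of the original code, the space-delimited transaction lines it writes could be fed to MLlib's FPGrowth roughly as follows; the input path, minimum support, and partition count are all assumed values.

// Illustrative sketch only: mine frequent itemsets from the transactions
// written by FPMiningPreprocessingApp. Path and parameters are assumptions.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth

object FPMiningSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("FPMiningSketch"))
    // Hypothetical location of the preprocessing output
    val transactions = sc.textFile("/tmp/fp-transactions/part-00000")
      .map(_.trim.split(" "))
    val model = new FPGrowth()
      .setMinSupport(0.2)   // assumed threshold, tune for the data set
      .setNumPartitions(4)  // assumed parallelism
      .run(transactions)
    model.freqItemsets.take(10).foreach(is =>
      println(is.items.mkString("[", ",", "]") + " -> " + is.freq))
    sc.stop()
  }
}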
Example 2: CollabFilteringPreprocessingApp
// Set the package name and import the dependent classes
package org.apress.prospark

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.FileSplit
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.HadoopRDD
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
import com.google.common.io.Files

object CollabFilteringPreprocessingApp {

  def main(args: Array[String]) {
    if (args.length != 3) {
      System.err.println(
        "Usage: CollabFilteringPreprocessingApp <appname> <inputpath> <outputpath>")
      System.exit(1)
    }
    val Seq(appName, iPath, oPath) = args.toSeq

    val conf = new SparkConf()
      .setAppName(appName)
      .setJars(SparkContext.jarOfClass(this.getClass).toSeq)
    val delim = " "
    val sc = new SparkContext(conf)

    sc.hadoopFile(iPath, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], sc.defaultMinPartitions)
      // Cast to HadoopRDD so mapPartitionsWithInputSplit can expose each partition's input split
      .asInstanceOf[HadoopRDD[LongWritable, Text]]
      .mapPartitionsWithInputSplit((iSplit, iter) =>
        // Pair the source file name (without extension) with the second token of each line
        iter.map(splitAndLine => (Files.getNameWithoutExtension(iSplit.asInstanceOf[FileSplit].getPath.toString),
          splitAndLine._2.toString.split(" ")(1))))
      // Drop records whose value is "0"
      .filter(r => r._2 != "0")
      // Count how often each (file, item) pair occurs
      .map(r => ((r._1, r._2), 1))
      .reduceByKey(_ + _)
      // Emit "user item count" lines, stripping the "subject" prefix from the file name
      .map(r => r._1._1.replace("subject", "") + delim + r._1._2 + delim + r._2)
      // Sample 70% of the records without replacement and write a single output file
      .sample(false, 0.7)
      .coalesce(1)
      .saveAsTextFile(oPath)
  }
}
Author: ZubairNabi; Project: prosparkstreaming; Lines: 44; Source file: L9-11CollabFilteringPreprocessing.scala
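Again as an illustration only, not part of the original code: the "user item count" triples written by this example could be parsed into MLlib Rating objects and passed to ALS. The path and the ALS parameters below are assumptions, and the snippet assumes the user and item tokens are numeric ids (they would need to be mapped to integer indices first if they are not).

// Illustrative sketch only: train an ALS model on the triples written by
// CollabFilteringPreprocessingApp. Path and parameters are assumptions.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object CollabFilteringSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("CollabFilteringSketch"))
    // Hypothetical location of the preprocessing output; assumes numeric ids
    val ratings = sc.textFile("/tmp/cf-ratings/part-00000")
      .map(_.split(" "))
      .map(t => Rating(t(0).toInt, t(1).toInt, t(2).toDouble))
    // rank = 10, iterations = 10, lambda = 0.01 are assumed values
    val model = ALS.train(ratings, 10, 10, 0.01)
    model.recommendProducts(1, 5).foreach(println)
    sc.stop()
  }
}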
Note: the org.apache.spark.rdd.HadoopRDD class examples in this article were collected from open-source projects hosted on GitHub, MSDocs, and similar platforms; copyright in the code remains with the original authors. Consult each project's license before redistributing or reusing the code, and do not reproduce this material without permission.