本文整理汇总了Java中org.apache.crunch.MapFn类的典型用法代码示例。如果您正苦于以下问题:Java MapFn类的具体用法?Java MapFn怎么用?Java MapFn使用的例子?那么恭喜您, 这里精选的类代码示例或许可以为您提供帮助。
MapFn类属于org.apache.crunch包,在下文中一共展示了MapFn类的13个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于我们的系统推荐出更棒的Java代码示例。
示例1: apply
import org.apache.crunch.MapFn; //导入依赖的package包/类
public <T> PCollection<Pair<Integer, T>> apply(PCollection<T> pcollect) {
PTypeFamily ptf = pcollect.getTypeFamily();
PType<Pair<Integer, T>> pt = ptf.pairs(ptf.ints(), pcollect.getPType());
return pcollect.parallelDo("crossfold", new MapFn<T, Pair<Integer, T>>() {
private transient RandomGenerator rand;
@Override
public void initialize() {
if (rand == null) {
this.rand = RandomManager.getSeededRandom(seed);
}
}
@Override
public Pair<Integer, T> map(T t) {
return Pair.of(rand.nextInt(numFolds), t);
}
}, pt);
}
开发者ID:apsaltis,项目名称:oryx,代码行数:21,代码来源:Crossfold.java
示例2: groupedWeightedSample
import org.apache.crunch.MapFn; //导入依赖的package包/类
public static <K, T, N extends Number> PTable<K, T> groupedWeightedSample(
PTable<K, Pair<T, N>> input,
int sampleSize,
RandomGenerator random) {
PTypeFamily ptf = input.getTypeFamily();
PType<K> keyType = input.getPTableType().getKeyType();
@SuppressWarnings("unchecked")
PType<T> ttype = (PType<T>) input.getPTableType().getValueType().getSubTypes().get(0);
PTableType<K, Pair<Double, T>> ptt = ptf.tableOf(keyType, ptf.pairs(ptf.doubles(), ttype));
// fill reservoirs by mapping over the vectors and re-emiting them; each map task emits at most sampleSize
// vectors per fold; the combiner/reducer will combine the outputs and pare down to sampleSize vectors total
PTable<K, Pair<Double, T>> samples = input.parallelDo("reservoirSampling",
new SampleFn<K, T, N>(sampleSize, random, ttype), ptt);
// pare down to just a single reservoir with sampleSize vectors
PTable<K, Pair<Double, T>> reservoir = samples.groupByKey(1).combineValues(new WRSCombineFn<K, T>(sampleSize, ttype));
// strip the weights off the final sampled reservoir and return
return reservoir.parallelDo("strippingSamplingWeights", new MapFn<Pair<K, Pair<Double, T>>, Pair<K, T>>() {
@Override
public Pair<K, T> map(Pair<K, Pair<Double, T>> p) {
return Pair.of(p.first(), p.second().second());
}
}, ptf.tableOf(keyType, ttype));
}
开发者ID:apsaltis,项目名称:oryx,代码行数:27,代码来源:ReservoirSampling.java
示例3: swapKeyValue
import org.apache.crunch.MapFn; //导入依赖的package包/类
/**
* Swap the key and value part of a PTable. The original PTypes are used in the opposite order
* @param table PTable to process
* @param <K> Key type (will become value type)
* @param <V> Value type (will become key type)
* @return PType<V, K> containing the same data as the original
*/
public static <K, V> PTable<V, K> swapKeyValue(PTable<K, V> table) {
PTypeFamily ptf = table.getTypeFamily();
return table.parallelDo(new MapFn<Pair<K, V>, Pair<V, K>>() {
@Override
public Pair<V, K> map(Pair<K, V> input) {
return Pair.of(input.second(), input.first());
}
}, ptf.tableOf(table.getValueType(), table.getKeyType()));
}
开发者ID:spotify,项目名称:crunch-lib,代码行数:17,代码来源:SPTables.java
示例4: negateCounts
import org.apache.crunch.MapFn; //导入依赖的package包/类
/**
* When creating toplists, it is often required to sort by count descending. As some sort operations don't support
* order (such as SecondarySort), this method will negate counts so that a natural-ordered sort will produce a
* descending order.
* @param table PTable to process
* @param <K> key type
* @return PTable of the same format with the value negated
*/
public static <K> PTable<K, Long> negateCounts(PTable<K, Long> table) {
return table.parallelDo(new MapFn<Pair<K, Long>, Pair<K, Long>>() {
@Override
public Pair<K, Long> map(Pair<K, Long> input) {
return Pair.of(input.first(), -input.second());
}
}, table.getPTableType());
}
开发者ID:spotify,项目名称:crunch-lib,代码行数:17,代码来源:SPTables.java
示例5: testZScores
import org.apache.crunch.MapFn; //导入依赖的package包/类
@Test
public void testZScores() {
PCollection<Record> elems = VECS.parallelDo(new MapFn<RealVector, Record>() {
@Override
public Record map(RealVector vec) {
return new VectorRecord(vec);
}
}, null);
Summarizer sr = new Summarizer();
Summary s = sr.build(elems).getValue();
StandardizeFn fn = new StandardizeFn(s, Transform.Z);
assertEquals(ImmutableList.of(Vectors.of(-1, 1),
Vectors.of(-1, -1), Vectors.of(1, -1),
Vectors.of(1, 1)), elems.parallelDo(fn, MLAvros.vector()).materialize());
}
开发者ID:apsaltis,项目名称:oryx,代码行数:16,代码来源:SummaryTest.java
示例6: testMissing
import org.apache.crunch.MapFn; //导入依赖的package包/类
@Test
public void testMissing() throws Exception {
PCollection<Record> elems = STRINGS.parallelDo(new MapFn<String, Record>() {
@Override
public Record map(String input) {
return new CSVRecord(Arrays.asList(input.split(",")));
}
}, MLRecords.csvRecord(AvroTypeFamily.getInstance(), ","));
Summarizer sr = new Summarizer();
Summary s = sr.build(elems).getValue();
assertEquals(1, s.getStats(1).getMissing());
assertEquals(2.0, s.getStats(1).mean(), 0.01);
assertEquals(0.0, s.getStats(1).stdDev(), 0.01);
}
开发者ID:apsaltis,项目名称:oryx,代码行数:15,代码来源:SummaryTest.java
示例7: testTrailingIgnoredFields
import org.apache.crunch.MapFn; //导入依赖的package包/类
@Test
public void testTrailingIgnoredFields() throws Exception {
Spec spec = RecordSpec.builder().add("field1", DataType.DOUBLE)
.add("field2", DataType.DOUBLE).add("field3", DataType.DOUBLE).build();
PCollection<Record> elems = STRINGS.parallelDo(new MapFn<String, Record>() {
@Override
public Record map(String input) {
return new CSVRecord(Arrays.asList(input.split(",")));
}
}, MLRecords.csvRecord(AvroTypeFamily.getInstance(), ","));
Summarizer sr = new Summarizer().spec(spec).ignoreColumns(2);
sr.build(elems).getValue();
}
开发者ID:apsaltis,项目名称:oryx,代码行数:14,代码来源:SummaryTest.java
示例8: record
import org.apache.crunch.MapFn; //导入依赖的package包/类
public static AvroType<Record> record(Schema schema) {
return Avros.derived(Record.class,
new MapFn<GenericData.Record, Record>() {
@Override
public Record map(GenericData.Record gdr) {
GenericData.Record copy = new GenericData.Record(gdr, true);
return new AvroRecord(copy);
}
},
new AvroRecordFn(schema),
Avros.generics(schema));
}
开发者ID:apsaltis,项目名称:oryx,代码行数:13,代码来源:MLRecords.java
示例9: vectorRecord
import org.apache.crunch.MapFn; //导入依赖的package包/类
public static PType<Record> vectorRecord(PType<RealVector> ptype, boolean sparse) {
return ptype.getFamily().derived(Record.class,
new MapFn<RealVector, Record>() {
@Override
public Record map(RealVector v) {
return new VectorRecord(v);
}
},
new Record2VectorFn(sparse),
ptype);
}
开发者ID:apsaltis,项目名称:oryx,代码行数:12,代码来源:MLRecords.java
示例10: sample
import org.apache.crunch.MapFn; //导入依赖的package包/类
public static <T> PCollection<T> sample(
PCollection<T> input,
int sampleSize,
RandomGenerator random) {
PTypeFamily ptf = input.getTypeFamily();
PType<Pair<T, Integer>> ptype = ptf.pairs(input.getPType(), ptf.ints());
return weightedSample(
input.parallelDo(new MapFn<T, Pair<T, Integer>>() {
@Override
public Pair<T, Integer> map(T t) { return Pair.of(t, 1); }
}, ptype),
sampleSize,
random);
}
开发者ID:apsaltis,项目名称:oryx,代码行数:15,代码来源:ReservoirSampling.java
示例11: weightedSample
import org.apache.crunch.MapFn; //导入依赖的package包/类
public static <T, N extends Number> PCollection<T> weightedSample(
PCollection<Pair<T, N>> input,
int sampleSize,
RandomGenerator random) {
PTypeFamily ptf = input.getTypeFamily();
PTable<Integer, Pair<T, N>> groupedIn = input.parallelDo(
new MapFn<Pair<T, N>, Pair<Integer, Pair<T, N>>>() {
@Override
public Pair<Integer, Pair<T, N>> map(Pair<T, N> p) {
return Pair.of(0, p);
}
}, ptf.tableOf(ptf.ints(), input.getPType()));
return groupedWeightedSample(groupedIn, sampleSize, random).values();
}
开发者ID:apsaltis,项目名称:oryx,代码行数:15,代码来源:ReservoirSampling.java
示例12: makeKeyFn
import org.apache.crunch.MapFn; //导入依赖的package包/类
private static MapFn<CQLRecord, ByteBuffer> makeKeyFn(final int[] partitionKeyIndexes) {
return new MapFn<CQLRecord, ByteBuffer>() {
@Override
public ByteBuffer map(final CQLRecord record) {
return CassandraRecordUtils.getPartitionKey(record.getValues(), partitionKeyIndexes);
}
};
}
开发者ID:spotify,项目名称:hdfs2cass,代码行数:9,代码来源:CassandraParams.java
示例13: getKeyFn
import org.apache.crunch.MapFn; //导入依赖的package包/类
/**
* @return a map function to extract the partition key from a record
*/
public MapFn<CQLRecord, ByteBuffer> getKeyFn() {
return makeKeyFn(clusterInfo.getPartitionKeyIndexes());
}
开发者ID:spotify,项目名称:hdfs2cass,代码行数:7,代码来源:CassandraParams.java
注:本文中的org.apache.crunch.MapFn类示例整理自Github/MSDocs等源码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。 |
请发表评论