This article collects typical usage examples of the Java class org.apache.parquet.io.ColumnIOFactory. If you are wondering what ColumnIOFactory is for, how to use it, or are looking for working examples, the curated class examples below should help.
The ColumnIOFactory class belongs to the org.apache.parquet.io package. A total of 14 code examples of the class are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your feedback helps the system recommend better Java code examples.
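Before the individual examples, here is a minimal sketch of the pattern most of them share: ColumnIOFactory maps a Parquet MessageType onto a MessageColumnIO, which then hands out a RecordReader for reading (or a RecordConsumer for writing). This is an illustrative sketch only, assuming the Hadoop-based ParquetFileReader API and the example-level GroupRecordConverter that appear in the examples below; the class name and file path are hypothetical placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.io.ColumnIOFactory;
import org.apache.parquet.io.MessageColumnIO;
import org.apache.parquet.io.RecordReader;
import org.apache.parquet.schema.MessageType;

public class ColumnIOFactoryReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/example.parquet"); // hypothetical input file

    // Read the footer to obtain the file schema.
    ParquetMetadata footer = ParquetFileReader.readFooter(conf, path, ParquetMetadataConverter.NO_FILTER);
    MessageType schema = footer.getFileMetaData().getSchema();

    // ColumnIOFactory maps the schema onto the column I/O structure.
    MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);

    try (ParquetFileReader reader = new ParquetFileReader(conf, path, footer)) {
      PageReadStore pages;
      while ((pages = reader.readNextRowGroup()) != null) {
        // The record reader reassembles the columns of each row group into records.
        RecordReader<Group> recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
        for (long i = 0; i < pages.getRowCount(); i++) {
          Group record = recordReader.read();
          System.out.println(record);
        }
      }
    }
  }
}

Note that ColumnIOFactory can also be constructed with the writer's created-by string (as in Examples 1-3 and 6) or with a validating flag (as in Examples 5, 8, 9, and 10), which controls whether write events are validated against the schema.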
Example 1: initialize
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public void initialize(FileMetaData parquetFileMetadata,
    Path file, List<BlockMetaData> blocks, Configuration configuration)
    throws IOException {
  // initialize a ReadContext for this file
  Map<String, String> fileMetadata = parquetFileMetadata.getKeyValueMetaData();
  ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
      configuration, toSetMultiMap(fileMetadata), fileSchema));
  this.columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy());
  this.requestedSchema = readContext.getRequestedSchema();
  this.fileSchema = parquetFileMetadata.getSchema();
  this.file = file;
  this.columnCount = requestedSchema.getPaths().size();
  this.recordConverter = readSupport.prepareForRead(
      configuration, fileMetadata, fileSchema, readContext);
  this.strictTypeChecking = configuration.getBoolean(STRICT_TYPE_CHECKING, true);
  List<ColumnDescriptor> columns = requestedSchema.getColumns();
  reader = new ParquetFileReader(configuration, parquetFileMetadata, file, blocks, columns);
  for (BlockMetaData block : blocks) {
    total += block.getRowCount();
  }
  this.unmaterializableRecordCounter = new UnmaterializableRecordCounter(configuration, total);
  LOG.info("RecordReader initialized will read a total of " + total + " records.");
}
Developer ID: apache, Project: tajo, Lines: 24, Source file: InternalParquetRecordReader.java
Example 2: initialize
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public void initialize(MessageType fileSchema,
    FileMetaData parquetFileMetadata,
    Path file, List<BlockMetaData> blocks, Configuration configuration)
    throws IOException {
  // initialize a ReadContext for this file
  Map<String, String> fileMetadata = parquetFileMetadata.getKeyValueMetaData();
  ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
      configuration, toSetMultiMap(fileMetadata), fileSchema));
  this.columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy());
  this.requestedSchema = readContext.getRequestedSchema();
  this.fileSchema = fileSchema;
  this.file = file;
  this.columnCount = requestedSchema.getPaths().size();
  this.recordConverter = readSupport.prepareForRead(
      configuration, fileMetadata, fileSchema, readContext);
  this.strictTypeChecking = true;
  List<ColumnDescriptor> columns = requestedSchema.getColumns();
  reader = new ParquetFileReader(configuration, parquetFileMetadata, file, blocks, columns);
  for (BlockMetaData block : blocks) {
    total += block.getRowCount();
  }
  Log.info("RecordReader initialized will read a total of " + total + " records.");
}
Developer ID: h2oai, Project: h2o-3, Lines: 24, Source file: H2OInternalParquetReader.java
Example 3: initialize
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public void initialize(ParquetFileReader reader, Configuration configuration)
    throws IOException {
  // initialize a ReadContext for this file
  this.reader = reader;
  FileMetaData parquetFileMetadata = reader.getFooter().getFileMetaData();
  this.fileSchema = parquetFileMetadata.getSchema();
  Map<String, String> fileMetadata = parquetFileMetadata.getKeyValueMetaData();
  ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
      configuration, toSetMultiMap(fileMetadata), fileSchema));
  this.columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy());
  this.requestedSchema = readContext.getRequestedSchema();
  this.columnCount = requestedSchema.getPaths().size();
  this.recordConverter = readSupport.prepareForRead(
      configuration, fileMetadata, fileSchema, readContext);
  this.strictTypeChecking = configuration.getBoolean(STRICT_TYPE_CHECKING, true);
  this.total = reader.getRecordCount();
  this.unmaterializableRecordCounter = new UnmaterializableRecordCounter(configuration, total);
  this.filterRecords = configuration.getBoolean(RECORD_FILTERING_ENABLED, true);
  reader.setRequestedSchema(requestedSchema);
  LOG.info("RecordReader initialized will read a total of {} records.", total);
}
Developer ID: apache, Project: parquet-mr, Lines: 22, Source file: InternalParquetRecordReader.java
Example 4: validateSameTupleAsEB
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
/**
 * <ul> steps:
 * <li>Writes using the thrift mapping
 * <li>Reads using the pig mapping
 * <li>Use Elephant bird to convert from thrift to pig
 * <li>Check that both transformations give the same result
 * @param o the object to convert
 * @throws TException
 */
public static <T extends TBase<?,?>> void validateSameTupleAsEB(T o) throws TException {
  final ThriftSchemaConverter thriftSchemaConverter = new ThriftSchemaConverter();
  @SuppressWarnings("unchecked")
  final Class<T> class1 = (Class<T>) o.getClass();
  final MessageType schema = thriftSchemaConverter.convert(class1);
  final StructType structType = ThriftSchemaConverter.toStructType(class1);
  final ThriftToPig<T> thriftToPig = new ThriftToPig<T>(class1);
  final Schema pigSchema = thriftToPig.toSchema();
  final TupleRecordMaterializer tupleRecordConverter = new TupleRecordMaterializer(schema, pigSchema, true);
  RecordConsumer recordConsumer = new ConverterConsumer(tupleRecordConverter.getRootConverter(), schema);
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  ParquetWriteProtocol p = new ParquetWriteProtocol(new RecordConsumerLoggingWrapper(recordConsumer), columnIO, structType);
  o.write(p);
  final Tuple t = tupleRecordConverter.getCurrentRecord();
  final Tuple expected = thriftToPig.getPigTuple(o);
  assertEquals(expected.toString(), t.toString());
  final MessageType filtered = new PigSchemaConverter().filter(schema, pigSchema);
  assertEquals(schema.toString(), filtered.toString());
}
Developer ID: apache, Project: parquet-mr, Lines: 30, Source file: TestThriftToPigCompatibility.java
Example 5: newSchema
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void newSchema() throws IOException {
  // Reset it to half of current number and bound it within the limits
  recordCountForNextMemCheck = min(max(MINIMUM_RECORD_COUNT_FOR_CHECK, recordCountForNextMemCheck / 2), MAXIMUM_RECORD_COUNT_FOR_CHECK);
  String json = new Schema(batchSchema).toJson();
  extraMetaData.put(DREMIO_ARROW_SCHEMA, json);
  List<Type> types = Lists.newArrayList();
  for (Field field : batchSchema) {
    if (field.getName().equalsIgnoreCase(WriterPrel.PARTITION_COMPARATOR_FIELD)) {
      continue;
    }
    Type childType = getType(field);
    if (childType != null) {
      types.add(childType);
    }
  }
  Preconditions.checkState(types.size() > 0, "No types for parquet schema");
  schema = new MessageType("root", types);
  int dictionarySize = (int) context.getOptions().getOption(ExecConstants.PARQUET_DICT_PAGE_SIZE_VALIDATOR);
  final ParquetProperties parquetProperties = new ParquetProperties(dictionarySize, writerVersion, enableDictionary,
      new ParquetDirectByteBufferAllocator(columnEncoderAllocator), pageSize, true, enableDictionaryForBinary);
  pageStore = ColumnChunkPageWriteStoreExposer.newColumnChunkPageWriteStore(codecFactory.getCompressor(codec), schema, parquetProperties);
  store = new ColumnWriteStoreV1(pageStore, pageSize, parquetProperties);
  MessageColumnIO columnIO = new ColumnIOFactory(false).getColumnIO(this.schema);
  consumer = columnIO.getRecordWriter(store);
  setUp(schema, consumer);
}
Developer ID: dremio, Project: dremio-oss, Lines: 29, Source file: ParquetRecordWriter.java
Example 6: ParquetRowReader
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public ParquetRowReader(Configuration configuration, Path filePath, ReadSupport<T> readSupport) throws IOException
{
  this.filePath = filePath;
  ParquetMetadata parquetMetadata = ParquetFileReader.readFooter(configuration, filePath, ParquetMetadataConverter.NO_FILTER);
  List<BlockMetaData> blocks = parquetMetadata.getBlocks();
  FileMetaData fileMetadata = parquetMetadata.getFileMetaData();
  this.fileSchema = fileMetadata.getSchema();
  Map<String, String> keyValueMetadata = fileMetadata.getKeyValueMetaData();
  ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
      configuration, toSetMultiMap(keyValueMetadata), fileSchema));
  this.columnIOFactory = new ColumnIOFactory(fileMetadata.getCreatedBy());
  this.requestedSchema = readContext.getRequestedSchema();
  this.recordConverter = readSupport.prepareForRead(
      configuration, fileMetadata.getKeyValueMetaData(), fileSchema, readContext);
  List<ColumnDescriptor> columns = requestedSchema.getColumns();
  reader = new ParquetFileReader(configuration, fileMetadata, filePath, blocks, columns);
  long total = 0;
  for (BlockMetaData block : blocks) {
    total += block.getRowCount();
  }
  this.total = total;
  this.unmaterializableRecordCounter = new UnmaterializableRecordCounter(configuration, total);
  logger.info("ParquetRowReader initialized will read a total of " + total + " records.");
}
Developer ID: CyberAgent, Project: embulk-input-parquet_hadoop, Lines: 32, Source file: ParquetRowReader.java
Example 7: load
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public ITable load() {
  try {
    Configuration conf = new Configuration();
    System.setProperty("hadoop.home.dir", "/");
    conf.set("hadoop.security.authentication", "simple");
    conf.set("hadoop.security.authorization", "false");
    Path path = new Path(this.filename);
    ParquetMetadata md = ParquetFileReader.readFooter(conf, path,
        ParquetMetadataConverter.NO_FILTER);
    MessageType schema = md.getFileMetaData().getSchema();
    ParquetFileReader r = new ParquetFileReader(conf, path, md);
    IAppendableColumn[] cols = this.createColumns(md);
    MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
    PageReadStore pages;
    while (null != (pages = r.readNextRowGroup())) {
      final long rows = pages.getRowCount();
      RecordReader<Group> recordReader = columnIO.getRecordReader(
          pages, new GroupRecordConverter(schema));
      for (int i = 0; i < rows; i++) {
        Group g = recordReader.read();
        appendGroup(cols, g, md.getFileMetaData().getSchema().getColumns());
      }
    }
    for (IAppendableColumn c : cols)
      c.seal();
    return new Table(cols);
  } catch (IOException ex) {
    throw new RuntimeException(ex);
  }
}
Developer ID: vmware, Project: hillview, Lines: 33, Source file: ParquetReader.java
Example 8: newSchema
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void newSchema() throws IOException {
  List<Type> types = Lists.newArrayList();
  for (MaterializedField field : batchSchema) {
    if (field.getName().equalsIgnoreCase(WriterPrel.PARTITION_COMPARATOR_FIELD)) {
      continue;
    }
    types.add(getType(field));
  }
  schema = new MessageType("root", types);
  // We don't want this number to be too small, ideally we divide the block equally across the columns.
  // It is unlikely all columns are going to be the same size.
  // Its value is likely below Integer.MAX_VALUE (2GB), although rowGroupSize is a long type.
  // Therefore this size is cast to int, since allocating byte array in under layer needs to
  // limit the array size in an int scope.
  int initialBlockBufferSize = max(MINIMUM_BUFFER_SIZE, blockSize / this.schema.getColumns().size() / 5);
  // We don't want this number to be too small either. Ideally, slightly bigger than the page size,
  // but not bigger than the block buffer
  int initialPageBufferSize = max(MINIMUM_BUFFER_SIZE, min(pageSize + pageSize / 10, initialBlockBufferSize));
  // TODO: Use initialSlabSize from ParquetProperties once drill will be updated to the latest version of Parquet library
  int initialSlabSize = CapacityByteArrayOutputStream.initialSlabSizeHeuristic(64, pageSize, 10);
  // TODO: Replace ParquetColumnChunkPageWriteStore with ColumnChunkPageWriteStore from parquet library
  // once PARQUET-1006 will be resolved
  pageStore = new ParquetColumnChunkPageWriteStore(codecFactory.getCompressor(codec), schema, initialSlabSize,
      pageSize, new ParquetDirectByteBufferAllocator(oContext));
  store = new ColumnWriteStoreV1(pageStore, pageSize, initialPageBufferSize, enableDictionary,
      writerVersion, new ParquetDirectByteBufferAllocator(oContext));
  MessageColumnIO columnIO = new ColumnIOFactory(false).getColumnIO(this.schema);
  consumer = columnIO.getRecordWriter(store);
  setUp(schema, consumer);
}
Developer ID: axbaretto, Project: drill, Lines: 32, Source file: ParquetRecordWriter.java
Example 9: initStore
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void initStore() {
  pageStore = new ColumnChunkPageWriteStore(compressor, schema, props.getAllocator());
  columnStore = props.newColumnWriteStore(schema, pageStore);
  MessageColumnIO columnIO = new ColumnIOFactory(validating).getColumnIO(schema);
  this.recordConsumer = columnIO.getRecordWriter(columnStore);
  writeSupport.prepareForWrite(recordConsumer);
}
Developer ID: apache, Project: parquet-mr, Lines: 8, Source file: InternalParquetRecordWriter.java
Example 10: validate
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private <T extends TBase<?,?>> void validate(T expected) throws TException {
  @SuppressWarnings("unchecked")
  final Class<T> thriftClass = (Class<T>) expected.getClass();
  final MemPageStore memPageStore = new MemPageStore(1);
  final ThriftSchemaConverter schemaConverter = new ThriftSchemaConverter();
  final MessageType schema = schemaConverter.convert(thriftClass);
  LOG.info("{}", schema);
  final MessageColumnIO columnIO = new ColumnIOFactory(true).getColumnIO(schema);
  final ColumnWriteStoreV1 columns = new ColumnWriteStoreV1(memPageStore,
      ParquetProperties.builder()
          .withPageSize(10000)
          .withDictionaryEncoding(false)
          .build());
  final RecordConsumer recordWriter = columnIO.getRecordWriter(columns);
  final StructType thriftType = schemaConverter.toStructType(thriftClass);
  ParquetWriteProtocol parquetWriteProtocol = new ParquetWriteProtocol(recordWriter, columnIO, thriftType);
  expected.write(parquetWriteProtocol);
  recordWriter.flush();
  columns.flush();
  ThriftRecordConverter<T> converter = new TBaseRecordConverter<T>(thriftClass, schema, thriftType);
  final RecordReader<T> recordReader = columnIO.getRecordReader(memPageStore, converter);
  final T result = recordReader.read();
  assertEquals(expected, result);
}
Developer ID: apache, Project: parquet-mr, Lines: 29, Source file: TestParquetReadProtocol.java
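As a complement to Example 10, the following is a rough sketch of driving the RecordConsumer returned by MessageColumnIO.getRecordWriter directly, without going through the Thrift write protocol. The schema, field names, and values are invented for illustration, and MemPageStore is a test helper from the parquet-column test jar (the same helper Example 10 uses), chosen here only to keep the sketch self-contained; this is not a production write path.

import org.apache.parquet.column.ParquetProperties;
import org.apache.parquet.column.impl.ColumnWriteStoreV1;
import org.apache.parquet.column.page.mem.MemPageStore;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.io.ColumnIOFactory;
import org.apache.parquet.io.MessageColumnIO;
import org.apache.parquet.io.RecordReader;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.io.api.RecordConsumer;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ColumnIOFactoryWriteSketch {
  public static void main(String[] args) {
    // Hypothetical two-column schema, for illustration only.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message example { required int64 id; required binary name (UTF8); }");

    // In-memory page store (test helper) plus a v1 column write store, as in Example 10.
    MemPageStore pageStore = new MemPageStore(1);
    ColumnWriteStoreV1 columns = new ColumnWriteStoreV1(pageStore,
        ParquetProperties.builder()
            .withPageSize(10000)
            .withDictionaryEncoding(false)
            .build());

    // ColumnIOFactory(true) asks for the write events to be validated against the schema.
    MessageColumnIO columnIO = new ColumnIOFactory(true).getColumnIO(schema);
    RecordConsumer consumer = columnIO.getRecordWriter(columns);

    // Write one record by emitting start/end events for the message and each field.
    consumer.startMessage();
    consumer.startField("id", 0);
    consumer.addLong(42L);
    consumer.endField("id", 0);
    consumer.startField("name", 1);
    consumer.addBinary(Binary.fromString("alice"));
    consumer.endField("name", 1);
    consumer.endMessage();
    consumer.flush();
    columns.flush();

    // Read the record back through the same MessageColumnIO.
    RecordReader<Group> recordReader =
        columnIO.getRecordReader(pageStore, new GroupRecordConverter(schema));
    System.out.println(recordReader.read());
  }
}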
Example 11: validateThrift
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void validateThrift(String[] expectations, TBase<?, ?> a)
    throws TException {
  final ThriftSchemaConverter thriftSchemaConverter = new ThriftSchemaConverter();
  // System.out.println(a);
  final Class<TBase<?,?>> class1 = (Class<TBase<?,?>>) a.getClass();
  final MessageType schema = thriftSchemaConverter.convert(class1);
  LOG.info("{}", schema);
  final StructType structType = thriftSchemaConverter.toStructType(class1);
  ExpectationValidatingRecordConsumer recordConsumer = new ExpectationValidatingRecordConsumer(new ArrayDeque<String>(Arrays.asList(expectations)));
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  ParquetWriteProtocol p = new ParquetWriteProtocol(new RecordConsumerLoggingWrapper(recordConsumer), columnIO, structType);
  a.write(p);
}
Developer ID: apache, Project: parquet-mr, Lines: 14, Source file: TestParquetWriteProtocol.java
Example 12: prepareForWrite
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
@Override
public void prepareForWrite(RecordConsumer recordConsumer) {
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  this.parquetWriteProtocol = new ParquetWriteProtocol(recordConsumer, columnIO, thriftStruct);
}
Developer ID: apache, Project: parquet-mr, Lines: 6, Source file: AbstractThriftWriteSupport.java
Example 13: prepareForWrite
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
@Override
public void prepareForWrite(RecordConsumer recordConsumer) {
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  this.parquetWriteProtocol = new ParquetWriteProtocol(recordConsumer, columnIO, thriftStruct);
  thriftWriteSupport.prepareForWrite(recordConsumer);
}
Developer ID: apache, Project: parquet-mr, Lines: 7, Source file: ThriftBytesWriteSupport.java
Example 14: newColumnFactory
import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private static MessageColumnIO newColumnFactory(String pigSchemaString) throws ParserException {
  MessageType schema = new PigSchemaConverter().convert(Utils.getSchemaFromString(pigSchemaString));
  return new ColumnIOFactory().getColumnIO(schema);
}
Developer ID: apache, Project: parquet-mr, Lines: 5, Source file: TupleConsumerPerfTest.java
Note: The org.apache.parquet.io.ColumnIOFactory examples in this article were collected from GitHub, MSDocs, and other source-code and documentation platforms. The snippets were selected from open-source projects contributed by various developers; copyright of the source code belongs to the original authors, and distribution or use should follow the license of the corresponding project. Do not reproduce without permission.