This article collects typical usage examples of the org.apache.orc.Reader class in Java. If you are wondering how the Java Reader class is used in practice, or are looking for working examples of it, the selected examples below may help.
The Reader class belongs to the org.apache.orc package. The 18 code examples below are sorted by popularity by default.
Example 1: testSplitStripesGivenSplits
import org.apache.orc.Reader; // import the required package/class
@Test
public void testSplitStripesGivenSplits() throws IOException {
    rowOrcInputFormat =
        new OrcRowInputFormat(getPath(TEST_FILE_FLAT), TEST_SCHEMA_FLAT, new Configuration());
    OrcRowInputFormat spy = spy(rowOrcInputFormat);
    // spy on the options to check the configuration of the ORC reader
    Reader.Options options = spy(new Reader.Options());
    doReturn(options).when(spy).getOptions(any());
    FileInputSplit[] splits = spy.createInputSplits(3);
    spy.openInputFormat();
    spy.open(splits[0]);
    verify(options).range(eq(3L), eq(137005L));
    spy.open(splits[1]);
    verify(options).range(eq(137008L), eq(136182L));
    spy.open(splits[2]);
    verify(options).range(eq(273190L), eq(123633L));
}
Developer ID: axbaretto, project: flink, lines of code: 22, source file: OrcRowInputFormatTest.java
Example 2: getTotalSize
import org.apache.orc.Reader; // import the required package/class
/**
 * Gets the total size of the ORC files, computed from each reader's content length.
 *
 * @return the summed content length in bytes, excluding each file's 3-byte header
 */
@Override
public long getTotalSize()
{
    long size = 0;
    for (Reader reader : this.fileReaders)
    {
        // contentLength includes the header ('ORC') length which is 3 bytes.
        size += reader.getContentLength() - 3;
    }
    return size;
}
Developer ID: dbiir, project: rainbow, lines of code: 17, source file: OrcMetadataStat.java
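The header adjustment in getTotalSize can be tried without an ORC file. The following is a minimal sketch, not the project's code: the class name, the method name, and the sample lengths are hypothetical, and the 3-byte header size is taken from the comment in the example above.

```java
import java.util.List;

final class ContentSizeSketch {
    // Length in bytes of the "ORC" header, as stated in the example above.
    static final long ORC_HEADER_LENGTH = 3;

    /** Sums the given content lengths, subtracting the header once per file. */
    static long totalSizeWithoutHeaders(List<Long> contentLengths) {
        long size = 0;
        for (long contentLength : contentLengths) {
            size += contentLength - ORC_HEADER_LENGTH;
        }
        return size;
    }

    public static void main(String[] args) {
        // Hypothetical content lengths for three files.
        System.out.println(totalSizeWithoutHeaders(List.of(1003L, 2003L, 503L)));
    }
}
```

In real code the list would presumably be filled from reader.getContentLength() for each open Reader.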
Example 3: test
import org.apache.orc.Reader; // import the required package/class
@Test
public void test() throws IOException, Descriptors.DescriptorValidationException
{
    Configuration conf = new Configuration();
    System.setProperty("hadoop.home.dir", "/");
    FileSystem fileSystem = FileSystem.get(URI.create("hdfs://presto00:9000"), conf);
    Path hdfsDirPath = new Path("/rainbow2/orc_new_compress");
    System.out.println(fileSystem.isFile(hdfsDirPath));
    FileStatus[] fileStatuses = fileSystem.listStatus(hdfsDirPath);
    System.out.println(fileStatuses.length);
    for (FileStatus status : fileStatuses)
    {
        System.out.println(status.getPath() + ", " + status.getLen());
    }
    // open the first file and print its metadata
    Reader reader = OrcFile.createReader(fileStatuses[0].getPath(),
            OrcFile.readerOptions(conf));
    System.out.println("file length:" + reader.getFileTail().getFileLength());
    List<String> columnNames = new ArrayList<>();
    columnNames.add("samplepercent");
    System.out.println(reader.getRawDataSizeOfColumns(columnNames));
    System.out.println(reader.getFileTail().getFooter().getTypes(0).getFieldNames(0));
    System.out.println(reader.getTypes().get(0).getSerializedSize());
    // open every file and compare content size with raw data size
    List<Reader> readers = new ArrayList<>();
    for (FileStatus fileStatus : fileStatuses)
    {
        Reader reader1 = OrcFile.createReader(fileStatus.getPath(),
                OrcFile.readerOptions(conf));
        readers.add(reader1);
        System.out.println("content size: " + reader1.getContentLength() + ", raw size: "
                + reader1.getRawDataSize());
    }
    for (String columnName : reader.getSchema().getFieldNames())
    {
        System.out.println(columnName);
    }
}
Developer ID: dbiir, project: rainbow, lines of code: 41, source file: TestOrcMetadata.java
Example 4: testProjectionMaskNested
import org.apache.orc.Reader; // import the required package/class
@Test
public void testProjectionMaskNested() throws IOException {
    rowOrcInputFormat =
        new OrcRowInputFormat(getPath(TEST_FILE_NESTED), TEST_SCHEMA_NESTED, new Configuration());
    OrcRowInputFormat spy = spy(rowOrcInputFormat);
    // inject options to check configuration of ORC reader
    Reader.Options options = new Reader.Options();
    doReturn(options).when(spy).getOptions(any());
    spy.selectFields(9, 11, 2);
    spy.openInputFormat();
    FileInputSplit[] splits = spy.createInputSplits(1);
    spy.open(splits[0]);
    // top-level struct is false
    boolean[] expected = new boolean[]{
        false, // top level
        false, false, // flat fields 0, 1 are out
        true, // flat field 2 is in
        false, false, false, false, false, false, // flat fields 3, 4, 5, 6, 7, 8 are out
        true, true, true, true, true, // nested field 9 is in
        false, false, false, false, // nested field 10 is out
        true, true, true, true, true}; // nested field 11 is in
    assertArrayEquals(expected, options.getInclude());
}
Developer ID: axbaretto, project: flink, lines of code: 28, source file: OrcRowInputFormatTest.java
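The expected array in testProjectionMaskNested has one flag per ORC column id: id 0 is the root struct, and each top-level field covers as many ids as its subtree has types. The following is a minimal sketch of how such a mask could be derived. It is not Flink's computeProjectionMask; the class name, the method name, and the columnsPerField layout (nine flat fields followed by nested fields spanning 5, 4 and 5 columns) are assumptions read off the comments in the test.

```java
import java.util.Arrays;

final class ProjectionMaskSketch {
    /**
     * @param columnsPerField number of column ids occupied by each top-level field
     *                        (1 for a primitive, more for a nested type)
     * @param selectedFields  indexes of the top-level fields to read
     * @return one flag per column id, with the root struct at index 0
     */
    static boolean[] computeMask(int[] columnsPerField, int... selectedFields) {
        // Column id 0 is the root struct, so the first field starts at id 1.
        int[] offsets = new int[columnsPerField.length];
        int next = 1;
        for (int i = 0; i < columnsPerField.length; i++) {
            offsets[i] = next;
            next += columnsPerField[i];
        }
        boolean[] mask = new boolean[next]; // all false, including the root
        for (int field : selectedFields) {
            // Mark the whole subtree of the selected field.
            Arrays.fill(mask, offsets[field], offsets[field] + columnsPerField[field], true);
        }
        return mask;
    }

    public static void main(String[] args) {
        // Assumed layout: nine flat fields, then nested fields of 5, 4 and 5 columns.
        int[] columnsPerField = {1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 4, 5};
        System.out.println(Arrays.toString(computeMask(columnsPerField, 9, 11, 2)));
    }
}
```

Selecting a nested field marks all of its child columns, which is why fields 9 and 11 each contribute five true entries in the test.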
Example 5: testTimePredicates
import org.apache.orc.Reader; // import the required package/class
@Test
public void testTimePredicates() throws Exception {
    rowOrcInputFormat =
        new OrcRowInputFormat(getPath(TEST_FILE_TIMETYPES), TEST_SCHEMA_TIMETYPES, new Configuration());
    rowOrcInputFormat.addPredicate(
        // OR
        new OrcRowInputFormat.Or(
            // timestamp pred
            new OrcRowInputFormat.Equals("time", PredicateLeaf.Type.TIMESTAMP, Timestamp.valueOf("1900-05-05 12:34:56.100")),
            // date pred
            new OrcRowInputFormat.Equals("date", PredicateLeaf.Type.DATE, Date.valueOf("1900-12-25")))
    );
    FileInputSplit[] splits = rowOrcInputFormat.createInputSplits(1);
    rowOrcInputFormat.openInputFormat();
    // inject options to check configuration of ORC reader
    OrcRowInputFormat spy = spy(rowOrcInputFormat);
    Reader.Options options = new Reader.Options();
    doReturn(options).when(spy).getOptions(any());
    spy.openInputFormat();
    spy.open(splits[0]);
    // verify predicate configuration
    SearchArgument sarg = options.getSearchArgument();
    assertNotNull(sarg);
    assertEquals("(or leaf-0 leaf-1)", sarg.getExpression().toString());
    assertEquals(2, sarg.getLeaves().size());
    List<PredicateLeaf> leaves = sarg.getLeaves();
    assertEquals("(EQUALS time 1900-05-05 12:34:56.1)", leaves.get(0).toString());
    assertEquals("(EQUALS date 1900-12-25)", leaves.get(1).toString());
}
Developer ID: axbaretto, project: flink, lines of code: 35, source file: OrcRowInputFormatTest.java
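The test above builds the timestamp from "12:34:56.100" but expects "12:34:56.1" in the leaf string. That matches how java.sql.Timestamp.toString() drops trailing zeros from the fractional seconds. The sketch below shows only this JDK behavior; the class and method names are hypothetical, and that the leaf's toString() relies on it is an assumption based on the expected strings.

```java
import java.sql.Date;
import java.sql.Timestamp;

final class TimeLiteralSketch {
    /** Parses a JDBC timestamp literal and renders it back with toString(). */
    static String timestampText(String literal) {
        return Timestamp.valueOf(literal).toString();
    }

    /** Parses a JDBC date literal (yyyy-mm-dd) and renders it back with toString(). */
    static String dateText(String literal) {
        return Date.valueOf(literal).toString();
    }

    public static void main(String[] args) {
        // Trailing zeros of the fraction are trimmed, at least one digit is kept.
        System.out.println(timestampText("1900-05-05 12:34:56.100"));
        System.out.println(dateText("1900-12-25"));
    }
}
```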
Example 6: testDecimalPredicate
import org.apache.orc.Reader; // import the required package/class
@Test
public void testDecimalPredicate() throws Exception {
    rowOrcInputFormat =
        new OrcRowInputFormat(getPath(TEST_FILE_DECIMAL), TEST_SCHEMA_DECIMAL, new Configuration());
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.Not(
            // decimal pred
            new OrcRowInputFormat.Equals("_col0", PredicateLeaf.Type.DECIMAL, BigDecimal.valueOf(-1000.5))));
    FileInputSplit[] splits = rowOrcInputFormat.createInputSplits(1);
    rowOrcInputFormat.openInputFormat();
    // inject options to check configuration of ORC reader
    OrcRowInputFormat spy = spy(rowOrcInputFormat);
    Reader.Options options = new Reader.Options();
    doReturn(options).when(spy).getOptions(any());
    spy.openInputFormat();
    spy.open(splits[0]);
    // verify predicate configuration
    SearchArgument sarg = options.getSearchArgument();
    assertNotNull(sarg);
    assertEquals("(not leaf-0)", sarg.getExpression().toString());
    assertEquals(1, sarg.getLeaves().size());
    List<PredicateLeaf> leaves = sarg.getLeaves();
    assertEquals("(EQUALS _col0 -1000.5)", leaves.get(0).toString());
}
Developer ID: axbaretto, project: flink, lines of code: 30, source file: OrcRowInputFormatTest.java
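The decimal literal in the test is created with BigDecimal.valueOf(double), which goes through the double's canonical string form. The sketch below contrasts it with the BigDecimal(double) constructor, which keeps the exact binary value. It illustrates JDK behavior only; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

final class DecimalLiteralSketch {
    /** Uses Double.toString(d) internally, so 0.1 stays "0.1". */
    static String viaValueOf(double d) {
        return BigDecimal.valueOf(d).toString();
    }

    /** Keeps the exact binary expansion of the double. */
    static String viaConstructor(double d) {
        return new BigDecimal(d).toString();
    }

    public static void main(String[] args) {
        System.out.println(viaValueOf(-1000.5));
        System.out.println(viaValueOf(0.1));
        // Prints a long expansion, because 0.1 has no exact binary representation.
        System.out.println(viaConstructor(0.1));
    }
}
```

For -1000.5 both approaches give the same text, since that value is exactly representable in binary. valueOf is still the safer habit when the literal should match its printed form.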
Example 7: JsonORCFileReader
import org.apache.orc.Reader; // import the required package/class
@SuppressWarnings("deprecation")
public JsonORCFileReader(LogFilePath logFilePath, CompressionCodec codec)
        throws IOException {
    schema = schemaProvider.getSchema(logFilePath.getTopic(),
            logFilePath);
    Path path = new Path(logFilePath.getLogFilePath());
    Reader reader = OrcFile.createReader(path,
            OrcFile.readerOptions(new Configuration(true)));
    offset = logFilePath.getOffset();
    // create the row reader and load the first batch
    rows = reader.rows();
    batch = reader.getSchema().createRowBatch();
    rows.nextBatch(batch);
}
Developer ID: pinterest, project: secor, lines of code: 14, source file: JsonORCFileReaderWriterFactory.java
Example 8: readSchema
import org.apache.orc.Reader; // import the required package/class
protected SchemaDescription readSchema( Reader orcReader ) throws Exception {
    OrcSchemaConverter orcSchemaConverter = new OrcSchemaConverter();
    SchemaDescription schemaDescription = orcSchemaConverter.buildSchemaDescription( readTypeDescription( orcReader ) );
    IOrcMetaData.Reader orcMetaDataReader = new OrcMetaDataReader( orcReader );
    orcMetaDataReader.read( schemaDescription );
    return schemaDescription;
}
Developer ID: pentaho, project: pentaho-hadoop-shims, lines of code: 8, source file: PentahoOrcInputFormat.java
Example 9: getReader
import org.apache.orc.Reader; // import the required package/class
private Reader getReader( ) throws Exception {
    return inClassloader( () -> {
        checkNullFileName();
        Path filePath;
        FileSystem fs;
        Reader orcReader;
        try {
            filePath = new Path( fileName );
            fs = FileSystem.get( filePath.toUri(), conf );
            if ( !fs.exists( filePath ) ) {
                throw new NoSuchFileException( fileName );
            }
            // for a directory, read the first file whose name ends with ".orc"
            if ( fs.getFileStatus( filePath ).isDirectory() ) {
                PathFilter pathFilter = new PathFilter() {
                    public boolean accept( Path file ) {
                        return file.getName().endsWith( ".orc" );
                    }
                };
                FileStatus[] fileStatuses = fs.listStatus( filePath, pathFilter );
                if ( fileStatuses.length == 0 ) {
                    throw new NoSuchFileException( fileName );
                }
                filePath = fileStatuses[0].getPath();
            }
            orcReader = OrcFile.createReader( filePath,
                OrcFile.readerOptions( conf ).filesystem( fs ) );
        } catch ( IOException e ) {
            throw new RuntimeException( "Unable to read data from file " + fileName, e );
        }
        return orcReader;
    } );
}
Developer ID: pentaho, project: pentaho-hadoop-shims, lines of code: 37, source file: PentahoOrcInputFormat.java
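The path resolution in getReader (use the file itself, or the first ".orc" file when the path is a directory) can be tried on a local file system without Hadoop. The following is a minimal sketch using java.nio.file instead of the Hadoop FileSystem API. The class and method names are hypothetical, and unlike the original, which takes whatever listStatus returns first, this version sorts the candidates so the result is deterministic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.stream.Stream;

final class FirstOrcFileSketch {
    /** Returns the path itself for a file, or the first *.orc child for a directory. */
    static Path resolveOrcFile(Path path) throws IOException {
        if (!Files.exists(path)) {
            throw new NoSuchFileException(path.toString());
        }
        if (!Files.isDirectory(path)) {
            return path;
        }
        // Files.list keeps a directory handle open, so close the stream when done.
        try (Stream<Path> children = Files.list(path)) {
            return children
                .filter(p -> p.getFileName().toString().endsWith(".orc"))
                .sorted() // assumption: pick the lexicographically first match
                .findFirst()
                .orElseThrow(() -> new NoSuchFileException(path.toString()));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("orc-sketch");
        Files.createFile(dir.resolve("b.orc"));
        Files.createFile(dir.resolve("a.orc"));
        Files.createFile(dir.resolve("notes.txt"));
        System.out.println(resolveOrcFile(dir).getFileName());
    }
}
```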
Example 10: getOptions
import org.apache.orc.Reader; // import the required package/class
public Reader.Options getOptions() {
    return options;
}
Developer ID: ampool, project: monarch, lines of code: 4, source file: OrcUtils.java
Example 11: setup
import org.apache.orc.Reader; // import the required package/class
@Before
public void setup() throws Exception {
    // mock the Hadoop file system
    PowerMockito.mockStatic(FileSystem.class);
    when(FileSystem.get(configuration)).thenReturn(fileSystem);
    when(FileSystem.get(any(URI.class), any(Configuration.class))).thenReturn(fileSystem);
    when(fileStatus.getPath()).thenReturn(path);
    when(fileStatus.isDirectory()).thenReturn(false);
    FileStatus[] fileStatuses = {fileStatus};
    when(fileSystem.listStatus(any(Path.class))).thenReturn(fileStatuses);
    when(fileStatus2.getPath()).thenReturn(path2);
    when(fileStatus2.isDirectory()).thenReturn(true);
    when(fileSystem.listStatus(path2)).thenReturn(fileStatuses);
    final FSDataInputStream fsDataInputStream = mock(FSDataInputStream.class);
    when(fileSystem.open(any(Path.class))).thenReturn(fsDataInputStream);
    // mock the job and its configuration
    mockStatic(Job.class);
    when(Job.getInstance(configuration)).thenReturn(job);
    when(job.getConfiguration()).thenReturn(configuration);
    when(path.getFileSystem(configuration)).thenReturn(fileSystem);
    when(fileSystem.makeQualified(path)).thenReturn(path);
    // fix the UUID so the staging path is predictable
    final UUID uuid = UUID.randomUUID();
    mockStatic(UUID.class);
    whenNew(UUID.class).withAnyArguments().thenReturn(uuid);
    when(UUID.randomUUID()).thenReturn(uuid);
    whenNew(Path.class).withArguments("/apps/datasqueeze/staging/tmp-" + uuid.toString()).thenReturn(path);
    whenNew(Path.class).withArguments("/source/path").thenReturn(path);
    whenNew(Path.class).withArguments("s3/source/path").thenReturn(path);
    whenNew(Path.class).withArguments("/source/path/dir").thenReturn(path2);
    when(configuration.get("mapreduce.multipleoutputs", "")).thenReturn("");
    // mock the ORC reader
    whenNew(ReaderImpl.class).withArguments(any(Path.class), any(OrcFile.ReaderOptions.class)).thenReturn(reader);
    final TypeDescription schema = TypeDescription.createStruct()
            .addField("field1", TypeDescription.createInt());
    when(reader.getSchema()).thenReturn(schema);
    when(reader.getCompressionKind()).thenReturn(CompressionKind.SNAPPY);
    CompactionManagerFactory.DEFAULT_THRESHOLD_IN_BYTES = 1234L;
    whenNew(JobRunner.class).withArguments(job).thenReturn(jobRunner);
    // mock the sequence file reader
    whenNew(SequenceFile.Reader.class).withArguments(any(Configuration.class), any(Path.class)).thenReturn(seqReader);
    when(seqReader.isCompressed()).thenReturn(true);
    CompressionCodec compressionCodec = mock(CompressionCodec.class);
    when(seqReader.getCompressionCodec()).thenReturn(compressionCodec);
    when(seqReader.getCompressionType()).thenReturn(SequenceFile.CompressionType.BLOCK);
}
Developer ID: ExpediaInceCommercePlatform, project: dataSqueeze, lines of code: 49, source file: CompactionManagerImplTest.java
Example 12: open
import org.apache.orc.Reader; // import the required package/class
@Override
public void open(FileInputSplit fileSplit) throws IOException {
    LOG.debug("Opening ORC file {}", fileSplit.getPath());
    // open ORC file and create reader
    org.apache.hadoop.fs.Path hPath = new org.apache.hadoop.fs.Path(fileSplit.getPath().getPath());
    Reader orcReader = OrcFile.createReader(hPath, OrcFile.readerOptions(conf));
    // get offset and length for the stripes that start in the split
    Tuple2<Long, Long> offsetAndLength = getOffsetAndLengthForSplit(fileSplit, getStripes(orcReader));
    // create ORC row reader configuration
    Reader.Options options = getOptions(orcReader)
        .schema(schema)
        .range(offsetAndLength.f0, offsetAndLength.f1)
        .useZeroCopy(OrcConf.USE_ZEROCOPY.getBoolean(conf))
        .skipCorruptRecords(OrcConf.SKIP_CORRUPT_DATA.getBoolean(conf))
        .tolerateMissingSchema(OrcConf.TOLERATE_MISSING_SCHEMA.getBoolean(conf));
    // configure filters
    if (!conjunctPredicates.isEmpty()) {
        SearchArgument.Builder b = SearchArgumentFactory.newBuilder();
        b = b.startAnd();
        for (Predicate predicate : conjunctPredicates) {
            predicate.add(b);
        }
        b = b.end();
        options.searchArgument(b.build(), new String[]{});
    }
    // configure selected fields
    options.include(computeProjectionMask());
    // create ORC row reader
    this.orcRowsReader = orcReader.rows(options);
    // assign ids
    this.schema.getId();
    // create row batch
    this.rowBatch = schema.createRowBatch(batchSize);
    rowsInBatch = 0;
    nextRow = 0;
}
Developer ID: axbaretto, project: flink, lines of code: 45, source file: OrcRowInputFormat.java
Example 13: getOptions
import org.apache.orc.Reader; // import the required package/class
@VisibleForTesting
Reader.Options getOptions(Reader orcReader) {
    return orcReader.options();
}
Developer ID: axbaretto, project: flink, lines of code: 5, source file: OrcRowInputFormat.java
Example 14: getStripes
import org.apache.orc.Reader; // import the required package/class
@VisibleForTesting
List<StripeInformation> getStripes(Reader orcReader) {
    return orcReader.getStripes();
}
Developer ID: axbaretto, project: flink, lines of code: 5, source file: OrcRowInputFormat.java
Example 15: testSplitStripesCustomSplits
import org.apache.orc.Reader; // import the required package/class
@Test
public void testSplitStripesCustomSplits() throws IOException {
    rowOrcInputFormat =
        new OrcRowInputFormat(getPath(TEST_FILE_FLAT), TEST_SCHEMA_FLAT, new Configuration());
    OrcRowInputFormat spy = spy(rowOrcInputFormat);
    // mock list of stripes
    List<StripeInformation> stripes = new ArrayList<>();
    StripeInformation stripe1 = mock(StripeInformation.class);
    when(stripe1.getOffset()).thenReturn(10L);
    when(stripe1.getLength()).thenReturn(90L);
    StripeInformation stripe2 = mock(StripeInformation.class);
    when(stripe2.getOffset()).thenReturn(100L);
    when(stripe2.getLength()).thenReturn(100L);
    StripeInformation stripe3 = mock(StripeInformation.class);
    when(stripe3.getOffset()).thenReturn(200L);
    when(stripe3.getLength()).thenReturn(100L);
    StripeInformation stripe4 = mock(StripeInformation.class);
    when(stripe4.getOffset()).thenReturn(300L);
    when(stripe4.getLength()).thenReturn(100L);
    StripeInformation stripe5 = mock(StripeInformation.class);
    when(stripe5.getOffset()).thenReturn(400L);
    when(stripe5.getLength()).thenReturn(100L);
    stripes.add(stripe1);
    stripes.add(stripe2);
    stripes.add(stripe3);
    stripes.add(stripe4);
    stripes.add(stripe5);
    doReturn(stripes).when(spy).getStripes(any());
    // spy on the options to check the configuration of the ORC reader
    Reader.Options options = spy(new Reader.Options());
    doReturn(options).when(spy).getOptions(any());
    spy.openInputFormat();
    // split ranging 2 stripes
    spy.open(new FileInputSplit(0, new Path(getPath(TEST_FILE_FLAT)), 0, 150, new String[]{}));
    verify(options).range(eq(10L), eq(190L));
    // split ranging 0 stripes
    spy.open(new FileInputSplit(1, new Path(getPath(TEST_FILE_FLAT)), 150, 10, new String[]{}));
    verify(options).range(eq(0L), eq(0L));
    // split ranging 1 stripe
    spy.open(new FileInputSplit(2, new Path(getPath(TEST_FILE_FLAT)), 160, 41, new String[]{}));
    verify(options).range(eq(200L), eq(100L));
    // split ranging 2 stripes
    spy.open(new FileInputSplit(3, new Path(getPath(TEST_FILE_FLAT)), 201, 299, new String[]{}));
    verify(options).range(eq(300L), eq(200L));
}
Developer ID: axbaretto, project: flink, lines of code: 50, source file: OrcRowInputFormatTest.java
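The ranges verified in testSplitStripesCustomSplits are consistent with a simple rule: a stripe belongs to a split if its start offset lies inside the split, and the resulting range begins at the first such stripe and spans the summed stripe lengths. The sketch below implements that rule. It is inferred from the test expectations and the "stripes that start in the split" comment in open, not copied from Flink's getOffsetAndLengthForSplit; the class name, the method name, and the long[] representation of stripes are assumptions.

```java
final class SplitRangeSketch {
    /**
     * @param stripes each entry is {offset, length} of one stripe
     * @return {offset, length} covering all stripes that start inside the split,
     *         or {0, 0} if no stripe starts there
     */
    static long[] rangeForSplit(long splitStart, long splitLength, long[][] stripes) {
        long splitEnd = splitStart + splitLength;
        long start = Long.MAX_VALUE;
        long length = 0;
        for (long[] stripe : stripes) {
            long stripeOffset = stripe[0];
            // A stripe is assigned to the split that contains its first byte.
            if (splitStart <= stripeOffset && stripeOffset < splitEnd) {
                start = Math.min(start, stripeOffset);
                length += stripe[1];
            }
        }
        return start == Long.MAX_VALUE ? new long[]{0, 0} : new long[]{start, length};
    }

    public static void main(String[] args) {
        // Same stripe layout as the mocked stripes in the test above.
        long[][] stripes = {{10, 90}, {100, 100}, {200, 100}, {300, 100}, {400, 100}};
        long[] range = rangeForSplit(0, 150, stripes);
        System.out.println(range[0] + ", " + range[1]);
    }
}
```

Because each stripe is assigned by its start offset only, every stripe lands in exactly one split even when split boundaries cut through a stripe.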
Example 16: testNumericBooleanStringPredicates
import org.apache.orc.Reader; // import the required package/class
@Test
public void testNumericBooleanStringPredicates() throws Exception {
    rowOrcInputFormat =
        new OrcRowInputFormat(getPath(TEST_FILE_NESTED), TEST_SCHEMA_NESTED, new Configuration());
    rowOrcInputFormat.selectFields(0, 1, 2, 3, 4, 5, 6, 8);
    // boolean pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.Equals("boolean1", PredicateLeaf.Type.BOOLEAN, false));
    // byte pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.LessThan("byte1", PredicateLeaf.Type.LONG, 1));
    // short pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.LessThanEquals("short1", PredicateLeaf.Type.LONG, 1024));
    // int pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.Between("int1", PredicateLeaf.Type.LONG, -1, 65536));
    // long pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.Equals("long1", PredicateLeaf.Type.LONG, 9223372036854775807L));
    // float pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.Equals("float1", PredicateLeaf.Type.FLOAT, 1.0));
    // double pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.Equals("double1", PredicateLeaf.Type.FLOAT, -15.0));
    // string pred (null check)
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.IsNull("string1", PredicateLeaf.Type.STRING));
    // string pred
    rowOrcInputFormat.addPredicate(
        new OrcRowInputFormat.Equals("string1", PredicateLeaf.Type.STRING, "hello"));
    FileInputSplit[] splits = rowOrcInputFormat.createInputSplits(1);
    rowOrcInputFormat.openInputFormat();
    // inject options to check configuration of ORC reader
    OrcRowInputFormat spy = spy(rowOrcInputFormat);
    Reader.Options options = new Reader.Options();
    doReturn(options).when(spy).getOptions(any());
    spy.openInputFormat();
    spy.open(splits[0]);
    // verify predicate configuration
    SearchArgument sarg = options.getSearchArgument();
    assertNotNull(sarg);
    assertEquals("(and leaf-0 leaf-1 leaf-2 leaf-3 leaf-4 leaf-5 leaf-6 leaf-7 leaf-8)", sarg.getExpression().toString());
    assertEquals(9, sarg.getLeaves().size());
    List<PredicateLeaf> leaves = sarg.getLeaves();
    assertEquals("(EQUALS boolean1 false)", leaves.get(0).toString());
    assertEquals("(LESS_THAN byte1 1)", leaves.get(1).toString());
    assertEquals("(LESS_THAN_EQUALS short1 1024)", leaves.get(2).toString());
    assertEquals("(BETWEEN int1 -1 65536)", leaves.get(3).toString());
    assertEquals("(EQUALS long1 9223372036854775807)", leaves.get(4).toString());
    assertEquals("(EQUALS float1 1.0)", leaves.get(5).toString());
    assertEquals("(EQUALS double1 -15.0)", leaves.get(6).toString());
    assertEquals("(IS_NULL string1)", leaves.get(7).toString());
    assertEquals("(EQUALS string1 hello)", leaves.get(8).toString());
}
Developer ID: axbaretto, project: flink, lines of code: 63, source file: OrcRowInputFormatTest.java
Example 17: readTypeDescription
import org.apache.orc.Reader; // import the required package/class
public TypeDescription readTypeDescription( ) throws Exception {
    checkNullFileName();
    Reader orcReader = getReader( );
    return readTypeDescription( orcReader );
}
Developer ID: pentaho, project: pentaho-hadoop-shims, lines of code: 6, source file: PentahoOrcInputFormat.java
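Example 18: OrcMetaDataReader
import org.apache.orc.Reader; // import the required package/class
public OrcMetaDataReader( Reader reader ) {
    this.reader = reader;
}
Developer ID: pentaho, project: pentaho-hadoop-shims, lines of code: 4, source file: OrcMetaDataReader.java
Note: The org.apache.orc.Reader class examples in this article were compiled from GitHub, MSDocs, and other source-code and documentation hosting platforms. The code snippets were selected from open-source projects, and copyright in the source code belongs to the original authors. For distribution and use, refer to each project's License. Do not reproduce without permission.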