Is the java parser generated by ANTLR capable of streaming arbitrarily large files?
I tried constructing a Lexer with a UnbufferedCharStream and passed that to the parser. I got an UnsupportedOperationException because of a call to size on the UnbufferedCharStream and the exception contained an explained that you can't call size on an UnbufferedCharStream.
new Lexer(new UnbufferedCharStream( new CharArrayReader("".toCharArray())));
CommonTokenStream stream = new CommonTokenStream(lexer);
Parser parser = new Parser(stream);
I basically have a file I exported from hadoop using pig. It has a large number of rows separated by '
'. Each column is split by a ''. This is easy to parse in java as I use a buffered reader to read each line. Then I split by '' to get each column. But I also want to have some sort of schema validation. The first column should be a properly formatted date, followed some price columns, followed by some hex columns.
When I look at the generated parser code I could call it like so
parser.lines().line()
This would give me a List which conceptually I could iterate over. But it seems that the list would have a fixed size by the time I get it. Which means the parser probably already parsed the entire file.
Is there another part of the API that would allow you to stream really large files? Like some way of using the Visitor or Listener to get called as it is reading the file? But it can't keep the entire file in memory. It will not fit.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…