Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - Is there a canonical example of how to use Antlr 4 to parse SQL statements?

I am trying to build a parser for SQL statements using Antlr 4. I don't much care which SQL dialect's grammar I use, since I plan to allow only ANSI SQL, but the example below happens to use the T-SQL grammar. Here is my code:

String sql = "SELECT ROW_NUMBER() OVER (ORDER BY id) FROM some_table";
TSqlLexer tSqlLexer = new TSqlLexer(CharStreams.fromString(sql));
CommonTokenStream stream = new CommonTokenStream(tSqlLexer);
TSqlParser parser = new TSqlParser(stream);
ParseTree tree = parser.tsql_file();  // errors happen here
ParseTreeWalker walker = new ParseTreeWalker();
// I built a custom listener, so far not much in it
AnalyticFunctionBaseListener listener = new AnalyticFunctionBaseListener();
walker.walk(listener, tree);

The code only gets as far as the call to tsql_file() before generating the following errors/warnings:

line 1:35 token recognition error at: 'i'
line 1:36 token recognition error at: 'd'
line 1:44 token recognition error at: 's'
line 1:45 token recognition error at: 'o'
line 1:46 token recognition error at: 'm'
line 1:47 token recognition error at: 'e'
line 1:49 token recognition error at: 't'
line 1:50 token recognition error at: 'a'
line 1:51 token recognition error at: 'b'
line 1:52 token recognition error at: 'l'
line 1:53 token recognition error at: 'e'
line 1:37 no viable alternative at input 'SELECTROW_NUMBER()OVER(ORDERBY)'

There is clearly something major I am missing here, but I don't know what it is. I built the parser from the published T-SQL grammar available at the ANTLR GitHub site.

Can any Antlr guru modify the above snippet so that it works? I am hoping someone can give a canonical example of how to use Antlr to parse a basic SQL statement.

Question from: https://stackoverflow.com/questions/65904479/is-there-a-canonical-example-of-how-to-use-antlr-4-to-parse-sql-statements


1 Answer


Note the following comment in the README:

Usage, important note

As SQL grammar are normally not case sensitive but this grammar implementation is, you must use a custom character stream that converts all characters to uppercase before sending them to the lexer.

You could find more information here with implementations for various target languages.

In short, change your code:

String sql = "SELECT ROW_NUMBER() OVER (ORDER BY id) FROM some_table";
TSqlLexer tSqlLexer = new TSqlLexer(CharStreams.fromString(sql));

to:

String sql = "SELECT ROW_NUMBER() OVER (ORDER BY id) FROM some_table";
CharStream s = CharStreams.fromString(sql);
TSqlLexer tSqlLexer = new TSqlLexer(new CaseChangingCharStream(s, true));

Find the source of CaseChangingCharStream here: https://github.com/antlr/antlr4/blob/master/doc/resources/CaseChangingCharStream.java
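
To see why the wrapper helps, note that every character rejected in the error log from the question is a lowercase letter (the underscore at 1:48 is accepted). The following is a minimal, ANTLR-free sketch of the idea behind an upper-casing stream: the matcher sees upper-cased characters, while token text is still read from the original input. The names CaseFoldingSketch, UpperCasingView, la, getText and matchesKeyword are hypothetical and exist only for this illustration; this is not the ANTLR class itself.

```java
public class CaseFoldingSketch {

    // Hypothetical stand-in for a character stream (NOT the ANTLR API):
    // lookahead is upper-cased, but the original text is kept untouched.
    static final class UpperCasingView {
        private final String original;

        UpperCasingView(String original) {
            this.original = original;
        }

        // Character the "lexer" sees at index i, upper-cased; -1 past the end.
        int la(int i) {
            if (i < 0 || i >= original.length()) {
                return -1;
            }
            return Character.toUpperCase(original.charAt(i));
        }

        // Token text comes from the original input, so its case is preserved.
        String getText(int start, int stopExclusive) {
            return original.substring(start, stopExclusive);
        }
    }

    // Toy stand-in for a case-sensitive lexer rule that only knows
    // upper-case keywords such as "SELECT".
    static boolean matchesKeyword(UpperCasingView in, int start, String keyword) {
        for (int k = 0; k < keyword.length(); k++) {
            if (in.la(start + k) != keyword.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UpperCasingView in = new UpperCasingView("select id from some_table");
        // The upper-case-only rule matches although the input is lowercase.
        System.out.println(matchesKeyword(in, 0, "SELECT")); // true
        // The identifier keeps its original spelling.
        System.out.println(in.getText(7, 9)); // id
    }
}
```

Because only the lookahead is case-folded, identifiers such as id and some_table should keep their original spelling in the resulting tokens, which is usually what you want when inspecting the parse tree in a listener.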

EDIT

In the comments, Mike suggests:

Alternatively you can use the MySQL grammar, which supports case-insensitive keywords without an extra stream

which might be a better option. I'm not saying the T-SQL grammar isn't good or accurate, but the fact that the grammar Mike suggests comes from the official MySQL repo (and that Mike contributed to it) gives me confidence in its quality.

