Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

parsing - How to specify beginning-of-line keywords in ANTLR grammars (which also works for the first input line)

This question is about a problem left open by the solution proposed in another Stack Overflow question about beginning-of-line keywords.

I am writing an ANTLR4 lexer and parser for a programming language in which a word is a keyword only if it is the first non-whitespace token of a line. Let me explain with an example. Suppose "bla" is a keyword. Then, in the following example:

foo bla
    bla foo foo
foo bla bla

the second "bla" (the one that starts line 2) should be recognized as a keyword, but the others should not.

In order to achieve this I have defined the following simple ANTLR4 grammar:

grammar foobla;

// PARSER

main
    : line* EOF
    ;

line
    : indicator text*
    ;

indicator
    : foo
    | bla
    ;

foo: FOO ;
bla: BLA ;
text: TEXT ;

// LEXER

WHITESPACE: [ \t] -> skip ;

fragment NL: [\r\n\f]+[ \t]* ;
fragment NONNL: ~[\r\n\f] ;

// Indicators
FOO: NL 'foo' ;
BLA: NL 'bla' ;

TEXT: NONNL+ ;

This is similar to the answer given in the question "How to detect beginning of line, or: 'The name 'getCharPositionInLine' does not exist in the current context'".

Now my question. This works fine, except when the "bla" or "foo" keyword is used in the first line of the input program, because no line break precedes it. I can think of two ways to solve this, but I don't know how to implement either of them:

  • Use something like a BOF (beginning of file) token. However, I can't find such a concept in the manual.
  • Use a hook to dynamically add a newline at the beginning of the input before parsing starts, preferably by specifying something in the g4 file itself. I couldn't find this in the manual either.

I don't want to write an extra application/wrapper to add a new line to the input file just because of this.
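To make the first-line failure concrete, here is a minimal sketch in plain Python (not ANTLR) that mimics the FOO and BLA rules with a regular expression. The pattern and the helper name `line_start_keywords` are assumptions made for illustration. It only approximates the lexer's behavior, but it shows why a rule that requires a preceding line break misses line 1, and what prepending a newline would change.

```python
import re

# Approximation of the grammar's FOO/BLA rules: one or more line breaks,
# optional indentation, then the keyword. This is a regex stand-in, not ANTLR.
KEYWORD_AFTER_NEWLINE = re.compile(r"[\r\n\f]+[ \t]*(foo|bla)")


def line_start_keywords(source: str) -> list[str]:
    """Return the keywords matched by the newline-prefixed pattern (hypothetical helper)."""
    return [m.group(1) for m in KEYWORD_AFTER_NEWLINE.finditer(source)]


program = "foo bla\n    bla foo foo\nfoo bla bla\n"

# No line break precedes line 1, so its leading "foo" is not matched.
missed_first = line_start_keywords(program)

# With a newline prepended (the second idea above), line 1 matches as well.
with_newline = line_start_keywords("\n" + program)

print(missed_first)
print(with_newline)
```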

Question from: https://stackoverflow.com/questions/65900992/how-to-specify-beginning-of-line-keywords-in-antlr-grammars-which-also-works-fo


1 Answer


Here's another idea:

In your BLA lexer rule, add a predicate which checks the end of the token stream (to which the BLA token has not yet been added) to see which line the last non-whitespace token was on. If that line differs from the line of the current token, you know the BLA token really is a BLA token; otherwise set its type to IDENTIFIER. If no token has been produced yet, the current token is on the first input line, so it should be treated as a keyword too.
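The line comparison described here can be sketched in plain Python, independent of ANTLR. Everything in this block is an assumption made for illustration: the `Token` tuple, the `tokenize` function, and the use of `TEXT` in place of the answer's IDENTIFIER type. It is a simulation of the idea, not the ANTLR API.

```python
import re
from typing import NamedTuple

KEYWORDS = {"foo", "bla"}  # keyword set taken from the question's grammar


class Token(NamedTuple):
    kind: str  # "KEYWORD" or "TEXT" (TEXT stands in for IDENTIFIER)
    text: str
    line: int  # 1-based line number


def tokenize(source: str) -> list[Token]:
    """Split the input into words and classify keywords by line position."""
    tokens: list[Token] = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in re.finditer(r"\S+", line):
            word = match.group()
            # Line of the last token emitted so far. It is 0 when nothing has
            # been emitted yet, so the very first token always starts a line.
            last_line = tokens[-1].line if tokens else 0
            starts_line = last_line != line_no
            kind = "KEYWORD" if word in KEYWORDS and starts_line else "TEXT"
            tokens.append(Token(kind, word, line_no))
    return tokens


program = "foo bla\n    bla foo foo\nfoo bla bla\n"
tokens = tokenize(program)
for token in tokens:
    print(token.line, token.kind, token.text)
```

Because the comparison uses the previous token rather than a preceding line break, the first input line needs no special handling. In an ANTLR lexer the same check would presumably go in code attached to the keyword rule, with the lexer remembering the line of the last token it produced; the exact hooks depend on the target language.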

