This seems to be a common misunderstanding of ANTLR
:
Language Processing in ANTLR:
The Language Processing is done in two strictly separated phases:
- Lexing, i.e. partitioning the text into tokens
- Parsing, i.e. building a parse tree from the tokens
Since lexing must preceed parsing there is a consequence: The lexer is independent of the parser, the parser cannot influence lexing.
Lexing
Lexing in ANTLR works as following:
- all rules with uppercase first character are lexer rules
- the lexer starts at the beginning and tries to find a rule that matches best to the current input
- a best match is a match that has maximum length, i.e. the token that results from appending the next input character to the maximum length match is not matched by any lexer rule
- tokens are generated from matches:
- if one rule matches the maximum length match the corresponding token is pushed into the token stream
- if multiple rules match the maximum length match the first defined token in the grammar is pushed to the token stream
Example: What is wrong with your grammar
Your grammar has two rules that are critical:
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
Each match, that is matched by TITLE will also be matched by FILEPATH. And FILEPATH is defined before TITLE: So each token that you expect to be a title would be a FILEPATH.
There are two hints for that:
- keep your lexer rules disjunct (no token should match a superset of another).
- if your tokens intentionally match the same strings, then put them into the right order (in your case this will be sufficient).
- if you need a parser driven lexer you have to change to another parser generator: PEG-Parsers or GLR-Parsers will do that (but of course this can produce other problems).
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…