Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - ANTLR match identifier but not reserved keywords

I am trying to match complex numbers written in different notations. One of them uses the cis function, like this: MODULUS cis PHASE

The problem is that my IDENTIFIER rule matches cis together with the start of the number that follows it. Because that match is longer than the CIS token itself, the lexer always returns an IDENTIFIER token. How can I avoid that?

Here's the grammar:

grammar Sandbox;

input : number? CIS UNSIGNED 
    | IDENTIFIER
    ;

number : FLOAT
    | UFLOAT 
    | UINT
    | INT
    ;

fragment DIGIT : [0-9] ;

UFLOAT : UINT (DOT UINT? | 'f') ;
FLOAT : SUB UFLOAT ;
UINT : DIGITS ;
INT : SUB UINT ;
UNSIGNED : UFLOAT 
    | UINT 
    ;
DIGITS : DIGIT+ ;

// Specific lexer rules
CIS : 'cis' ;
SUB : '-' ; 
DOT : '.' ;
WS : [ ]+ -> skip ;
NEWLINE : '\r'? '\n' ;

IDENTIFIER : [a-zA-Z_]+[a-zA-Z0-9_]* ;  // has to be after complex so i or cis doesn't match this first

Edit: The input I was trying to parse is the complex number 1+i, written with its modulus and phase like this: 1.4142135623730951cis0.7853981633974483

My actual problem is that the IDENTIFIER rule matches cis0, where I want the lexer to match only the CIS rule, even though CIS is defined before IDENTIFIER.

I vaguely know that ANTLR picks the rule that produces the longest match, but in this case I want to avoid that.

Question from: https://stackoverflow.com/questions/65863023/antlr-match-identifier-but-not-reserved-keywords


1 Answer


I see two solutions here:

  1. Make the complex number a single lexer rule:
COMPLEX:  (FLOAT | UFLOAT | UINT | INT) WS* CIS WS* UNSIGNED;

This match will be longer than an identifier or the pure CIS keyword, so the lexer's longest-match rule picks it.

  2. A cis sequence is a keyword when it follows a digit (with optional whitespace between them), right? So you could do a lookback (LA(-1)) in a semantic predicate to reject cis as an identifier if that condition is true.
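The lookback idea (the option based on LA(-1)) could look roughly like the sketch below. It is only an illustration under some assumptions: the Java target is used (so `_input` and `Character.isDigit` are available in the predicate), the grammar name `CisPredicateSketch` and the single `NUMBER` rule are hypothetical simplifications of the question's numeric rules, and the 'f' suffix is left out.

```antlr
grammar CisPredicateSketch;   // hypothetical name, not from the question

input
    : (SUB? NUMBER)? CIS NUMBER EOF
    | IDENTIFIER EOF
    ;

CIS    : 'cis' ;
SUB    : '-' ;

// Simplified stand-in for the question's UINT/UFLOAT rules (assumption).
NUMBER : DIGIT+ ('.' DIGIT*)? ;

// The predicate sits at the left edge of the rule, so LA(-1) is the
// character just before the would-be identifier. If that character is a
// digit, IDENTIFIER is rejected and 'cis' can only be matched by CIS.
IDENTIFIER
    : {!Character.isDigit(_input.LA(-1))}? [a-zA-Z_] [a-zA-Z0-9_]*
    ;

WS : [ ]+ -> skip ;

fragment DIGIT : [0-9] ;
```

With this sketch, 1.4142135623730951cis0.7853981633974483 should lex as NUMBER, CIS, NUMBER. Two caveats: any identifier written directly after a digit (for example 2x) is rejected as well, and the predicate is target-specific, so the grammar is no longer portable to non-Java targets as written.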

I'd prefer solution 1, because the convention is that single entities (and a complex number is, like a float or a string, a single logical entity) are matched completely in a lexer rule, not in a parser rule.
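As a rough sketch of the single-lexer-rule approach, the numeric pieces can be turned into fragments so that they never compete with COMPLEX as standalone tokens. The grammar name `ComplexSketch` and the fragment names `UNSIGNED_NUMBER` and `SPACES` are hypothetical. The number syntax (digits, then an optional '.' with optional digits, or an 'f' suffix) is my reading of the question's UINT and UFLOAT rules.

```antlr
grammar ComplexSketch;   // hypothetical name, not from the question

input
    : (COMPLEX | IDENTIFIER) EOF
    ;

// The whole "MODULUS cis PHASE" form is one token. It starts with a digit
// or '-', so it never competes with IDENTIFIER for the same input.
COMPLEX
    : '-'? UNSIGNED_NUMBER SPACES? 'cis' SPACES? UNSIGNED_NUMBER
    ;

IDENTIFIER : [a-zA-Z_] [a-zA-Z0-9_]* ;

WS : [ ]+ -> skip ;

// Fragments only take part in other lexer rules; they never become tokens.
fragment UNSIGNED_NUMBER : DIGIT+ ('.' DIGIT* | 'f')? ;
fragment SPACES          : [ ]+ ;
fragment DIGIT           : [0-9] ;
```

The trade-off is that the modulus and phase now arrive as the text of one token (including any inner spaces), so they have to be split apart afterwards, for example in a listener or visitor.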

