I made 5 lists that represent tokens (lexical classes): identifiers, constants, operators, keywords and separators.
Reading from a .txt file, I sort the file's elements into these lists.
My .txt file:
a273 = 4 + 1337
variable = 50.123132123 + 3.123123132 / 23.121212
a273 is appended to identifiers, ' ' and '\n' to separators, = to operators, 4 and 1337 to constants, and + to operators (the same goes for the next line).
So I want to write some kind of modified print function that prints my elements like this:
('a273', identifiers)
(' ', separators)
('=', operators)
(' ', separators)
('4', constants)
(' ', separators)
('+', operators)
(' ', separators)
('1337', constants)
And print like that for all lines.
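For illustration, one possible way to get output in this shape is to scan each line once with a single combined pattern, so that every token comes out in its original position already tagged with its class. This is only a sketch under assumptions: the `tokenize` helper, the `TOKEN_RE` pattern and the small keyword set are made up for the example and are not part of my actual code.

```python
import re

# Hypothetical keyword set; the real keyword list is not shown in this post.
KEYWORDS = {"if", "else", "while", "for"}

# One named alternative per lexical class. Constants are tried before
# identifiers so that a leading digit is never read as part of a name.
TOKEN_RE = re.compile(r"""
    (?P<constants>\d+(?:\.\d*)?|\.\d+)   # 4, 1337, 50.123
  | (?P<identifiers>[A-Za-z_]\w*)        # a273, variable
  | (?P<operators>[+\-*/=]+)             # =, +, /, ++
  | (?P<separators>\s+)                  # spaces and newlines
""", re.VERBOSE)

def tokenize(line):
    """Yield (text, class_name) pairs in the order they appear in line.

    Characters that match none of the alternatives are silently skipped.
    """
    for match in TOKEN_RE.finditer(line):
        kind = match.lastgroup          # name of the alternative that matched
        text = match.group()
        if kind == "identifiers" and text in KEYWORDS:
            kind = "keywords"
        yield text, kind

for text, kind in tokenize("a273 = 4 + 1337"):
    print(f"({text!r}, {kind})")
```

Because the classes are attached while scanning, the order of the tokens is preserved and nothing has to be looked up in separate lists afterwards.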
I tried something like this (for identifiers only):
for i in newList:
    if i in identifiers:
        print(i, " identifiers")
but the same identifier gets printed multiple times.
This is my code for sorting separators, identifiers and constants:
import re

lexicalClass = file.readlines()
for lex in lexicalClass:
    # runs of whitespace (spaces, newlines) are separators
    if len(re.findall(r'\s+', lex)):
        separators.extend(re.findall(r'\s+', lex))
    a_string = lex.split()
    for word in a_string:
        # integer, decimal and exponent-notation numbers are constants
        if len(re.findall(r"(?!\S+)[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?(?!\S+)", word)):
            constants.extend(re.findall(r"(?!\S+)[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?(?!\S+)", word))
    newList = re.findall(r'\S+', lex)
    for element in newList:
        # lowercase letters, optionally followed by digits, are identifiers
        if len(re.findall(r'[a-z]+[0-9]+|[a-z]+', element)):
            identifiers.extend(re.findall(r'[a-z]+[0-9]+|[a-z]+', element))
EDIT:
I added this code:
tokens = identifiers.copy()
tokens.extend(operators)
tokens.extend(separators)
tokens.extend(keywords)
tokens.extend(constants)
for i in range(len(lexicalClass) - 1):
    print(f'Line {i}: ')
    for element in tokens:
        if element in operators:
            print({element}, 'operators')
        elif element in identifiers:
            print({element}, 'identifers')
        elif element in separators:
            print({element}, 'separators')
    print("\n")
But I'm getting this output:
Line 0:
{'a273'} identifers
{'i'} identifers
{'aifj'} identifers
{'variable'} identifers
{'+'} operators
{'+'} operators
{'++'} operators
{'/'} operators
{'='} operators
{'='} operators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{'\n'} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{'\n'} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{'\n'} separators
Line 1:
{'a273'} identifers
{'i'} identifers
{'aifj'} identifers
{'variable'} identifers
{'+'} operators
{'+'} operators
{'++'} operators
{'/'} operators
{'='} operators
{'='} operators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{'\n'} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{'\n'} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{' '} separators
{'\n'} separators
I want to print each line followed by only its own elements, but instead the whole file is printed under every line, and the elements are not in their original order (all identifiers are printed first, then the operators, and so on).
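The likely cause is visible in the loop structure: the inner `for element in tokens` walks the entire combined list on every pass of the outer loop, and that list was built class by class, so neither the line boundaries nor the original order survive. A minimal sketch of one way around this, assuming the five lists are already filled, is to walk each line's pieces in order and look up each piece's class on the spot. The `class_of` and `tagged` helpers, the list contents and the sample lines below are assumptions for illustration only.

```python
import re

# Stand-ins for the lists filled by the sorting code; the contents are assumed.
keywords = []
identifiers = ["a273", "variable"]
constants = ["4", "1337", "50.5", "2"]
operators = ["=", "+", "/"]
separators = [" ", "\n"]

# Checked top to bottom, so a keyword wins over an identifier with the same spelling.
classes = {
    "keywords": keywords,
    "identifiers": identifiers,
    "constants": constants,
    "operators": operators,
    "separators": separators,
}

def class_of(piece):
    """Return the name of the first list that contains piece, else 'unknown'."""
    for name, members in classes.items():
        if piece in members:
            return name
    return "unknown"

def tagged(line):
    """Split line into whitespace and non-whitespace runs, in order, and tag each."""
    return [(piece, class_of(piece)) for piece in re.findall(r"\s+|\S+", line)]

lines = ["a273 = 4 + 1337\n", "variable = 50.5 / 2\n"]  # stands in for file.readlines()
for number, line in enumerate(lines):
    print(f"Line {number}:")
    for piece, name in tagged(line):
        print(f"({piece!r}, {name})")
```

Here the outer loop owns the line and the inner loop only ever sees that line's pieces, so each token is printed once, under its own line, in the position where it occurred.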