To parse the file, you could define a grammar that describes your input format and use it to generate a parser.
There are many language-parsing tools for Python. For example, you could use Grako, which takes grammars in a variation of EBNF as input and outputs memoizing PEG parsers in Python.
To install Grako, run pip install grako.
Here's a grammar for your format, using Grako's flavor of EBNF syntax:
(* a file is zero or more records *)
file = { record }* $;
record = name '=' value ';' ;
name = /[A-Z][a-zA-Z0-9.]*/ ;
value = object | integer | string ;
(* an object contains one or more records *)
object = '{' { record }+ '}' ;
integer = /[0-9]+/ ;
string = '"' /[^"]*/ '"';
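If it helps to see what these rules mean operationally, here is a minimal hand-written recursive-descent sketch with one method per grammar rule. It is not what Grako generates; MiniParser and its helper methods are hypothetical names made up for illustration, and the sketch assumes that whitespace between tokens is insignificant.

```python
import re

# Token patterns taken from the grammar rules.
NAME = re.compile(r'[A-Z][a-zA-Z0-9.]*')
INTEGER = re.compile(r'[0-9]+')
STRING = re.compile(r'"([^"]*)"')
WS = re.compile(r'\s*')


class MiniParser(object):
    """Hypothetical hand-written parser; one method per grammar rule."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip_ws(self):
        # Assumption: whitespace between tokens carries no meaning.
        self.pos = WS.match(self.text, self.pos).end()

    def peek(self, literal):
        self.skip_ws()
        return self.text.startswith(literal, self.pos)

    def expect(self, literal):
        if not self.peek(literal):
            raise ValueError('expected %r at offset %d' % (literal, self.pos))
        self.pos += len(literal)

    def match(self, pattern):
        self.skip_ws()
        m = pattern.match(self.text, self.pos)
        if m is None:
            raise ValueError('unexpected input at offset %d' % self.pos)
        self.pos = m.end()
        return m

    def at_end(self):
        self.skip_ws()
        return self.pos == len(self.text)

    def file(self):
        # file = { record }* $ ;
        records = []
        while not self.at_end():
            records.append(self.record())
        return records

    def record(self):
        # record = name '=' value ';' ;
        name = self.match(NAME).group(0)
        self.expect('=')
        value = self.value()
        self.expect(';')
        return name, value

    def value(self):
        # value = object | integer | string ;
        if self.peek('{'):
            return self.object()
        if self.peek('"'):
            return self.match(STRING).group(1)
        return int(self.match(INTEGER).group(0))

    def object(self):
        # object = '{' { record }+ '}' ;
        self.expect('{')
        records = [self.record()]  # at least one record is required
        while not self.peek('}'):
            records.append(self.record())
        self.expect('}')
        return dict(records)


# Made-up sample input, only to exercise the rules.
sample = 'Student = { PInfo = { Name.First = "Joe"; }; Zip = 12345; };'
print(MiniParser(sample).file())
```

A generated parser saves you from writing and maintaining this plumbing by hand, which is the main reason to start from a grammar.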
To generate the parser, save the grammar to a file, e.g., Structured.ebnf, and run:
$ grako -o structured_parser.py Structured.ebnf
This creates the structured_parser module, which can be used to extract the student information from the input:
#!/usr/bin/env python
from structured_parser import StructuredParser

class Semantics(object):
    def record(self, ast):
        # record = name '=' value ';' ;
        # value = object | integer | string ;
        return ast[0], ast[2]  # name, value

    def object(self, ast):
        # object = '{' { record }+ '}' ;
        return dict(ast[1])

    def integer(self, ast):
        # integer = /[0-9]+/ ;
        return int(ast)

    def string(self, ast):
        # string = '"' /[^"]*/ '"';
        return ast[1]

with open('input.txt') as file:
    text = file.read()
parser = StructuredParser()
ast = parser.parse(text, rule_name='file', semantics=Semantics())
students = [value for name, value in ast if name == 'Student']
d = {'{0[Name.First]} {0[Name.Last]}'.format(s['PInfo']):
     dict(School=s['School'], Zip=s['Address']['Zip'])
     for s in students}
from pprint import pprint
pprint(d)
Output:
{'Joe Burger': {'School': u'West High', 'Zip': 12345},
 'John Smith': {'School': u'East High', 'Zip': 12346}}
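To see what the final comprehension does without running the parser, here is a small sketch on a hand-built list of (name, value) pairs. The list is an assumption about what the semantic actions return for two Student records, reconstructed from the keys used in the code and the values in the output; the Version record is hypothetical and is there only to show the filter.

```python
# Assumed shape of the parse result: a list of (name, value) pairs,
# where each object has already been converted to a dict.
ast = [
    ('Version', 1),  # hypothetical non-Student record, skipped by the filter
    ('Student', {'PInfo': {'Name.First': 'Joe', 'Name.Last': 'Burger'},
                 'School': 'West High',
                 'Address': {'Zip': 12345}}),
    ('Student', {'PInfo': {'Name.First': 'John', 'Name.Last': 'Smith'},
                 'School': 'East High',
                 'Address': {'Zip': 12346}}),
]

# Keep only the top-level Student records.
students = [value for name, value in ast if name == 'Student']

# Key each student by full name. '{0[Name.First]}' looks up the key
# 'Name.First' (dot included) in the dict passed to format().
d = {'{0[Name.First]} {0[Name.Last]}'.format(s['PInfo']):
     dict(School=s['School'], Zip=s['Address']['Zip'])
     for s in students}

print(d)
```

Because the dotted names are plain dictionary keys, the bracket form in the format string is needed; attribute-style access such as {0.Name} would not find them.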