I'm writing a log collection / analysis application in Python and I need to write a "rules engine" to match and act on log messages.
It needs to feature:
- Regular expression matching for the message itself
- Arithmetic comparisons for message severity/priority
- Boolean operators
I envision An example rule would probably be something like:
(message ~ "program\[d+\]: message" and severity >= high) or (severity >= critical)
I'm thinking about using PyParsing or similar to actually parse the rules and construct the parse tree.
The current (not yet implemented) design I have in mind is to have classes for each rule type, and construct and chain them together according to the parse tree. Then each rule would have a "matches" method that could take a message object return whether or not it matches the rule.
Very quickly, something like:
class RegexRule(Rule):
def __init__(self, regex):
self.regex = regex
def match(self, message):
return self.regex.match(message.contents)
class SeverityRule(Rule):
def __init__(self, operator, severity):
self.operator = operator
def match(self, message):
if operator == ">=":
return message.severity >= severity
# more conditions here...
class BooleanAndRule(Rule):
def __init__(self, rule1, rule2):
self.rule1 = rule1
self.rule2 = rule2
def match(self, message):
return self.rule1.match(message) and self.rule2.match(message)
These rule classes would then be chained together according to the parse tree of the message, and the match() method called on the top rule, which would cascade down until all the rules were evaluated.
I'm just wondering if this is a reasonable approach, or if my design and ideas are way totally out of whack? Unfortunately I never had the chance to take a compiler design course or anything like that in Unviersity so I'm pretty much coming up with this stuff of my own accord.
Could someone with some experience in these kinds of things please chime in and evaluate the idea?
EDIT:
Some good answers so far, here's a bit of clarification.
The aim of the program is to collect log messages from servers on the network and store them in the database. Apart from the collection of log messages, the collector will define a set of rules that will either match or ignore messages depending on the conditions and flag an alert if necessary.
I can't see the rules being of more than a moderate complexity, and they will be applied in a chain (list) until either a matching alert or ignore rule is hit. However, this part isn't quite as relevant to the question.
As far the syntax being close to Python syntax, yes that is true, however I think it would be difficult to filter the Python down to the point where the user couldn't inadvertently do some crazy stuff with the rules that was not intended.
See Question&Answers more detail:
os