I have these Apache log lines:
5.10.80.69 - - [21/Jun/2019:15:46:20 -0700] "PATCH /niches/back-end HTTP/2.0" 406 15834
11.57.203.39 - carroll8889 [21/Jun/2019:15:46:21 -0700] "HEAD /visionary/cultivate HTTP/1.1" 404 15391
124.137.187.175 - - [21/Jun/2019:15:46:22 -0700] "DELETE /expedite/exploit/cultivate/web-enabled HTTP/1.0" 403 2606
203.36.55.39 - collins6322 [21/Jun/2019:15:46:23 -0700] "PATCH /efficient/productize/disintermediate HTTP/1.1" 504 13377
175.5.52.40 - - [21/Jun/2019:15:46:24 -0700] "POST /real-time HTTP/1.1" 200 2660
232.220.131.214 - - [21/Jun/2019:15:46:25 -0700] "GET /wireless/matrix/synergistic/expedite HTTP/1.1" 205 15081
87.234.209.125 - labadie6990 [21/Jun/2019:15:46:26 -0700] "GET /unleash/aggregate HTTP/2
and I need to put them in a list of dictionaries, one per line, like this:
example_dict = {"host":"146.204.224.152",
"user_name":"feest6811",
"time":"21/Jun/2019:15:45:24 -0700",
"request":"POST /incentivize HTTP/1.1"}
This is what I have done:
import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    return logdata
partes = [
    r'(?P<host>\S+)',                   # host %h
    r'\S+',                             # ident %l (unused)
    r'(?P<user>\S+)',                   # user %u
    r'\[(?P<time>.+)\]',                # time %t
    r'"(?P<request>.*)"',               # request "%r"
    r'(?P<status>[0-9]+)',              # status %>s
    r'(?P<size>\S+)',                   # size %b (careful, can be '-')
    r'"(?P<referrer>.*)"',              # referrer "%{Referer}i"
    r'"(?P<agent>.*)"',                 # user agent "%{User-agent}i"
]
pattern = re.compile(r'\s+'.join(partes)+r'\s*')
log_data = []
for line in logs():
    log_data.append(pattern.match(line).groupdict())
print (log_data)
But I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-029948b6e367> in <module>
23 # Get components from each line of the log file into a structured dict
24 for line in logs():
---> 25 log_data.append(pattern.match(line).groupdict())
26
27
AttributeError: 'NoneType' object has no attribute 'groupdict'
This error is obviously because the regex does not match, but I am not sure why. The code is taken from here:
https://gist.github.com/sumeetpareek/9644255
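Two things in the code above look like probable causes; this is a sketch under assumptions, not a confirmed diagnosis. First, `file.read()` returns a single string, so `for line in logs()` walks over characters rather than lines. Second, the pattern requires referrer and user-agent fields that the sample lines do not contain. The name `parse_lines` and the inline sample text are illustrative, and the trailing `\Z` anchor is an addition that is not in the code above.

```python
import re

# Pattern trimmed to the fields that appear in the sample lines
# (no referrer or user-agent columns).
parts = [
    r'(?P<host>\S+)',        # host %h
    r'\S+',                  # ident %l (unused)
    r'(?P<user>\S+)',        # user %u
    r'\[(?P<time>.+)\]',     # time %t
    r'"(?P<request>.*)"',    # request "%r"
    r'(?P<status>[0-9]+)',   # status %>s
    r'(?P<size>\S+)',        # size %b (can be '-')
]
pattern = re.compile(r'\s+'.join(parts) + r'\s*\Z')


def parse_lines(text):
    """Return one dict per matching line; lines that do not match are skipped."""
    results = []
    for line in text.splitlines():      # iterate over lines, not characters
        match = pattern.match(line)
        if match is not None:           # match() returns None when nothing matches
            results.append(match.groupdict())
    return results


sample = (
    '5.10.80.69 - - [21/Jun/2019:15:46:20 -0700] "PATCH /niches/back-end HTTP/2.0" 406 15834\n'
    '11.57.203.39 - carroll8889 [21/Jun/2019:15:46:21 -0700] "HEAD /visionary/cultivate HTTP/1.1" 404 15391\n'
)
parsed = parse_lines(sample)
print(parsed[0]["host"], parsed[1]["user"])
```

Checking for `None` before calling `groupdict()` also means a truncated or malformed line is skipped instead of raising `AttributeError`.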
Update:
import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    return logdata
regex="^(\S+) (\S+) (\S+) \[([\w:/]+\s[+-]\d{4})\] "(\S+)\s?(\S+)?\s?(\S+)?" (\d{3}|-) (\d+|-)\s?"?([^"]*)"?\s?"?([^"]*)?"?$"
log_data = []
for line in logs():
    m = pattern.match(line)
    log_data.append(re.findall(regex, line).groupdict())
print (log_data)
But I get this error: unexpected character after line continuation character
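That message is a `SyntaxError` that Python raises when a backslash appears outside a string literal, which typically happens when an unescaped `"` ends a double-quoted pattern early. Separately, `re.findall` returns a list, which has no `groupdict()` method. A minimal sketch of one way around both, assuming the trailing referrer and user-agent parts of the pattern can be dropped for these lines:

```python
import re

# Raw string delimited by single quotes: backslashes reach the regex
# engine untouched and the double quotes need no escaping.
regex = (
    r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+-]\d{4})\] '
    r'"(\S+)\s?(\S+)?\s?(\S+)?" (\d{3}|-) (\d+|-)$'
)

line = '175.5.52.40 - - [21/Jun/2019:15:46:24 -0700] "POST /real-time HTTP/1.1" 200 2660'

# re.match returns a match object (or None), unlike re.findall.
match = re.match(regex, line)
if match is not None:
    host, _, user, timestamp, method, path, protocol, status, size = match.groups()
    print(host, timestamp, method, path, protocol, status, size)
```

The groups here are positional; named groups such as `(?P<host>\S+)` would be needed for `groupdict()` to return anything useful.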
Update 2:
The result of logs() must be a list in which each item is a dictionary in this format, and it must pass these checks:
assert len(logs()) == 979
one_item={'host': '146.204.224.152',
'user_name': 'feest6811',
'time': '21/Jun/2019:15:45:24 -0700',
'request': 'POST /incentivize HTTP/1.1'}
assert one_item in logs(), "Sorry, this item should be in the log results, check your formating"
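A hedged sketch of a parser that yields items in that shape, with one named group per required key. The function name `parse_log_text` and the inline sample are assumptions for illustration; the real `logs()` would presumably read `assets/logdata.txt` and pass its contents through something similar.

```python
import re

# One named group per required key. Literal spaces and the \n exclusions
# keep every match on a single line; re.MULTILINE lets ^ match at each line start.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user_name>\S+) '
    r'\[(?P<time>[^\]\n]+)\] "(?P<request>[^"\n]*)"',
    re.MULTILINE,
)


def parse_log_text(text):
    """Return a list with one dict (host, user_name, time, request) per log line."""
    return [m.groupdict() for m in LINE_RE.finditer(text)]


sample = (
    '203.36.55.39 - collins6322 [21/Jun/2019:15:46:23 -0700] '
    '"PATCH /efficient/productize/disintermediate HTTP/1.1" 504 13377\n'
    '175.5.52.40 - - [21/Jun/2019:15:46:24 -0700] "POST /real-time HTTP/1.1" 200 2660\n'
)
items = parse_log_text(sample)
one_item = {'host': '203.36.55.39',
            'user_name': 'collins6322',
            'time': '21/Jun/2019:15:46:23 -0700',
            'request': 'PATCH /efficient/productize/disintermediate HTTP/1.1'}
assert one_item in items
print(len(items))
```

In this sketch a missing user name is kept as the literal `'-'` from the log line, and the status and size columns are ignored because the required format does not include them.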
question from:
https://stackoverflow.com/questions/65882119/whats-missing-this-regex-to-match-the-lines-of-apache-logs