Experts and All,
Hello!
I am trying to write a simple script in Python, and it has taken me almost 5 hours to complete. I am using Python 3.6.
I am trying to read an Apache log file, parse it, and answer a basic question: how many GETs and how many POSTs are there, sorted in ascending order.
I pieced everything together here and it works fine, but I am sure I have made it more complicated than it needs to be.
- Why should I push the data into a list (wordstring)?
- Why am I not able to parse out whether it is a GET or a POST method from the httpd log file?
Please show me the way and, if you can, explain it to me in detail, or at least point me to the correct documentation site.
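On the first question, the intermediate list does not appear to be strictly required: a Counter can be incremented while each line is read. Below is a minimal sketch, assuming every usable line carries the request as its first double-quoted field, as in the sample lines. The function name count_methods and the two sample lines are made up for illustration.

```python
from collections import Counter


def count_methods(lines):
    """Count the HTTP method (first word of the quoted request) per line."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            # Blank or malformed line: no complete quoted field, skip it.
            continue
        request = parts[1].split()
        if request:
            counts[request[0]] += 1
    return counts


# Illustrative input; any iterable of lines works, including an open file.
sample_lines = [
    '1.2.3.4 - - [07/Mar/2004:16:05:49 -0800] "GET /a HTTP/1.1" 200 1\n',
    '\n',
    '1.2.3.4 - - [07/Mar/2004:16:06:51 -0800] "POST /b HTTP/1.1" 200 1\n',
]
for method, count in count_methods(sample_lines).most_common():
    print('%s: %7d' % (method, count))
```

Because a file object is itself an iterable of lines, the same function could be called as count_methods(fh) inside a with open('apache.log') as fh: block.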
manoharmahostav@ma-host:~/files$ python log_file_analyse.py
Stuff
GET: 1595922
PUT: 30
POST: 26
manoharmahostav@ma-host:
manoharmahostav@ma-host:~/files$ cat log_file_analyse.py
#!/usr/bin/env python
import collections
from collections import Counter
from collections import defaultdict
#fname = 'testfile.txt'
fname = 'apache.log'
wordstring = []
c = collections.Counter()
with open(fname, 'r') as fh:
    for line in fh:
        # Skip blank lines.
        if len(line.strip()):
            # The request is the first double-quoted field of the line.
            splitlines = line.split('"')[1]
            # The HTTP method is the first word of the request.
            another = splitlines.split()[0]
            wordstring.append(another)
# Count once, after the whole file has been read.
c = Counter(wordstring)
print("Stuff")
for letter, count in c.most_common(30):
    print('%s: %7d' % (letter, count))
manoharmahostav@ma-host:~/files$
manoharmahostav@ma-host:~/files$ head testfile.txt
64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
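For lines shaped like the ones above, the method can also be pulled out with a regular expression instead of two splits. This is only a sketch: the pattern REQUEST_RE and the helper parse_method are hypothetical names, and the pattern assumes the method is a run of capital letters directly after a double quote.

```python
import re

# Hypothetical pattern: a double quote, then the method in capitals,
# a single space, and the requested path.
REQUEST_RE = re.compile(r'"([A-Z]+) (\S+)')


def parse_method(line):
    """Return the HTTP method of one log line, or None if none is found."""
    match = REQUEST_RE.search(line)
    return match.group(1) if match else None


sample = ('64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] '
          '"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291')
print(parse_method(sample))  # GET
```

Lines that do not match (blank lines, malformed requests) give None, so they can be skipped rather than raising an IndexError.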
---------- Post updated at 02:41 AM ---------- Previous update was at 12:41 AM ----------
After some more effort, here is what I have now. It looks a bit cleaner, but it is not any faster than the previous version I posted here.
Any help in getting a performance improvement would be much appreciated.
Sincerely,
Manohar.
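One way to check whether a change really helps is to time both variants on the same data. The sketch below compares the list-based approach with a variant that feeds a generator straight into Counter and stops splitting once the request field is reached. The function names and the synthetic log lines are assumptions for the demo, and the timings will depend on the machine, so no result is claimed here.

```python
import timeit
from collections import Counter


def count_with_list(lines):
    """Approach from the post: collect every method in a list, then count."""
    methods = []
    for line in lines:
        methods.append(line.split('"')[1].split(' ')[0])
    return Counter(methods)


def count_streaming(lines):
    """Variant: no list, and stop splitting after the request field."""
    return Counter(line.split('"', 2)[1].split(' ', 1)[0] for line in lines)


# Synthetic data standing in for apache.log (an assumption for the demo).
lines = ['1.2.3.4 - - [07/Mar/2004:16:05:49 -0800] "GET /x HTTP/1.1" 200 1\n'] * 20000
lines += ['1.2.3.4 - - [07/Mar/2004:16:05:49 -0800] "POST /y HTTP/1.1" 200 1\n'] * 10

# Both variants must agree before their speed is worth comparing.
assert count_with_list(lines) == count_streaming(lines)

for func in (count_with_list, count_streaming):
    seconds = timeit.timeit(lambda: func(lines), number=3)
    print('%s: %.3f s' % (func.__name__, seconds))
```

On a real log, much of the time may go into reading the file itself, which neither variant changes.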
manoharmahostav@ma-host:~/files$ cat abc.py
#!/usr/bin/env python
import collections
from collections import Counter
somelist = []
with open('apache.log', 'r') as f:
    for line in f:
        # The request is the first double-quoted field of the line.
        splitlines = line.split('"')
        pat = splitlines[1]
        # The HTTP method is the first word of the request.
        pat2 = pat.split(' ')[0]
        somelist.append(pat2)
# Count once, after the whole file has been read.
a = Counter(somelist)
print('Most Common:')
for d, b in a.most_common(10):
    print('%s: %10d' % (d, b))
manoharmahostav@ma-host:~/files$
manoharmahostav@ma-host:~/files$ python abc.py
Most Common:
GET: 1595922
PUT: 30
POST: 26
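The output above is in descending order because most_common() returns the largest counts first. For the ascending order mentioned at the start, the counts can be sorted explicitly. A small sketch, using the totals from the output above as hard-coded input:

```python
from collections import Counter

# Counts copied from the output shown in the post.
counts = Counter({'GET': 1595922, 'PUT': 30, 'POST': 26})

# most_common() is descending; sort on the count for ascending order.
ascending = sorted(counts.items(), key=lambda item: item[1])
for method, count in ascending:
    print('%s: %10d' % (method, count))
```

This prints POST first, then PUT, then GET.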