As a newbie to Python, I am trying to write a script that adds all the log files (*.log) in a directory to a list, opens each file, searches it for IP addresses using a regex, and appends the IPs it finds to a list. So far, I have:
import re, os
def list_files():
    content = []
    for files in os.walk('var/www/html/data/customer/log'):
        content.append(files)
    return content
lfiles = list_files()
lfiles
file1.log
file2.log
file3.log
file4.log
lfiles[0]
file1.log
file2.log
file3.log
file4.log
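A side note on why lfiles[0] appears to hold every file name: os.walk does not yield file names one at a time. It yields one (dirpath, dirnames, filenames) tuple per directory it visits, so the list built above has one entry per directory. The sketch below shows that structure. It is written for Python 3, and the temporary directory and the helper name walk_entries are stand-ins made up for illustration, not the real log directory.

```python
import os
import tempfile

def walk_entries(top):
    """Return the raw (dirpath, dirnames, filenames) tuples that os.walk yields."""
    return list(os.walk(top))

# Hypothetical demo directory standing in for the real log directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("file1.log", "file2.log"):
        open(os.path.join(demo_dir, name), "w").close()

    entries = walk_entries(demo_dir)
    dirpath, dirnames, filenames = entries[0]

    print(len(entries))        # one tuple per directory visited
    print(dirpath)             # the directory being visited
    print(dirnames)            # its subdirectories (none in this demo)
    print(sorted(filenames))   # bare file names, without the directory part
```

Unpacking the tuple in the for statement (for subdir, dirs, files in os.walk(...)) is the usual way to get at the file names directly.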
At this point I imagine I need to open these files and pull out the IPs with a regex (this is the part that gets me). So maybe add something like this somewhere in the function:
regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",files)
and add whatever else I need to get this done.
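One thing worth checking in that pattern: an unescaped . matches any character, not only a literal dot, and the second argument to re.findall has to be the text to search (the file contents), not the file name. Here is a minimal sketch with the dots escaped, in Python 3. The sample log line and the names IP_AFTER_COLON and find_ips are assumptions for illustration, since the real log format is not shown here.

```python
import re

# Dots are escaped so they match a literal "." rather than any character.
# The lookbehind keeps the original requirement that ": " precedes the address.
IP_AFTER_COLON = re.compile(r"(?<=: )\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def find_ips(text):
    """Return every IP-like string in text that directly follows ': '."""
    return IP_AFTER_COLON.findall(text)

# Hypothetical log content; replace with the text read from a real file.
sample = "login failed from: 10.7.0.145\nno address on this line\n"
print(find_ips(sample))
```

Note that \d{1,3} also accepts values above 255, so this finds IP-shaped strings rather than validating addresses.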
Why add the filenames to a list? Why not just use the filenames, when you get them?
Many thanks for the reply. Can you show me an example?
I was trying to get fancy and, as you can see, I dug myself into a hole.
Maybe something like this:
import os, re
def list_files():
    ips = []
    for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
        for file in files:
            f=open(file, 'r')
            lines=f.readlines()
            regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)
            f.close()
            ips.append(file)
    return ips
I made an adjustment that was recommended by someone else and this is what I get:
import os,re
def list_files():
    ips = []
    for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
        for file in files:
            f=open(file, 'r')
            lines=f.readlines()
            regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)
            f.close()
            ips.append(regexp)
    return ips
and when I use the function I get this error:
list_files()
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
/home/Python/banned-scraper/<ipython-input-9-52cf17baf819> in <module>()
----> 1 list_files()
/home/Python/banned-scraper/<ipython-input-8-9070553f1ea7> in list_files()
5 for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
6 for file in files:
----> 7 f=open(file, 'r')
8 lines=f.readlines()
9 regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)
IOError: [Errno 2] No such file or directory: 'eval7577:1.18595.dbg'
???
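A likely cause of that IOError, offered as a sketch rather than a confirmed diagnosis: the names in files are bare file names, so open(file) looks for them in the current working directory instead of in the directory os.walk is visiting. Joining subdir and the name with os.path.join gives a path that can be opened. The version below (Python 3) also skips anything that is not a *.log file and searches the file contents rather than the file name. The lookbehind from the original pattern is dropped here for simplicity, and the temporary directory, the file names in the demo, and the name collect_ips are made up for illustration.

```python
import os
import re
import tempfile

# Escaped dots; simplified from the pattern above (no ": " lookbehind).
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def collect_ips(top):
    """Walk top and return all IP-like strings found inside *.log files."""
    ips = []
    for subdir, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(".log"):
                continue  # skip non-log files such as *.dbg
            path = os.path.join(subdir, name)  # full path, not just the bare name
            with open(path, "r") as f:
                # Search the contents of the file, not its name.
                ips.extend(IP_PATTERN.findall(f.read()))
    return ips

# Hypothetical demo data standing in for the real log directory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "file1.log"), "w") as f:
        f.write("blocked: 10.7.0.145\n")
    with open(os.path.join(demo_dir, "other.dbg"), "w") as f:
        f.write("ignored: 192.168.1.1\n")
    found = collect_ips(demo_dir)
    print(found)
```

Using extend instead of append keeps ips a flat list of strings rather than a list of lists.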
for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
    for file in files:
        regexp = re.findall(r"10.7.0.145", open(file, "r").read())
        print " Here is whats inside of %s = %s" % (regexp,file)
....:
Here is whats inside of [] = file3
Here is whats inside of [] = file6
Here is whats inside of [] = file7
Here is whats inside of [] = file1
Here is whats inside of ['10.7.0.145'] = file9
Here is whats inside of [] = file5
Here is whats inside of [] = file8
Here is whats inside of [] = file10
Here is whats inside of [] = file2
Here is whats inside of [] = file4
I made some progress, but I can't figure out how to print only the file whose contents match the regular expression, so that the output is just this:
Here is whats inside of ['10.7.0.145'] = file9
---------- Post updated at 05:38 PM ---------- Previous update was at 03:35 PM ----------
I figured it out with some help. re.findall returns a list, so I only need to print matches[0], which gets the first (and only) item in matches. With the if-statement in place, the print line only runs if matches is non-empty.
for subdir, dirs, files in os.walk('.'):
    for file in files:
        matches = re.findall(r"10.7.0.145", open(file).read())
        if matches:
            print " I found what I was looking for %s = %s" % (file,matches[0])
returns:
Here is whats inside of file9 = 10.7.0.145
objective complete.
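For anyone landing here later, the same idea can be sketched in Python 3 with the two loose ends tidied up: re.escape turns the dots in the address into literal dots, and os.path.join means it works from any working directory. The helper name files_mentioning and the demo files are made up for illustration.

```python
import os
import re
import tempfile

def files_mentioning(top, ip):
    """Return the paths under top whose contents contain the literal string ip."""
    pattern = re.compile(re.escape(ip))  # re.escape makes the dots literal
    hits = []
    for subdir, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(subdir, name)
            # errors="replace" avoids a crash on bytes that are not valid text.
            with open(path, "r", errors="replace") as f:
                if pattern.search(f.read()):
                    hits.append(path)
    return hits

# Hypothetical demo data: only file9 mentions the address.
with tempfile.TemporaryDirectory() as demo_dir:
    contents = {
        "file3": "nothing to see here\n",
        "file5": "lookalike 10x7y0z145\n",
        "file9": "banned: 10.7.0.145\n",
    }
    for name, text in contents.items():
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write(text)

    hits = files_mentioning(demo_dir, "10.7.0.145")
    for path in hits:
        print("I found what I was looking for in %s" % os.path.basename(path))
```

Since only a yes/no answer per file is needed, pattern.search is enough here and the [0] indexing is not required. This is still a substring match, so a longer address such as 10.7.0.1450 would also count as a hit.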