Python Reading Individual files and Regex through them

metallica1973 · November 5, 2013, 4:41pm

As a newbie to Python, I am trying to write a script that will add all the log files (*.log) within a directory to a list, open each file, search it for an IP using a regex, and single the IPs out (appending them to the list). So far, I have:

import re, os
def list_files():
    content = []
    for files in os.walk('var/www/html/data/customer/log'):
        content.append(files)
    return content
lfiles = list_files()
lfiles
file1.log
file2.log
file3.log
file4.log

lfiles[0]
file1.log
file2.log
file3.log
file4.log

At this point I imagine I need to open these files and pull out the IPs with a regex (this is the part that gets me). So maybe I would add something like this to the function:

regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",files)

and add whatever else I need to get this done.
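
For what it's worth, here is a minimal sketch of just the regex step. It assumes the log lines look roughly like `some text: 10.7.0.145`; the sample line and the `find_ips` helper are made up for illustration and are not taken from the real logs. The dots are escaped so they match literal periods, and the pattern is run against text rather than against a file name:

```python
import re

# Hypothetical pattern: an IPv4-looking string preceded by ": ".
# Escaped dots match literal periods; a bare "." would match any character.
IP_AFTER_COLON = re.compile(r"(?<=: )\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def find_ips(text):
    """Return every IP-like string in text that follows ': '."""
    return IP_AFTER_COLON.findall(text)

# Made-up sample line, only to show the call
sample = "Nov  5 16:41:02 banned: 10.7.0.145 after 3 attempts"
print(find_ips(sample))
```

Note that `\d{1,3}` does not check that each number is in the 0-255 range, so this only finds strings shaped like an IP.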

Corona688 · November 5, 2013, 4:46pm

Why add the filenames to a list? Why not just use the filenames, when you get them?
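
A rough sketch of what that could look like, assuming the goal is just to visit every `*.log` file under a directory. `iter_log_paths` is a made-up helper name, and the path is the one from the post above:

```python
import fnmatch
import os

def iter_log_paths(top):
    """Yield the full path of every *.log file under top, one at a time."""
    for subdir, dirs, files in os.walk(top):
        for name in fnmatch.filter(files, "*.log"):
            # os.walk gives bare names; join with subdir to get a usable path
            yield os.path.join(subdir, name)

# Use each path as soon as it is produced instead of storing it first
for path in iter_log_paths("var/www/html/data/customer/log"):
    print(path)
```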

metallica1973 · November 5, 2013, 5:06pm

Many thanks for the reply. Can you show me an example?

I was trying to get fancy and, as you can see, I dug myself into a hole.

Maybe something like this:

import os, re

def list_files():
 ips = []
 for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
    for file in files:
        f=open(file, 'r')
        lines=f.readlines()
        regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)
        f.close()
        ips.append(file)
 return ips

metallica1973 · November 7, 2013, 10:28pm

I made an adjustment that was recommended by someone else and this is what I get:

import os,re

def list_files():
 ips = []
 for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
    for file in files:
        f=open(file, 'r')
        lines=f.readlines()
        regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)
        f.close()
        ips.append(regexp)
 return ips

and when I use the function I get this error:

 list_files()
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
/home/Python/banned-scraper/<ipython-input-9-52cf17baf819> in <module>()
----> 1 list_files()

/home/Python/banned-scraper/<ipython-input-8-9070553f1ea7> in list_files()
      5  for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
      6     for file in files:
----> 7         f=open(file, 'r')
      8         lines=f.readlines()
      9         regexp = re.findall(r"(?<=\:\ )\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}",file)

IOError: [Errno 2] No such file or directory: 'eval7577:1.18595.dbg'

???
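
For reference, the likely cause: `os.walk` puts bare file names in `files`, so `open(file)` looks for them in the current working directory rather than in `subdir`. The regex is also being run against the file name instead of the file's contents. Below is a sketch that joins the path first and searches what was read, assuming the same pattern as in the function above with the dots escaped; `collect_ips` is a made-up name:

```python
import os
import re

# Same pattern as in the post, with the dots escaped
IP_RE = re.compile(r"(?<=: )\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def collect_ips(top):
    """Return all IP-like strings found in the files under top."""
    ips = []
    for subdir, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(subdir, name)  # full path, not just the name
            with open(path, "r") as f:
                # extend() keeps ips flat; append() would nest one list per file
                ips.extend(IP_RE.findall(f.read()))
    return ips
```

Calling `collect_ips('var/www/html/data/customer/log')` would then return one flat list of strings instead of a list of lists.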

metallica1973 · November 19, 2013, 5:38pm

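for subdir, dirs, files in os.walk('var/www/html/data/customer/log'):
    for file in files:
        # os.walk yields bare file names, so join each with its directory
        with open(os.path.join(subdir, file), "r") as f:
            # Escaped dots match literal periods only
            regexp = re.findall(r"10\.7\.0\.145", f.read())
        print(" Here is whats inside of %s = %s" % (regexp, file))

 Here is whats inside of [] = file3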
 Here is whats inside of [] = file6
 Here is whats inside of [] = file7
 Here is whats inside of [] = file1
 Here is whats inside of ['10.7.0.145'] = file9
 Here is whats inside of [] = file5
 Here is whats inside of [] = file8
 Here is whats inside of [] = file10
 Here is whats inside of [] = file2
 Here is whats inside of [] = file4

I made some progress, but I can't figure out how to print only the file that contains a match, so that the output is just:

Here is whats inside of ['10.7.0.145'] = file9

---------- Post updated at 05:38 PM ---------- Previous update was at 03:35 PM ----------

I figured it out with some help. re.findall returns a list, so I only need to print matches[0], which gets the first (and only) item in matches. With the if statement in place, the print line runs only if matches is non-empty.

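for subdir, dirs, files in os.walk('.'):
    for file in files:
        # os.walk yields bare file names, so join each with its directory
        with open(os.path.join(subdir, file)) as f:
            # Escaped dots match literal periods only
            matches = re.findall(r"10\.7\.0\.145", f.read())
        if matches:
            print(" I found what I was looking for %s = %s" % (file, matches[0]))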

returns:

I found what I was looking for file9 = 10.7.0.145

objective complete.
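
To see the `re.findall` behavior in isolation, here is a small sketch on plain strings. `first_match` is a made-up helper and the sample strings are invented; `re.escape` is used so the dots in the IP are treated as literal periods:

```python
import re

def first_match(needle, text):
    """Return the first literal occurrence of needle in text, or None."""
    # re.escape makes the dots in an IP match literal periods only
    matches = re.findall(re.escape(needle), text)
    # findall always returns a list; an empty list is falsy
    if matches:
        return matches[0]
    return None

print(first_match("10.7.0.145", "banned: 10.7.0.145"))
print(first_match("10.7.0.145", "banned: 10x7y0z145"))
```

The first call prints the IP and the second prints None. With an unescaped pattern such as r"10.7.0.145", the second string would have matched too, because a bare dot matches any character.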