Python Count Number Of Occurrences

infinitydon · July 23, 2015, 6:52am

Hello,

I have a programming assignment to count the number of occurrences of each hour in a particular file. Below is the code:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
largest = None
fh = open(fname)

counts = dict()

test = list()
for line in fh:
    line = line.rstrip()
    if not line.startswith('From'): continue
    if line.startswith('From:'): continue
    words = line.split()
    h = words[5]
    h = h.split(':')
    test.append(h[0])    
    for w in test:
     counts[w] = counts.get(w, 0 ) + 1
print test
print counts

Output of "print test" which I believe is ok:

['09', '18', '16', '15', '15', '14', '11', '11', '11', '11', '11', '11', '10', '10', '10', '09', '07', '06', '04', '04', '04', '19', '17', '17', '16', '16', '16']

Output of "print counts":

{'11': 111, '10': 42, '15': 47, '14': 22, '04': 24, '16': 31, '19': 6, '18': 26, '09': 39, '17': 9, '06': 10, '07': 11}

I don't know why "print counts" is giving a large number of counts?

---------- Post updated at 11:52 AM ---------- Previous update was at 09:53 AM ----------

I figured out the issue!

It was the nested FOR loop:

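    for w in test:
     counts[w] = counts.get(w, 0 ) + 1

It ran inside the outer loop, so the whole list was recounted on every matching line. I have written the code again with the counting loop moved outside: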

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
largest = None
fh = open(fname)

counts = dict()

test = list()
lst = list()
for line in fh:
    line = line.rstrip()
    if not line.startswith('From'): continue
    if line.startswith('From:'): continue
    words = line.split()
    h = words[5]
    h = h.split(':')
    test.append(h[0])
for w in test:
    counts[w] = counts.get(w, 0) + 1
print test
print counts
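
For anyone hitting the same thing, here is a minimal sketch of why the nested loop inflated the numbers, using the hours printed above. With the counting loop inside the outer loop, the whole list built so far is recounted on every pass, so 27 hours produce 27 * 28 / 2 = 378 increments, which is exactly what the inflated counts add up to. The function names count_nested and count_once are made up for illustration, and print() is written so it runs on Python 2 or 3.

```python
hours = ['09', '18', '16', '15', '15', '14', '11', '11', '11', '11', '11',
         '11', '10', '10', '10', '09', '07', '06', '04', '04', '04', '19',
         '17', '17', '16', '16', '16']

def count_nested(items):
    """Buggy version: recounts the whole list after every append."""
    seen = []
    counts = {}
    for item in items:
        seen.append(item)
        for w in seen:  # runs again over everything added so far
            counts[w] = counts.get(w, 0) + 1
    return counts

def count_once(items):
    """Fixed version: each item is counted exactly once."""
    counts = {}
    for w in items:
        counts[w] = counts.get(w, 0) + 1
    return counts

nested = count_nested(hours)
fixed = count_once(hours)
print(nested['11'])          # 111, as in the inflated output
print(fixed['11'])           # 6
print(sum(nested.values()))  # 378
```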

Thanks!

Aia · July 23, 2015, 11:33pm

A few things to consider, if you care about it.

if len(fname) < 1 : fname = "mbox-short.txt"

More Pythonic

if not fname:
    fname = "mbox-short.txt"

fh = open(fname)

You are not checking that opening fname succeeded, and if it did, you should close the file afterward.
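
One possible way to handle both points, as a rough sketch: catch the error that open() raises when the file cannot be opened, and let a with block close the file. The helper name read_lines and the throwaway temp file are made up for illustration; the sample line only imitates the mbox format.

```python
import os
import tempfile

def read_lines(fname):
    """Return the lines of fname as a list, or None if it cannot be opened."""
    try:
        # 'with' closes the file when the block ends, even on error
        with open(fname) as fh:
            return fh.readlines()
    except IOError as err:  # on Python 3, IOError is an alias of OSError
        print("Could not open %s: %s" % (fname, err))
        return None

# Throwaway file so the example is self-contained
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False)
tmp.write("From someone@example.com Sat Jan  5 09:14:16 2008\n")
tmp.close()

lines = read_lines(tmp.name)                        # list with one line
missing = read_lines(tmp.name + ".does-not-exist")  # prints a message, returns None
os.remove(tmp.name)
```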

line = line.rstrip()

Not necessary, since it does not interfere with what you are extracting.
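
A quick way to convince yourself: split() with no arguments already discards trailing whitespace, including the newline, so the fields come out the same either way. The sample line below is made up in the mbox "From " format.

```python
# Illustrative line only; the address is invented
line = "From someone@example.com Sat Jan  5 09:14:16 2008\n"

with_rstrip = line.rstrip().split()
without_rstrip = line.split()  # no-argument split() drops the newline too

hour = without_rstrip[5].split(':')[0]
print(with_rstrip == without_rstrip)  # True
print(hour)                           # 09
```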

    for w in test:
     counts[w] = counts.get(w, 0 ) + 1

The indentation of counts[w] is not consistent with the rest of the code (one space instead of four).

    if not line.startswith('From'): continue
    if line.startswith('From:'): continue

continue should be on the next line, indented. Furthermore, it is not necessary, since the only lines you want are the From lines without the ':'. A single startswith('From ') test (note the trailing space) selects them.
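
As a small sketch of that point, the two-step filter and the single 'From ' test pick the same lines for typical mail headers. The sample lines are invented; strictly speaking the two filters differ for a line that starts with 'From' followed by something other than a space or a colon, which a header line normally would not.

```python
# Invented sample lines in mbox style
lines = [
    "From someone@example.com Sat Jan  5 09:14:16 2008\n",
    "From: someone@example.com\n",
    "Subject: hello\n",
]

# Original approach: keep 'From...' lines, then drop the 'From:' ones
two_checks = [l for l in lines
              if l.startswith('From') and not l.startswith('From:')]

# Single test with a trailing space
one_check = [l for l in lines if l.startswith('From ')]

print(two_checks == one_check)  # True
```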

Here is a rendition with those details changed.

fname = raw_input("Enter file name: ")
if not fname:
    fname = "mbox-short.txt"

# 'with' closes the file automatically when the block ends;
# open() still raises IOError if the file cannot be opened.
with open(fname) as fh:
    counts = {}
    test = []
    for line in fh:
        if line.startswith('From '):
            h = line.split()[5].split(':')[0]
            test.append(h)
            if h not in counts:
                counts[h] = 1
            else:
                counts[h] += 1
print test
print counts

goon12 · October 27, 2015, 12:49pm

This would help: https://docs.python.org/2/library/collections.html#collections.Counter
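
A minimal sketch of what that could look like, using the hours from the first post rather than the real file. Counter is in the standard library from Python 2.7 on; it counts each item once and returns 0 for keys it has never seen.

```python
from collections import Counter

hours = ['09', '18', '16', '15', '15', '14', '11', '11', '11', '11', '11',
         '11', '10', '10', '10', '09', '07', '06', '04', '04', '04', '19',
         '17', '17', '16', '16', '16']

counts = Counter(hours)       # replaces the manual counts.get(w, 0) + 1 loop
print(counts['11'])           # 6
print(counts.most_common(1))  # [('11', 6)]
print(counts['23'])           # 0, missing keys do not raise KeyError
```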