Python: Compare 2 word lists

Hi.

I am trying to write a Python programme that compares two different text files, each containing a list of words, one word per line:

worda
wordb
wordc

I want to compare textfile 2 with textfile 1, and if there's a word in textfile 2 that is NOT in textfile 1, I want to print that word.

Here's what I've got so far (note that I'm an absolute beginner...)

def lookup(filename):
     f=file("textfile1")
     fileone=f.read
     g=file("textfile2")
     filetwo=g.read
     for line in filetwo:
     if line not in fileone:
          print line

Any suggestions are appreciated :)

Thanks in advance,

Kat

---------- Post updated at 03:51 PM ---------- Previous update was at 02:57 PM ----------

I changed my code, and now it actually does something... not what I want, though:

fileone="./fileone.data"
filetwo="./filetwo.data"
words="./words.data"

with open(filetwo,'r') as a:
    with open(fileone,'r') as f:
        with open(words,'w') as w:
            for line in a:
                if line not in f:
                    w.write(line)

I get the same words in the created word list that I already have in filetwo...any suggestions?

Caveat: I have not programmed in python in quite a while.

As I recall, Python supports a set data type, and sets can be constructed from iterables (which include files).

If you put each list in a set, the answer you seek is the result of s2 - s1, where s1 is the set of lines (just words in this case) in textfile 1 and s2 is the corresponding set for textfile 2.

This may not be a suitable approach if the lists are very very very very large (too large to be stored in memory), but that is seldom the case.
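
A minimal sketch of the set approach, offered as one way to do it rather than the definitive one. The helper names read_words and words_only_in_second are made up for illustration. Each line is stripped of surrounding whitespace so that a last word without a trailing newline still compares equal, and blank lines are skipped. Reading both files into sets also avoids the problem in the second attempt above, where "line not in f" scans forward through the open file without rewinding, so later words are only compared against whatever is left of it.

```python
def read_words(path):
    """Return the set of non-empty, whitespace-stripped lines in a file."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def words_only_in_second(path_one, path_two):
    """Return the words found in path_two but not in path_one.

    Sets are unordered, so the result is sorted to make it predictable.
    """
    return sorted(read_words(path_two) - read_words(path_one))
```

Assuming the file names from the second post, words_only_in_second("./fileone.data", "./filetwo.data") would return the list of words to print or write out.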

Regards,
Alister

Look at PLEAC, under Arrays, in the section "Finding Elements in One Array but Not Another".

From the section of the webpage that you recommended:

#    build lookup table
for item in b_list:
    seen[item] = 1

#    find only elements in a_list and not in b_list
for item in a_list:
    if not item not in seen:
        # it's not in 'seen', so add to 'aonly'
        aonly.append(item)

That code does the exact opposite of what it's supposed to do. One of the "not" boolean operators needs to go. For the sake of readability, preferably the first one.
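
For reference, a sketch of the same loop with the first "not" removed. The sample lists are made up, and seen and aonly are initialised here only because the quoted excerpt does not show that part.

```python
a_list = ["worda", "wordb", "wordc"]  # made-up sample data
b_list = ["wordb"]                    # made-up sample data

# build lookup table
seen = {}
for item in b_list:
    seen[item] = 1

# find only elements in a_list and not in b_list
aonly = []
for item in a_list:
    if item not in seen:
        # it's not in 'seen', so add to 'aonly'
        aonly.append(item)

print(aonly)  # ['worda', 'wordc']
```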

Feel free to pass along a patch to the authors if you like.

Regards,
Alister

Hey, it is under the comment:
# DON'T DO THIS.

I don't think the "DON'T DO THIS" is intended to indicate that the code gives an erroneous result, but rather to indicate that it's not a very pythonic approach. The fact that the very next "DON'T DO THIS" section gives the correct result would seem to support my supposition.

I point this out mostly for the original poster. It's frustrating (especially for a newbie) to get hung up on a code sample only to find out later, after much facepalming, that it was the teaching material that was the problem.

Regards,
Alister

In any case, these are examples of bad code.

But yes, I have to agree with you - this "bad" code is worse than it should be.