Hi.
Interesting problem -- a few thoughts.
For timings, I used a 1 GB text file of about 15 M lines with many duplicates (it is a large number of copies of the text of a novel).
1) I didn't see any requirement that the file be kept in its original order, so one solution is simply to sort the file. On my system, sort processed the file using 7 keys in under a minute. The option to remove duplicates (-u) roughly halved that time, since many duplicate lines did not need to be written out.
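Something along these lines is what I mean -- the key positions here are only placeholders, since I don't know your field layout; -u keeps one line per distinct combination of keys:

    sort -k1,1 -k2,2 -k3,3n -u infile > outfile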
If the original ordering is needed, one could add a field containing the line number, which can then be used as an additional key so that the final output ends up in the original order. You might be able to get by with a single sort, but even if 2 sorts are needed they can run in a pipeline, so the system handles the connections and no large intermediate file has to be written out explicitly.
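Here is a rough sketch of that decorate/sort/undecorate idea (infile/outfile are placeholder names, and it assumes a sort that has the stable -s option, e.g. GNU sort): prefix every line with its line number, sort on the data so that -u keeps only the first occurrence of each duplicate, then sort numerically on the line number to restore the original order and strip the extra field:

    perl -pe 'print "$.\t"' infile |
      sort -s -k2 -u |
      sort -k1,1n |
      cut -f2- > outfile

cut's default delimiter is the tab we added, so the original lines come back intact.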
2) Running out of memory in awk suggests that awk doesn't go beyond real memory, that your system does not use virtual memory, or that you have no swap space -- or something along those lines. I used perl to keep an in-memory hash of MD5 checksums of the lines. I did see some paging near the end -- the test system has 3 GB of real memory. I arranged for the file to have an additional field making every line unique, so the hash ended up with 15 M entries, the worst case. I did no other processing except checking the counts in the hash -- the entire run took about 2.5 minutes of real time.
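A minimal sketch of that checksum-hash approach -- what to do when a duplicate turns up is up to you; the STDERR reporting below is just an illustration:

    #!/usr/bin/perl
    # Count occurrences of each line via an in-memory hash of MD5 checksums.
    use strict;
    use warnings;
    use Digest::MD5 qw(md5);

    my %seen;                      # md5(line) -> number of times seen
    while ( my $line = <> ) {
        my $sum = md5($line);      # 16-byte binary digest, much smaller than the line
        if ( $seen{$sum}++ ) {
            # a (virtually certain) duplicate -- report it, skip it, count it, ...
            print STDERR "duplicate at line $.\n";
        }
    }
    printf STDERR "%d distinct lines\n", scalar keys %seen;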
The advantage of using a checksum + line number is that if the hash does not fit into memory (for whatever reason), the derived data (checksum + line number) can be written out and the resulting file sorted. Lines with the same checksum will then be adjacent, and the file can be processed to obtain the line numbers of the originals as well as of the subsequent duplicates. Those line numbers can then be used with other utilities, say sed, to display the lines in question or to produce a trimmed copy of the original file.
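A sketch of that external variant (the file names 'infile' and 'sums.tmp' are placeholders): write checksum + line number for every line, let sort group equal checksums, then walk the runs:

    #!/usr/bin/perl
    # Derive "checksum <TAB> line-number" records, sort them with sort(1),
    # then report each group of duplicated checksums.
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Stage 1: the derived records are small and fixed-size, so sort(1) can
    # handle them even when an in-memory hash would not fit.
    open my $in,  '<', 'infile'   or die "infile: $!";
    open my $out, '>', 'sums.tmp' or die "sums.tmp: $!";
    print {$out} md5_hex($_), "\t", $., "\n" while <$in>;
    close $out;

    # Stage 2: sort on checksum, then numerically on line number, and walk the
    # runs; the first line number in a run is the original, the rest duplicates.
    open my $sorted, '-|', 'sort -k1,1 -k2,2n sums.tmp' or die "sort: $!";
    my ( $prev, @group ) = ('');
    while (<$sorted>) {
        my ( $sum, $lineno ) = split /\t/;
        if ( $sum ne $prev ) {
            report(@group) if @group > 1;
            ( $prev, @group ) = ($sum);
        }
        push @group, $lineno;
    }
    report(@group) if @group > 1;

    sub report {
        my ( $orig, @dups ) = @_;
        chomp( $orig, @dups );
        print "original at line $orig, duplicates at lines @dups\n";
    }

The reported line numbers can then drive sed (sed -n '123p' file, for instance) or whatever other tool suits.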
3) You mentioned the perl module Tie::File. For small files this might be a useful choice, depending on what you want to do. Simply opening my test file took about 100 seconds. I then tested reading the file and writing it to /dev/null. The "normal" perl "<>" operator took about half a minute of wall-clock time; reading straight through with Tie::File, with no other processing, took about 55 minutes -- 2 orders of magnitude slower. I don't have a lot of experience with Tie::File, but from what I have seen so far, I would avoid it for applications like this, where you probably need to look at every line in the file.
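For reference, this is the shape of the comparison ('infile' is a placeholder name; autochomp is turned off so both passes write the same bytes):

    #!/usr/bin/perl
    # Read the file and copy it to /dev/null, once with <>, once via Tie::File.
    use strict;
    use warnings;
    use Tie::File;

    open my $null, '>', '/dev/null' or die "/dev/null: $!";

    # Plain sequential read: about half a minute of wall-clock time here.
    open my $fh, '<', 'infile' or die "infile: $!";
    print {$null} $_ while <$fh>;
    close $fh;

    # The same traversal through Tie::File: about 55 minutes here, since every
    # line is fetched through the tied-array machinery.
    tie my @line, 'Tie::File', 'infile', autochomp => 0
        or die "Tie::File: $!";
    print {$null} $_ for @line;
    untie @line;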
Good luck ... cheers, drl