Hi Folks,
I have an Apache log file that contains double entries (though not every line appears twice).
How can I automatically delete the first line of each double entry?
Your help is greatly appreciated.
Thanks,
Klaus
Here is what the log file looks like:
217.81.190.164 - - [28/Aug/2002:00:16:33 +0200] "GET /rmg/w4w/1000689.htm HTTP/1.1" 200 2409
217.81.190.164 - - [28/Aug/2002:00:16:33 +0200] "GET /rmg/w4w/1000689.htm HTTP/1.1" 200 2409 "http://www.opusforum.org/rmg/w4w/ " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
217.81.190.164 - - [28/Aug/2002:00:17:01 +0200] "GET /rmg/vec/ HTTP/1.1" 200 2631
217.81.190.164 - - [28/Aug/2002:00:17:01 +0200] "GET /rmg/vec/ HTTP/1.1" 200 2631 "http://www.opusforum.org/ " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
217.81.190.164 - - [28/Aug/2002:00:17:03 +0200] "GET /rmg/vec/1000868.htm HTTP/1.1" 200 2386
217.81.190.164 - - [28/Aug/2002:00:17:03 +0200] "GET /rmg/vec/1000868.htm HTTP/1.1" 200 2386 "http://www.opusforum.org/rmg/vec/ " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
213.23.52.237 - - [28/Aug/2002:00:17:10 +0200] "GET / HTTP/1.0" 200 16327
How about:
uniq <inputfile >outputfile
auswipe
September 16, 2002, 10:42am
3
In this situation, I like to use a Perl hash for doing the dirty work for me.
Something like this:
#!/usr/bin/perl
use strict;
use warnings;

# Print each line only the first time it is seen.
open(LOG, "myLogFile") || die "$!";
my %logHash;
while (my $inputLine = <LOG>) {
    if (!exists($logHash{$inputLine})) {
        $logHash{$inputLine} = 1;
        print $inputLine;
    }
}
close(LOG);
That should remove the dupe entries. Just redirect the output to a new log.
auswipe
September 16, 2002, 10:45am
4
Oh sure, do it the eeaaasssy way!
Hi Folks,
Thanks a lot for your suggestions. Unfortunately, neither one works as-is.
The "uniq" solution needs "-w 50" to detect the double entries at all, and even then it keeps the first line of each pair, while I need the second (the one with the additional referrer/agent information).
The Perl script doesn't do it either, because it compares whole lines, and the lines are not exact duplicates (only the first 50 characters or so match).
Any refinements so the solution works? I am sure we are close.
Thanks
Klaus
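Given the 50-character overlap Klaus describes, one alternative (a sketch, not posted in the thread) is to key on the first N characters and keep the *last* line seen per key, so the longer entry with the referrer/agent fields survives. Here awk stands in for the Perl hash; the demo lines, file names, and the 8-character key width are made up (use 50 for real log lines):

```shell
# Build a tiny stand-in log: two lines sharing an 8-char prefix, one unique.
printf '%s\n' \
  'key-0001 short' \
  'key-0001 short plus referrer and agent' \
  'key-0002 other' > access.log

# Key each line on its first 8 characters; later duplicates overwrite earlier
# ones, and first-seen order is preserved for the final output.
awk '{ k = substr($0, 1, 8)
       if (!(k in line)) order[++n] = k   # remember first-seen order
       line[k] = $0 }                     # last line per key wins
     END { for (i = 1; i <= n; i++) print line[order[i]] }' access.log > access.dedup

cat access.dedup
```

This does the whole job in a single pass, without reversing the file twice.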
I did it!
Here is what worked for me:
perl -e 'print reverse <>' logfile|uniq -w 50|perl -e 'print reverse <>' >logfile.done
So first the log file is reversed line by line, then the duplicates are removed, and finally it is reversed again.
The reversal is needed because uniq keeps the first line of each run; after reversing, the line it keeps is the one that was originally second, so the first line of each duplicate pair is the one removed.
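The same invert, dedup, invert idea can be sketched without Perl using GNU tac (BSD systems use tail -r instead); the short demo lines and the -w 7 width below are placeholders for real log lines and -w 50, and the file names are made up:

```shell
# Stand-in log: two lines sharing a 7-char prefix, plus one unique line.
printf '%s\n' \
  'A short' \
  'A short line with extra info' \
  'B other' > logfile

# Reverse, drop lines whose first 7 chars match the previous line (GNU
# uniq's -w), then reverse back -- keeping the second line of each pair.
tac logfile | uniq -w 7 | tac > logfile.done

cat logfile.done
```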
Thanks for your contributions, folks. They pointed me in the right direction.
Klaus
uniesh
March 13, 2009, 3:54am
7
uniq -c <file1 >file2 would give you the number of duplicate entries, writing each unique line once to the 2nd file, prefixed with its count.
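For illustration (file names and contents made up), uniq -c prefixes each surviving line with the number of times it appeared in a row:

```shell
# Two adjacent copies of "a", then one "b".
printf 'a\na\nb\n' > file1

# -c collapses adjacent duplicates and prepends the count of each run.
uniq -c < file1 > file2

cat file2
```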
Regards,
uniesh
Try using tail -r (which reverses the line order) before the uniq command...