Parse out known messages from a log file

I am looking for a script to do the following. I have a large log file that contains hundreds of warnings, many of which can be ignored. The tool doesn't allow me to suppress them, so I would like to filter them out of the log file and isolate just the new messages/warnings, based on an exception file.

I have three files:

  • log1.txt: the first log file generated by the tool.
  • log1.exp: an exception list that I will create by cutting and pasting items from log1.txt that can be exempted/ignored in future log files.
  • log2.txt: the new log file, containing all or some of the items in log1.txt and possibly a bunch of new warnings, which are the ones that I am after.

%>cat log1.txt (first log file)
Date: Sep 11, 2008 10:30 PM
Tool: Microsoft SQL Server

The signal "/RAM12/dout<15>" is sourceless and has been removed.
The signal "/RAM22/dout<14>" is sourceless and has been removed.
The signal "/RAM23/dout<13>" is sourceless and has been removed.
The signal "/RAM24/dout<12>" is sourceless and has been removed.
Sourceless block
"U0/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U1/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U2/blk_generator/valid/has_B/dout1" (SFF) removed.
Sourceless block
"U3/blk_generator/valid/has_B/dout1" (SFF) removed.
...
..
and a whole bunch of other messages.

%>cat log2.txt will display similar text with a few changes, such as the date and any new messages that did not appear in the first run.

%>cat log1.exp (Exception list manually created using text from log1.txt)
The signal "/RAM12/dout<15>" is sourceless and has been removed.
The signal "/RAM22/dout<14>" is sourceless and has been removed.
The signal "/RAM23/dout<13>" is sourceless and has been removed.
The signal "/RAM24/dout<12>" is sourceless and has been removed.

>cleanScript.pl log2.txt log1.exp
Date: Sep 13, 2008 11:45AM
Tool: Microsoft SQL Server

Sourceless block
"U0/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U1/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U2/blk_generator/valid/has_B/dout1" (SFF) removed.
Sourceless block
"U3/blk_generator/valid/has_B/dout1" (SFF) removed.
...
..
and a whole bunch of other messages.

Any help would be greatly appreciated. Perl or AWK would be great.

Basically, the cleanScript.pl script would parse through log2.txt, delete any entries that are listed in log1.exp, and write out a new file. Please note that some messages are wrapped across multiple lines, like the "Sourceless block" warning.
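Because some warnings span two physical lines, a line-oriented filter sees "Sourceless block" and the quoted name that follows as unrelated lines. Here is a minimal sketch of one way around that. It assumes (this is not stated anywhere in the thread) that a continuation line always begins with a double quote. Such lines are joined onto the previous line first, and then whole joined messages are filtered. The join_wrapped helper and the shortened sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: glue any line that starts with a double quote
# onto the line before it (assumed wrapping convention).
join_wrapped() {
  awk '
    /^"/ && have { buf = buf " " $0; next }  # continuation: append to pending message
    have         { print buf }               # a new message starts: flush the pending one
                 { buf = $0; have = 1 }
    END          { if (have) print buf }
  ' "$@"
}

work=$(mktemp -d)
cat > "$work/log2.txt" <<'EOF'
The signal "/RAM12/dout<15>" is sourceless and has been removed.
Sourceless block
"U0/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U1/blk_generator/valid/has_B/dout0" (SFF) removed.
EOF
# Exception entries are written in the joined, one-message-per-line form.
cat > "$work/log1.exp" <<'EOF'
The signal "/RAM12/dout<15>" is sourceless and has been removed.
Sourceless block "U0/blk_generator/valid/has_B/dout0" (SFF) removed.
EOF

# -F: literal strings, -x: whole-line match, -v: invert, -f: patterns from file
join_wrapped "$work/log2.txt" | grep -F -x -v -f "$work/log1.exp" > "$work/new.txt"
cat "$work/new.txt"
```

With this sample data only the U1 message survives, printed on a single line. The trade-off is that entries in the exception file have to be written in the joined form rather than pasted as two lines.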

Can anyone please help? If you need more clarification, please let me know. I hope I am not in the wrong forum for this type of question.

Maybe I'm oversimplifying here... but couldn't you just do something like

egrep -vf what.you.want.excluded file.with.whole.log > newfile.with.trimmed.list
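One caveat with egrep -vf: each line of the exclusion file is read as an extended regular expression, so a character such as "." acts as a wildcard, and a pattern can match anywhere within a line. A hedged variant uses two standard grep options, -F (fixed strings) and -x (whole-line matches). The file names and sample lines below are made up for illustration.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' 'U0.dout removed.' > "$work/exclude"
printf '%s\n' 'U0.dout removed.' 'U0xdout removed.' \
              'U0.dout removed. (again)' 'Sourceless block' > "$work/log"

# Regex, substring matching: the first three log lines all match and are dropped.
grep -E -v -f "$work/exclude" "$work/log" > "$work/regex.out"

# Literal, whole-line matching: only the exact line is dropped.
grep -F -x -v -f "$work/exclude" "$work/log" > "$work/fixed.out"
cat "$work/fixed.out"
```

For an exception list built by pasting complete log lines, the literal whole-line form is usually closer to the intent, because nothing is excluded by accident.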

Unfortunately, that won't do it. As you said, it is oversimplified. I tried it, but I get an empty file.

d@DeCobox-Micro ~
$ cat > orig
The signal "/RAM12/dout<15>" is sourceless and has been removed.
The signal "/RAM22/dout<14>" is sourceless and has been removed.
The signal "/RAM23/dout<13>" is sourceless and has been removed.
The signal "/RAM24/dout<12>" is sourceless and has been removed.
Sourceless block
"U0/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U1/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U2/blk_generator/valid/has_B/dout1" (SFF) removed.
Sourceless block
"U3/blk_generator/valid/has_B/dout1" (SFF) removed

d@DeCobox-Micro ~
$ cat > excep
The signal "/RAM12/dout<15>" is sourceless and has been removed.
The signal "/RAM22/dout<14>" is sourceless and has been removed.
The signal "/RAM23/dout<13>" is sourceless and has been removed.
The signal "/RAM24/dout<12>" is sourceless and has been removed.

d@DeCobox-Micro ~
$ egrep -vf excep orig > happydays

d@DeCobox-Micro ~
$ cat happydays 
Sourceless block
"U0/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U1/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U2/blk_generator/valid/has_B/dout1" (SFF) removed.
Sourceless block
"U3/blk_generator/valid/has_B/dout1" (SFF) removed

d@DeCobox-Micro ~
$ 

That is strange...
I get an empty happydays file. I am running on Linux RHEL 4.0. Would that make a difference?

zh0tx-.../temp < 114> cat excep 
The signal "/RAM12/dout<15>" is sourceless and has been removed.
The signal "/RAM22/dout<14>" is sourceless and has been removed.
The signal "/RAM23/dout<13>" is sourceless and has been removed.
The signal "/RAM24/dout<12>" is sourceless and has been removed.

zh0tx-.../temp < 115> cat orig 
The signal "/RAM12/dout<15>" is sourceless and has been removed.
The signal "/RAM22/dout<14>" is sourceless and has been removed.
The signal "/RAM23/dout<13>" is sourceless and has been removed.
The signal "/RAM24/dout<12>" is sourceless and has been removed.
Sourceless block
"U0/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U1/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U2/blk_generator/valid/has_B/dout1" (SFF) removed.
Sourceless block
"U3/blk_generator/valid/has_B/dout1" (SFF) removed

zh0tx-.../temp < 116> egrep -vf excep orig > happydays
zh0tx-.../temp < 117> cat happydays
zh0tx-.../temp < 118>
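An empty result from grep -v -f has at least one well-known cause, though nothing in this thread confirms that it is what happened here: a blank line in the pattern file. An empty pattern matches every line, so -v then discards everything. A blank line can easily creep in when the list is built by cutting and pasting. The sketch below reproduces the symptom and works around it; the sample data is made up and the file names follow the transcript above.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' 'signal A removed.' 'Sourceless block' > "$work/orig"
# Exception list with a stray blank line at the end.
printf '%s\n' 'signal A removed.' '' > "$work/excep"

# The empty pattern matches every line, so -v leaves nothing.
# (grep exits with status 1 when it selects no lines, hence the || true.)
grep -E -v -f "$work/excep" "$work/orig" > "$work/empty.out" || true

# Drop blank lines from the list first, then filter.
grep -v '^$' "$work/excep" > "$work/excep.clean"
grep -E -v -f "$work/excep.clean" "$work/orig" > "$work/fixed.out"
cat "$work/fixed.out"
```

Running od -c on the exception file is one way to make blank lines and other invisible characters visible.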

Well now I'm interested to find out what the deal is. My first post was in Cygwin...but to be sure I just VPN'd to a Solaris system & a Linux system at work and got the same results. I look forward to finding out what's up.

Yes, that is weird. This is what I get with uname.

%>uname -a 
Linux zh0tx.xyz.com 2.6.9-67.EL.bz439580.16smp #1 SMP Fri Jul 25 03:21:36 EDT 2008 i686 i686 i386 GNU/Linux

I think what we need is a Perl script that opens the exception list, takes each line, and then parses through "orig" and removes the matching entries.

(12:39:29\[ddecosta@S.Man)
[~]$ uname -a
SunOS king 5.10 SunOS_Development sun4u sparc SUNW,Sun-Fire-V890

pdt:/home/pdt --> uname -a
Linux showrunner 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686 i386 GNU/Linux

pdt:/home/pdt --> egrep --v
egrep (GNU grep) 2.5.1

I'm sure a Perl script would work... but it just seems like a cannon on a mosquito. I may just be bitter because I never made it past the hello world stage of learning Perl, though...

I got someone else to try it and it worked for them. So I tried on that exact same machine, but no, it wouldn't work for me. The only difference that I can see is that they are using csh whereas I am running tcsh. I wonder if that has anything to do with it.

egrep --v
egrep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Now that we are both using the same version of egrep, the only other difference I see is that I have more patches on my machine than both you and the other person who got it working. Maybe it is a patch that is messing it up. Hmmmm... I would be surprised if a patch messed up a program like grep!

#!/bin/perl -w

################################
use strict;

open FILE1,    '< ignorelist.txt' or die $!;
open OUTFILE,  '>>clean_report.txt' or die $!;

while (<FILE1>)
{
  $FILE1hash{$_}=1;
}
close(<FILE1>);


open FILE2,    '< new_report.txt' or die $!;

while (<FILE2>)
{
  print OUTFILE $_ unless defined($FILE1hash{$_});
}

close FILE2;
close OUTFILE; 

I got a small Perl script almost working, but I'm seeing some errors... don't know what yet!
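Two things in the script above would plausibly produce errors. Under use strict, %FILE1hash is never declared with my. Also, close(<FILE1>) reads from the handle instead of closing it (close FILE1 is the usual form). The underlying idea, loading the ignore list into a hash and printing only the lines that are not in it, can also be sketched in awk. This is an alternative, not the poster's script. The sample data is made up, and the file names are reused from the script.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' 'signal A removed.' 'signal B removed.' > "$work/ignorelist.txt"
printf '%s\n' 'signal A removed.' 'new warning C' 'signal B removed.' > "$work/new_report.txt"

# NR == FNR is true only while reading the first file: remember each line as a key.
# For the second file, print a line only if it is not one of those keys.
awk 'NR == FNR { skip[$0] = 1; next } !($0 in skip)' \
    "$work/ignorelist.txt" "$work/new_report.txt" > "$work/clean_report.txt"
cat "$work/clean_report.txt"
```

Like the hash lookup in the Perl version, this compares whole lines exactly, so a wrapped two-line message has to appear in the ignore list as two separate lines.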

Works for me:

$ grep -vf filter file
Date: Sep 11, 2008 10:30 PM
Tool: Microsoft SQL Server

Sourceless block
"U0/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U1/blk_generator/valid/has_B/dout0" (SFF) removed.
Sourceless block
"U2/blk_generator/valid/has_B/dout1" (SFF) removed.
Sourceless block
"U3/blk_generator/valid/has_B/dout1" (SFF) removed.
$ grep --v
grep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ echo $SHELL
/bin/tcsh

I have no idea what's going on with my system. It just wouldn't do it... strange indeed. The only thing I can think of is the Linux patch level.