Parse Logfile output variable

Ikon · December 9, 2008, 5:12pm

<SUMMARY filecount_excluded="0" dirbytes_sent="3367893" dirbytes_hashcache="13275664" ..and so on..>
<session numthreads="1" type="avtarbackup" ndispatchers="1" ..and so on..><host numprocs="4" 
speed="900" osuser="root" name="ashsux01" memory="24545" /><build time="11:04:53" msgversion="13-10" 
appname="avtar" ..and so on../><dirstats numfiles="193129" numbytes="1461473417216" numdirs="0" />
</session><errorsummary exitcode="0" errors="0" warnings="0" fatals="0" />
</SUMMARY>

I would like to output the following:

SUMMARY_filecount="0"
SUMMARY_dirbytes="3367893"
...
SUMMARY_session_numthreads="1"
SUMMARY_session_type="avtarbackup" 
...
SUMMARY_session_build_time="11:04:53" 
SUMMARY_session_build_msgversion="13-10" 
...
SUMMARY_session_dirstats_numfiles="193129" 
SUMMARY_session_dirstats_numbytes="1461473417216" 
...
SUMMARY_errorsummary_exitcode="0" 
SUMMARY_errorsummary_errors="0" 
...

There could be any number of tags within tags and any number of variables.

I tried searching but wasn't sure exactly what to search for.

joeyg · December 9, 2008, 5:26pm

> cat file103 | sed "s/<\/session/~&/g" | tr "~" "\n"

That should break the data up so each session ends on its own line;
then you can grep for the lines you want to see.

[Sorry I couldn't test, but your sample data did not have the appropriate tags or session markers for me to truly analyze.]
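
The same idea can be sketched in Perl, putting every tag (not just the session end) on its own line. This is only a minimal sketch: the sub name `split_tags` and the sample string are made up for illustration, and it assumes attribute values never contain `<` or `>`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every <...> tag in $text as its own list element,
# so each tag can be printed (and grepped) on a separate line.
sub split_tags {
    my ($text) = @_;
    $text =~ tr/\n/ /;              # join wrapped lines so a tag split across lines stays whole
    return $text =~ /(<[^<>]+>)/g;  # list context: one element per tag
}

# Illustrative input only, shortened from the sample in the question.
my $sample = '<session numthreads="1"><host numprocs="4" /></session>';
print "$_\n" for split_tags($sample);
```

Joining the lines first matters for logs where a tag wraps in the middle of its attributes.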

Ikon · December 9, 2008, 5:32pm

Here is an actual snip from the log:

<SUMMARY filecount_excluded="0" dirbytes_sent="3367893" dirbytes_hashcache="13275664"><session numthreads="1" 
type="avtarbackup" ndispatchers="1"><host numprocs="4" speed="900" osuser="root" name="ashsux01" memory="24545" />
<build time="11:04:53" msgversion="13-10" appname="avtar"/><dirstats numfiles="193129" numbytes="1461473417216" 
numdirs="0" /></session><errorsummary exitcode="0" errors="0" warnings="0" fatals="0" /></SUMMARY>

The problem I'm running into is that SUMMARY has its own vars, then within SUMMARY there is SESSION with its own vars, and within SESSION there are DIRSTATS and BUILD.

Like this:

<SUMMARY VARS...>
	<SESSION VARS...>
		<BUILD VARS.../>
		<DIRSTATS VARS.../>
	</SESSION>
	<ERRORSUMMARY VARS.../>
</SUMMARY>

The actual log file has many other tags within SUMMARY, and some tags are not called SUMMARY at all; they might be called something completely different from SUMMARY or SESSION.
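
One way to handle arbitrary nesting without knowing the tag names in advance is to keep a stack of the tags that are currently open and join it with underscores to form the variable prefix. A minimal sketch, assuming the tags have already been separated into a list; the sub name `tag_prefixes` is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: walk a list of tags and return the underscore-joined
# prefix in effect at each opening or self-closing tag.
sub tag_prefixes {
    my @tags = @_;
    my (@stack, @prefixes);
    for my $tag (@tags) {
        if ($tag =~ m{^</}) {                 # closing tag: go up one level
            pop @stack;
        } elsif ($tag =~ m{^<(\w+)}) {        # opening or self-closing tag
            push @stack, $1;
            push @prefixes, join('_', @stack);
            pop @stack if $tag =~ m{/>$};     # self-closing: it has no children
        }
    }
    return @prefixes;
}

# Tag names taken from the layout above; attributes left out for brevity.
my @tags = ('<SUMMARY>', '<SESSION>', '<BUILD/>', '<DIRSTATS/>',
            '</SESSION>', '<ERRORSUMMARY/>', '</SUMMARY>');
print "$_\n" for tag_prefixes(@tags);
```

Because the prefix is rebuilt from the stack each time, it does not matter what the tags are called or how deep they go.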

vidyadhar85 · December 9, 2008, 5:58pm

Then I think you will have to do it with two or more awk commands.
I tried something, but it is not quite right; hopefully you can take it further.

awk '/^<SUMMARY/{print $0}' file|awk -F"[=_]" 'BEGIN{RS=" "}$0!="<SUMMARY"{print "SUMMARY_"$1"="$3}'

awk '/^<session/{print $0}' file|awk -F"[= ]" 'BEGIN{RS=" "}$0!="<session"{print "SUMMARY_session_"$1"="$2}'

Ikon · December 9, 2008, 6:03pm

Those are pretty good starting points. They are pretty close.

I will be working on it all day tomorrow; when I get a working script I will be sure to share it.

vidyadhar85 · December 9, 2008, 6:11pm

Will be looking forward to the answer.
regards,
vidya

Ikon · December 11, 2008, 4:35pm

Well, I decided to write it in Perl...

I'm having some problems:

If you compare the log with the output, it is skipping the first item of each tag, i.e. CUSTOMER_join, ADDRESS_street, TODAY_date...

This is not the actual log file it will be reading, just a quick one I put together. The actual log is fairly large.

 
# cat testfile.log
<CUSTOMER join="1/1/2008" last="12/20/2008">
<NAME name="John" lname="Smith">
<ADDRESS street="main" number="123" city="Orlando" state="Florida" zip="12345" />
</DATA>
</CUSTOMER>
<TODAY date="12/11/2008" time="12:12:12" />

# perl readlog.pl testfile.log
CUSTOMER_last: 12/20/2008
NAME_lname: Smith
ADDRESS_number: 123
ADDRESS_city: Orlando
ADDRESS_state: Florida
ADDRESS_zip: 12345
TODAY_time: 12:12:12

 
# cat readlog.pl
$filename = $ARGV[0];
open FILE, $filename or die $!;
while (<FILE>) {
        push(@fields, defined($1) ? $1:$3)
        while m/([^<>]+)/g;
}
close(FILE);
$head="";
foreach (@fields) {
        if ($_ =~ /^([A-Za-z0-9]+) /) {
                $line = $_;
                ($header,$data) = split(/ /, $line, 2);
                if ( $head == "" ) {
                        $head = $header;
                } else {
                        $head = $head."_".$header;
                }
                @subs = split(/" /,$data);
                for($i = 0; $i < @subs; $i++) {
                        ($str, $strdata) = split (/=/,$subs[$i]);
                        $strdata =~ s/^"//;
                        $strdata =~ s/"$//;
                        $head =~ s/_{2,}/_/;
                        if ($str !~ /\//) {
                                print $head."_".$str.": ".$strdata."\n";
                        }
                }
                if ($data =~ /\/$/) {
                        @h = split(/_/,$head);
                        $max = @h - 1;
                        $head =~ s/$h[$max]// ;
                }
        } else {
                if ($_ =~ /^\//) {
                         @h = split(/_/,$head);
                        $max = @h - 1;
                        $head =~ s/$h[$max]// ;
                }
        }
}
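
A possible reason the prefixes in the output above never accumulate (NAME_lname rather than CUSTOMER_NAME_lname): in Perl, `==` compares numerically, and a non-numeric string such as "CUSTOMER" evaluates to 0, as does the empty string. So `$head == ""` is true for any such tag name and `$head` is reset on every tag. String comparison uses `eq`. A small sketch of the difference; the sub names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: numeric comparison, as in the script above.
sub is_empty_numeric {
    my ($s) = @_;
    no warnings 'numeric';   # silence "isn't numeric" for this demonstration
    return $s == "" ? 1 : 0; # both sides are converted to numbers first
}

# Hypothetical helper: string comparison.
sub is_empty_string {
    my ($s) = @_;
    return $s eq "" ? 1 : 0; # compares the characters themselves
}

print is_empty_numeric("CUSTOMER"), "\n";  # prints 1: "CUSTOMER" and "" are both 0 as numbers
print is_empty_string("CUSTOMER"), "\n";   # prints 0: the strings differ
```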

Ikon · December 11, 2008, 4:46pm

I had started $i at 1 instead of 0.

It's fixed now.

Ikon · December 11, 2008, 5:52pm

Found one more error.

Here is the final version:

# cat test.log
<CUSTOMER join="1/1/2008" last="12/20/2008"><NAME name="John" lname="Smith">
<ADDRESS street="main" number="123" city="Orlando" state="Florida" zip="12345" />
</DATA></CUSTOMER><TODAY date="12/11/2008" time="12:12:12" />

# cat readlog.pl
$filename = $ARGV[0];
open(FILE, '<', $filename) or die "Cannot open $filename: $!";
while (<FILE>) {
        # Collect the text between each '<' and '>' (the tag bodies).
        push(@fields, $1) while m/([^<>]+)/g;
}
close(FILE);

@path = ();     # names of the tags that are currently open
foreach (@fields) {
        if ($_ =~ /^([A-Za-z0-9]+) /) {
                # Opening or self-closing tag with attributes.
                ($header, $data) = split(/ /, $_, 2);
                push(@path, $header);
                $head = join("_", @path);
                # A trailing "/" marks a self-closing tag; strip it and remember.
                $selfclosing = ($data =~ s/\s*\/\s*$//);
                @subs = split(/" /, $data);
                for ($i = 0; $i < @subs; $i++) {
                        ($str, $strdata) = split(/=/, $subs[$i], 2);
                        $strdata =~ s/^"//;
                        $strdata =~ s/"$//;
                        print $head."_".$str."='".$strdata."'\n";
                }
                # A self-closing tag has no children, so leave its level again.
                pop(@path) if $selfclosing;
        } elsif ($_ =~ /^\//) {
                # Closing tag: go back up one level.
                pop(@path);
        }
}

# perl readlog.pl test.log
CUSTOMER_join='1/1/2008'
CUSTOMER_last='12/20/2008'
CUSTOMER_NAME_name='John'
CUSTOMER_NAME_lname='Smith'
CUSTOMER_NAME_ADDRESS_street='main'
CUSTOMER_NAME_ADDRESS_number='123'
CUSTOMER_NAME_ADDRESS_city='Orlando'
CUSTOMER_NAME_ADDRESS_state='Florida'
CUSTOMER_NAME_ADDRESS_zip='12345'
TODAY_date='12/11/2008'
TODAY_time='12:12:12'

vidyadhar85 · December 11, 2008, 5:54pm

Oh great

summer_cherry · December 12, 2008, 1:17am

Hi, the Perl below may help you some.

#! /usr/bin/perl
undef $/;
open FH,"<a.txt";
$str=<FH>;
$str=~tr/\n//d;
while($str=~m/<(.*?)>/){
	my @arr=split(" ",$1);
	if($#arr==0){
		$pre=substr($pre,0,rindex($pre,"_"));
		$str=$';
		next;
	}
	$pre.=($pre)?"_".$arr[0]:$arr[0];
	#print "\n",$pre,"----->\n\n";
	for($i=1;$i<=$#arr;$i++){
		if(index($arr[$i],"/")!=-1){
			$arr[$i]=substr($arr[$i],0,index($arr[$i],"/"));
		}
		print $pre."_".$arr[$i]."\n";
	}
	if (index($1,"/")!=-1){
		$pre=substr($pre,0,rindex($pre,"_"));
	}
	$str=$';
	print "\n";
}
close FH;

Ikon · December 12, 2008, 9:37am

Some good ideas in there, but it doesn't work correctly:

# cat test.log
<CUSTOMER join="1/1/2008" last="12/20/2008"><NAME name="John" lname="Smith">
<ADDRESS street="main" number="123" city="Orlando" state="Florida" zip="12345" />
</DATA></CUSTOMER><TODAY date="12/11/2008" time="12:12:12" />

CUSTOMER_join="1
CUSTOMER_last="12

CUSTOME_NAME_name="John"
CUSTOME_NAME_lname="Smith"

CUSTOME_NAME_ADDRESS_street="main"
CUSTOME_NAME_ADDRESS_number="123"
CUSTOME_NAME_ADDRESS_city="Orlando"
CUSTOME_NAME_ADDRESS_state="Florida"
CUSTOME_NAME_ADDRESS_zip="12345"
CUSTOME_NAME_ADDRESS_

CUSTOM_TODAY_date="12
CUSTOM_TODAY_time="12:12:12"
CUSTOM_TODAY_
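
Both symptoms in the output above look like they come from treating any "/" as the end of a self-closing tag. The dates contain slashes, so the values are cut at the first one. The prefix also loses a letter each time `substr($pre,0,rindex($pre,"_"))` runs on a prefix with no underscore, where `rindex` returns -1. Pulling the attributes out with a `name="value"` pattern avoids the first problem. A minimal sketch, assuming every value is double-quoted and contains no embedded quotes; the sub name `parse_attrs` is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a list of [name, value] pairs for every
# name="value" attribute in a tag body. Values are taken between the quotes,
# so a "/" inside a date is kept and the trailing "/" of a self-closing tag
# is simply never matched.
sub parse_attrs {
    my ($body) = @_;
    my @pairs;
    while ($body =~ /([\w-]+)="([^"]*)"/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Tag body taken from the test log above.
my $body = 'TODAY date="12/11/2008" time="12:12:12" /';
for my $pair (parse_attrs($body)) {
    my ($name, $value) = @$pair;
    print "TODAY_$name=\"$value\"\n";
}
```

The self-closing check can then look only at the last character of the tag body instead of searching the whole tag for a slash.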