awk, sed or similar log repair help

I have a log file where, once or twice a month, line feeds go missing.

This log is generated by vmstat every minute. I don't know why it sometimes does this.

Each line in the log should have 18 columns separated by one or more spaces.

Good Log: (not actual log)

1 1 1 123456 1234 123 1 1 1 100
1 2 1 123 1234 1234 1 1 1 10
2 1 1 123 12345 123 1 1 1 3
1 1 3 123 1234 123 1 1 1 1

Bad log:

1 1 1 123456 1234 123 1 1 1 100 1 2 1 123 1234 1234 1 1 1 10 
2 1 1 123 12345 123 1 1 1 3
1 1 3 123 1234 123 1 1 1 1

There could be more than 2 rows on one line.

If I know there will ALWAYS be 18 columns, how can I quickly go through the log and, IF there are more than 18 columns on a line, put a linefeed in and rewrite the log?

thanks...

Are you trying to do this while the logfile is being written to by vmstat?

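awk '{
        # Re-emit the fields, ending the line after every 18th field
        # and after the last field of each input line.
        for (i = 1; i <= NF; i++)
                printf("%s%s", $i, (i % 18 && i < NF) ? " " : "\n")
     }'  logfile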

try something like this - assuming the file is not being written to....

No, this is done the following day, so it is not being written to anymore.

another caveat

There are headers every few rows, so I need to ignore those (anything containing ANYthing other than 0-9).

But these rows with letters are NOT appended to any of the numeric-only rows.

A lot of rows are run together in this log. In this particular row there are 360 fields, which is actually the most there will ever be.

So I get this:

1 1 0 617894 2987960 8 1 0 0 0 0 0 1069 4232 217 0 0 99
1 1 0 662317 2987993 9 2 0 0 0 0 0 1099 5219 254 0 0 99
1 1 0 624608 2987993 16 1 0 0 0 0 0 1191 89839 338 1 1 98
awk: Line     1     1     0    cannot have more than 199 fields.
 The input line number is 711. The file is vmstat.12.
 The source line number is 1.

Curious which vmstat command is throwing off the output altogether, and what platform are you on?
From the error message it looks like a job for Perl, as this awk has an upper limit on the number of fields it can process per line.

This is running on HP-UX 11.11 and 11.23

This is what we run via cron:

vmstat 60 1440 > /path/to/vmstat.`date +%d`

This usually works fine. I don't know what is causing the linefeeds to disappear.

Part of the log will be correct, then it starts having the problem, and then it sometimes starts working correctly again.

I can do perl, was hoping awk or something would be very quick and easy.

I wrote a perl script and it works great, BUT...

I found the problem is with vmstat. Sometimes it does not write the CPU idle time, so it outputs only 17 columns instead of 18 and never writes the line feed.

We are going to check with HP and see if others are having the same problem. It only seems to happen on the Blade servers, once or twice a month right now. Maybe they are too fast... :stuck_out_tongue:

Yes it might be related to the scheduler. My guess would be that some content gets lost during context switching. Anyhow HP would be able to provide an explanation and have a fix for this.