Awk; gsub in fields 3 and 4

Bubnoff · September 29, 2010, 10:45pm

I want to transform a log file into input for a database.

Here's the log file:

Tue Aug 4 20:17:01 PDT 2009
Wireless users: 339
Daily Average: 48.4285
=
Tue Aug 11 20:17:01 PDT 2009
Wireless users: 295
Daily Average: 42.1428
=
Tue Aug 18 20:17:01 PDT 2009
Wireless users: 294
Daily Average: 42.0000
=
Tue Aug 25 20:17:01 PDT 2009
Wireless users: 289
Daily Average: 41.2857
=

I need to strip the descriptions for "Wireless users" and "Daily Average" but keep the date as is.

I thought I could use "=" as the record separator and "\n" as the field separator. Here's what I've got so far:

awk -F'\n' 'RS="\="{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}}; 1' rotate1.log

The output confuses me:

Tue Aug 4 20:17:01 PDT 2009
Wireless users: 339
Daily Average: 48.4285

 Tue Aug 11 20:17:01 PDT 2009 Wireless users: 295 42.1428 
 Tue Aug 18 20:17:01 PDT 2009 Wireless users: 294 42.0000 
 Tue Aug 25 20:17:01 PDT 2009 Wireless users: 289 41.2857 
 Tue Sep 1 20:17:01 PDT 2009 Wireless users: 379 54.1428

Why is it printing the first record as is, but printing the rest as specified by RS and FS?

Secondly, I need to gsub on field 3 as well. Here's one with two gsub statements:

awk -F'\n' 'RS="\="{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}{gsub(/[^[:digit:]]/,"",$3)}}; 1' rotate1.log

Output ( still printing first record unscathed ):

Tue Aug 4 20:17:01 PDT 2009
Wireless users: 339
Daily Average: 48.4285

 Tue Aug 11 20:17:01 PDT 2009 295 42.1428 
 Tue Aug 18 20:17:01 PDT 2009 294 42.0000 
 Tue Aug 25 20:17:01 PDT 2009 289 41.2857 
 Tue Sep 1 20:17:01 PDT 2009 379 54.1428

Is there a way to throw an "or" in there to reduce the gsubs to one?

So the output above is OK except for the printing of the first record "AS IS".

With OFS set as tab I've nearly got what I need:

awk -F'\n' 'RS="\="{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}{gsub(/[^[:digit:]]/,"",$3)}};{OFS="\t"};1' rotate1.log

So what's up with printing the first record undigested?

Thanks for reading!

Bubnoff

radoulov · September 30, 2010, 4:48am

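Because you put the RS assignment in the wrong place: used as a pattern, it only takes effect after the first record has already been read with the default RS.
Put the RS assignment outside of the code:

awk -v RS='=' ...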

Please post an example of the desired output based on the previously posted input.
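
On the RS placement mentioned above, a minimal sketch may make the effect visible. It assumes a tiny two-record sample fed from printf instead of rotate1.log; the names sample, late and early are made up for the illustration.

```shell
# Two records separated by "=", fed from printf instead of rotate1.log.
sample() { printf 'a\nb\n=\nc\nd\n=\n'; }

# RS assigned inside a rule: record 1 has already been read with the default
# RS (newline) by the time the assignment runs, so it is only the first line.
late=$(sample | awk '{ RS = "=" }
  NR == 1 { gsub(/\n/, "|"); print "first record: [" $0 "]" }')

# RS set with -v (a BEGIN block works too) applies before any input is read.
early=$(sample | awk -v RS='=' '
  NR == 1 { gsub(/\n/, "|"); print "first record: [" $0 "]" }')

printf '%s\n' "$late" "$early"
```

The first call should report [a] and the second [a|b|] (newlines are shown as "|"), which is the same split the original command produced for its first record.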

Bubnoff · September 30, 2010, 12:37pm

Thanks Radoulov ~

However, it's still printing the first record incorrectly. Here's what I need:

Tue Aug 11 20:17:01 PDT 2009    295     42.1428 
Tue Aug 18 20:17:01 PDT 2009    294     42.0000 
Tue Aug 25 20:17:01 PDT 2009    289     41.2857 
Tue Sep 1 20:17:01 PDT 2009     379     54.1428 
Tue Sep 8 20:17:01 PDT 2009     287     41.0000

Here's the current Awk ( with your suggestion ):

 awk -F'\n' -v RS='=' '{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}{gsub(/[^[:digit:]]/,"",$3)}};{OFS="\t"};1' rotate1.log

Here's the resulting output:

Tue Aug 4 20:17:01 PDT 2009     Wireless users: 339     484285  
        Tue Aug 11 20:17:01 PDT 2009    295     42.1428 
        Tue Aug 18 20:17:01 PDT 2009    294     42.0000 
        Tue Aug 25 20:17:01 PDT 2009    289     41.2857 
        Tue Sep 1 20:17:01 PDT 2009     379     54.1428 
        Tue Sep 8 20:17:01 PDT 2009     287     41.0000

I'm very very close here, but that first record is still not being transformed. I also notice that there are blank new lines at the bottom of the output that are not present in the input.

Thanks again ~

Bubnoff

---------- Post updated at 09:30 AM ---------- Previous update was at 09:07 AM ----------

I notice that I really don't need the "for loop".

awk -F'\n' -v RS='=' '{gsub(/[^[:digit:]]/,"",$3)}{gsub(/[^[:digit:].]/,"",$4)};{OFS="\t";print}' rotate1.log

Still get the partially transformed first record:

Tue Aug 4 20:17:01 PDT 2009     Wireless users: 339     484285

Note that the gsub for field 4 is partially applied while the one for field 3 is not applied at all. It is also not respecting the period in the expression for field 4, but stripping all non-digits.

Should look like this:

Tue Aug 4 20:17:01 PDT 2009     339     48.4285
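
One way to investigate a record that behaves differently is to print every field with its index. The sketch below assumes two records copied from the log above, fed from printf; sample and layout are hypothetical names.

```shell
# Two records in the log's layout; note there is no "=" before the first one.
sample() {
  printf 'Tue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n=\n'
  printf 'Tue Aug 11 20:17:01 PDT 2009\nWireless users: 295\nDaily Average: 42.1428\n=\n'
}

# Print each field with its record number and index so any offset shows up.
layout=$(sample | awk -F'\n' -v RS='=' '
  { for (i = 1; i <= NF; i++) printf "rec %d  $%d = [%s]\n", NR, i, $i }')

printf '%s\n' "$layout"
```

In this sample the "Daily Average" line lands in $3 for record 1 but in $4 for record 2, because every record after the first starts with the newline that follows "=".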

---------- Post updated at 09:37 AM ---------- Previous update was at 09:30 AM ----------

In the input file there was no RS (i.e., =) before the first record.

So with this:

 awk -F'\n' -v RS='=' '{gsub(/[^[:digit:]]/,"",$3);gsub(/[^[:digit:].]/,"",$4)};{OFS="\t";print}' rotate1.log

I get the expected results. Is it possible to reduce this to one gsub for both the 3rd and 4th fields?
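
On the single-gsub question: gsub works on one target at a time, but since the user count contains no period, the same expression can be applied to both fields in a loop. This is only a sketch under some assumptions: it uses [^0-9.] in place of the [:digit:] class, strips the newline left after each "=" so that no extra "=" is needed at the top of the file, and runs on a two-record sample rather than rotate1.log. The names sample and tsv are made up.

```shell
# Two records in the log's layout, with no "=" before the first one.
sample() {
  printf 'Tue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n=\n'
  printf 'Tue Aug 11 20:17:01 PDT 2009\nWireless users: 295\nDaily Average: 42.1428\n=\n'
}

tsv=$(sample | awk -F'\n' -v RS='=' -v OFS='\t' '
  { sub(/^\n/, "") }                 # changing $0 re-splits it: date, users, average
  NF >= 3 {
    for (i = 2; i <= 3; i++)         # one gsub statement, two fields
      gsub(/[^0-9.]/, "", $i)
    print $1, $2, $3
  }')

printf '%s\n' "$tsv"
```

Printing $1, $2, $3 explicitly instead of the whole record also leaves out the empty field that the trailing newline creates.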

Also, do I need sed to remove the blank lines in the output, or is there something I'm missing in my Awk?

awk -F'\n' -v RS='=' '{gsub(/[^[:digit:]]/,"",$3);gsub(/[^[:digit:].]/,"",$4)};{OFS="\t";print}' rotate1.log | sed '/^[ \t]*$/d'
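
As a possible alternative to the sed stage, the empty records can be skipped inside awk with a pattern. The sketch assumes the input now has an "=" before the first record as well as after the last one (an assumption about the edited file), so the date is always in $2; sample and clean are hypothetical names and [^0-9] stands in for [^[:digit:]].

```shell
# Assumed layout: an "=" before the first record and after the last one.
sample() {
  printf '=\nTue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n'
  printf '=\nTue Aug 11 20:17:01 PDT 2009\nWireless users: 295\nDaily Average: 42.1428\n=\n'
}

# Empty records (before the first "=" and after the last) have no $2: skip them.
clean=$(sample | awk -F'\n' -v RS='=' -v OFS='\t' '
  $2 != "" {
    gsub(/[^0-9]/, "", $3)
    gsub(/[^0-9.]/, "", $4)
    print $2, $3, $4          # printing named fields drops the empty $1 and $5
  }')

printf '%s\n' "$clean"
```

The blank lines come from records that contain nothing but a newline; once the gsub calls assign to $3 and $4, such a record is rebuilt as a line of tabs, which is what the sed expression was removing.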

Thanks again for your suggestions!

Bubnoff

radoulov · September 30, 2010, 2:15pm

You can use something like this:

awk -F: '/=/ { print x }
NF > 2 { printf "%s", $0; next }
{ printf "%s", $2 }' infile

Given the fixed format, you could also write something like this:

awk 'NF {
  for (i = 0; ++i <= 6;)
    printf "%s ", $i
  printf "%s %s\n", $9, $12
  }' RS== infile
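
For the fixed-format variant, it may help to see where $9 and $12 come from. With RS set to "=" and the default FS, newlines count as whitespace, so each record splits into words. A small check on a two-record sample (sample and words are made-up names):

```shell
# Two records in the log's layout, fed from printf instead of infile.
sample() {
  printf 'Tue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n=\n'
  printf 'Tue Aug 11 20:17:01 PDT 2009\nWireless users: 295\nDaily Average: 42.1428\n=\n'
}

# $1-$6 are the date words, $9 is the user count, $12 is the daily average.
# The NF pattern skips the trailing record that holds only a newline.
words=$(sample | awk -v RS='=' 'NF { print NF ": " $9 " / " $12 }')

printf '%s\n' "$words"
```

Each record here has 12 words, so the fixed positions only hold as long as the labels and the date keep the same number of words.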

Bubnoff · September 30, 2010, 2:45pm

Nice solutions but how do they work?

awk -F: '/=/ { print x } NF > 2 { printf "%s", $0; next } { printf "%s", $2 }' infile

I get the NF > 2 { ... } part and what comes after it, but why does the /=/ { print x } work?

I think I understand the second one.

Thanks again!

Bub

radoulov · September 30, 2010, 4:55pm

It's just there to print a newline: x is never assigned, so print x outputs an empty line.
You can use printf "\n" instead if you wish.
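
A quick way to confirm that the two spellings behave the same is to run both on a small sample and compare. This is a sketch on two records fed from printf; sample, with_x and with_nl are hypothetical names.

```shell
# Two records in the log's layout, fed from printf instead of infile.
sample() {
  printf 'Tue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n=\n'
  printf 'Tue Aug 11 20:17:01 PDT 2009\nWireless users: 295\nDaily Average: 42.1428\n=\n'
}

# x is never assigned, so it is empty and print x emits only the newline.
with_x=$(sample | awk -F: '/=/ { print x }
NF > 2 { printf "%s", $0; next }
{ printf "%s", $2 }')

# Same program with an explicit newline instead of the unset variable.
with_nl=$(sample | awk -F: '/=/ { printf "\n" }
NF > 2 { printf "%s", $0; next }
{ printf "%s", $2 }')

printf '%s\n' "$with_x"
```

With ":" as the field separator, the date line is the only one with more than two fields (because of the time), so it is printed whole; the other lines contribute their $2, and the "=" line ends the output line.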

Bubnoff · September 30, 2010, 5:15pm

You rock!!

Three ways to do this ...one record based ( record as defined by input file ) and two essentially line/field based.

Thanks again!

Bub