Awking custom output

SkySmart · January 8, 2017, 12:16am

I have data that can look like this:

echo "Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=1_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=7_nThe_file_or_directory_is_corrupted_and_unreadable=0_n"

or

echo "Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=1 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=7 The_file_or_directory_is_corrupted_and_unreadable=0"

or

_error_=0-- _fatal_=0-- _panic_=0-- _fault_=0

I need to grab the number that comes right after the equals sign "=" for each of the patterns and, after getting all the numbers, add them up to get the total.

So in the first two examples above, there is a total of 8 errors.

I'm looking for a solution that takes all of these output formats into account. I'm hoping awk can be used for this.

Scrutinizer · January 8, 2017, 2:26am

Try:

awk -F= '{t=0; for(i=2; i<=NF; i++) t+=$i; print t}'
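A quick way to check this one-liner is to pipe the samples from the first post through it. The sketch below wraps it in a shell function (the name sum_after_equals is made up for illustration). It relies on awk's string-to-number conversion, which uses only the leading numeric part of a field such as 7_nThe_file_or_directory... and ignores the rest.

```shell
# Sum every number that directly follows an "=" on each input line.
# sum_after_equals is a hypothetical helper name, not part of the thread.
sum_after_equals() {
    awk -F= '{ t = 0; for (i = 2; i <= NF; i++) t += $i; print t }'
}

printf '%s\n' \
    "Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=1_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=7_nThe_file_or_directory_is_corrupted_and_unreadable=0_n" \
    "_error_=0-- _fatal_=0-- _panic_=0-- _fault_=0" |
    sum_after_equals
# prints 8, then 0
```

Because the field separator is "=", the first field (the text before the first "=") is skipped by starting the loop at 2.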

SkySmart · January 8, 2017, 1:06pm

Thank you for your code, Scrutinizer. It works perfectly!

One more request.

The actual data file looks like this:

0,tomcat_logcheck,1483897732,240,/opt/apps/plmviewcafe/logs/catalina.out,16K,tomcat_logcheck,14560,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,174--240,181
0,tomcat_logcheck,1483898023,309,/opt/apps/plmviewcafe/logs/catalina.out,20K,tomcat_logcheck,19277,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,240--309,25

And I run the following awk command on the data:

gawk -v SEARCHPATT="${SEARCHPATT}" -v ADDISTR="${INCEXCSTR}" -F, '/,'"${VALFOUND}"',/,0 {A=strftime("%a %b %d %T %Y,%s",$3);if((NF == 13) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $12"-"$3"_0""-" $13"----"A} else if ((NF == 14) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $12"-"$3"_0""-" $13"----"A} else if ((NF == 10) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $9"-"$3"_"$10"----"A} else if ((NF == 11) && (A ~ SEARCHPATT)) {print $9"-"$3"_"$10"----"A} }' datafile.txt | awk -F"----" '{print $1}'

This normally produces the expected output, similar to this:

0-1424534260_8--8
0-1424534560_8--8
0-1424534860_8--8
0-1424535160_8--8
0-1424535460_8--8
0-1424535760_8--8
0-1424536060_8--8
0-1424536360_8--8

I get the above output only if the 9th field contains a value. But if the 9th field contains the original values I posted in this thread, "Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=1 emxCommonAppInitialization__Error_while_initializing=0 ", I get output similar to the following, which is not what I want:

emxCommonAppInitialization__Error_while_initializing=0-- INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0-- java_lang_NoClassDefFoundError=0-- java_lang_OutOfMemoryError=0-- Master_Item_Service_is_down=0-- The_file_or_directory_is_corrupted_and_unreadable=0-1413868231_12043--12043

So I would like to incorporate your command into the original command that I pasted in this post, so that it adds up all the values in the 9th field and then shows the expected output:

0-1413868231_12043--12043
.....

with the leading 0 being the total of all the values in that 9th column. Sorry if I just made this too complicated.

RudiC · January 9, 2017, 6:23am

Your expected output doesn't really fit what your incredibly overcomplex script will produce when applied to your input sample. Which doesn't necessarily help us help you. Nor does the missing context like the undefined shell variables.

You seem to want to print $(NF-D)"-"$3"_"$(NF-E)"----"A with D equal to 1 or 2 and E equal to 0 or 1, depending on the field count of the line. How about boiling the above down to something like

$0 ~ "," VF "," {L = 1
                }

!L              {next
                }

A ~ SEARCHPATT  {D = 0
                 if (NF == 10 || NF == 13)      {D = 1 
                                                 E = 0
                                                }
                 if (NF == 11 || NF == 14)      {D = 2 
                                                 E = 1
                                                }
                 F = (NF == 11 || A ~ ADDISTR)

                 if (D && F)                    print $(NF-D)"-"$3"_"$(NF-E)"----"A
                }

Why do you print the A variable if you remove it again immediately afterwards?

For your $9 problem, you might want to try

{TOT = 0; for (n = split ($(NF-D), T, "="); n>1; n--) {sub (/_.*/, _, T[n]); TOT += T[n]};

and print TOT in lieu of the respective field.
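As a rough illustration of that split() idea on a single comma-separated record, here is a minimal sketch. It assumes 10- or 11-field records like the sample data, so the name=count summary is field 9 and the value printed after the underscore is field 10. The function name total_field9 is made up, and keeping a field 9 that contains no "=" unchanged is an added assumption, not something stated in the thread.

```shell
# Replace field 9 (the name=count summary) with the sum of its counts and
# print TOTAL-timestamp_field10. total_field9 is a hypothetical helper name.
total_field9() {
    awk -F, '{
        n = split($9, parts, "=")          # parts[2..n] each start with a count
        if (n < 2) {
            tot = $9                       # assumption: no "=" means a plain value, keep it
        } else {
            tot = 0
            for (i = 2; i <= n; i++)
                tot += parts[i]            # "7_nThe_file..." converts to 7
        }
        print tot "-" $3 "_" $10
    }'
}

printf '%s\n' '0,tomcat_logcheck,1483897732,240,/opt/apps/plmviewcafe/logs/catalina.out,16K,tomcat_logcheck,14560,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,174--240,181' |
    total_field9
# prints 0-1483897732_174--240
```

Records with 13 or 14 fields would need the NF-based offsets (D and E) discussed above instead of the fixed $9 and $10.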

SkySmart · January 11, 2017, 2:29pm

I'm trying to boil down the command as you suggested, but I'm running into issues:

gawk -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -F, '/,1484023642,/,0 {A=strftime("%a %b %d %T %Y,%s",$3); {$0 ~ "," VF "," {L = 1
                }

!L              {next
                }

A ~ SEARCHPATT  {D == 0
                 if (NF == 10 || NF == 13)      {D = 1 
                                                 E = 0
                                                }
                 if (NF == 11 || NF == 14)      {D = 2 
                                                 E = 1
                                                }
                 F = (NF == 11 || A ~ ADDISTR)
                 if (D && F)                    print $(NF-D)"-"$3"_"$(NF-E)"----"A
}
}' datafile.txt

errors I get:

gawk: /,1484023642,/,0 {A=strftime("%a %b %d %T %Y,%s",$3); {$0 ~ "," VF "," {L = 1
gawk:                                                                        ^ syntax error
gawk: cmd. line:3: !L              {next
gawk: cmd. line:3:                 ^ syntax error
gawk: cmd. line:6: A ~ SEARCHPATT  {D == 0
gawk: cmd. line:6:                 ^ syntax error
gawk: cmd. line:17: }
gawk: cmd. line:17:  ^ unexpected newline or end of string
[mojomo@pgphxplmap004 ~]$

RudiC · January 11, 2017, 2:42pm

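Of course. You messed up the pattern {action} pair structure of awk: each pattern {action} pair has to stand at the top level of the program, not inside the braces of another action. Try (untested)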

awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND"
BEGIN           {D[10] = D[13] = 1
                 D[11] = D[14] = 2
                }

$0 ~ "," VF "," {L = 1                                  # start output only if VALFOUND is matched
                }

!L              {next                                   # skip line if NOT VALFOUND
                }

                {A = strftime("%a %b %d %T %Y,%s",$3)
                }

A ~ SEARCHPATT &&
NF in D         {TOT = 0
                 for (n = split ($(NF-D[NF]), T, "="); n>1; n--)    {sub (/_.*/, _, T[n]); TOT += T[n]}

                 if (NF == 11 || A ~ ADDISTR)   print TOT "-" $3 "_" $(NF-D[NF]+1) "----" A
                }
' datafile.txt

EDIT: Actually, looking at it again, there's another logic flaw: if A ~ "(Wed|Tue)", it will ALWAYS match Mon|Tue|Wed|Thu|Fri|Sat|Sun, so the if in front of the print is pointless.
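The redundancy can be checked in isolation. This sketch uses the two pattern values from the command line above and two hand-written date strings in the same layout that strftime("%a %b %d %T %Y") produces. The function name match_report is made up for illustration, and the result only holds for these particular pattern values; a different ADDISTR could still filter lines out.

```shell
# For each input line, report whether it matches the narrow pattern
# (SEARCHPATT) and the broad one (ADDISTR). match_report is a hypothetical name.
match_report() {
    awk -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" '{
        s = ($0 ~ SEARCHPATT) ? "yes" : "no"
        a = ($0 ~ ADDISTR)    ? "yes" : "no"
        print substr($0, 1, 3) ": SEARCHPATT=" s " ADDISTR=" a
    }'
}

printf '%s\n' "Tue Jan 10 04:47:22 2017" "Sun Jan 08 17:48:52 2017" | match_report
# prints:
# Tue: SEARCHPATT=yes ADDISTR=yes
# Sun: SEARCHPATT=no ADDISTR=yes
```

Any line that passes the SEARCHPATT test contains Wed or Tue, so it necessarily passes the ADDISTR test as well.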

SkySmart · January 11, 2017, 5:45pm

I'm probably missing something here, because when I run this script it still bails out:

# cat ./hen:   
#!/bin/sh

awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND"
BEGIN   '{           {D[10] = D[13] = 1
                 D[11] = D[14] = 2
                }

$0 ~ "," VF "," {L = 1                                  # start output only if VALFOUND is matched
                }

!L              {next                                   # skip line if NOT VALFOUND
                }

                {A = strftime("%a %b %d %T %Y,%s",$3)
                }

A ~ SEARCHPATT &&
NF in D         {TOT = 0
                 for (n = split ($(NF-D[NF]), T, "="); n>1; n--)    {sub (/_.*/, _, T[n]); TOT += T[n]}

                 if (NF == 11 || A ~ ADDISTR)   print TOT "-" $3 "_" $(NF-D[NF]+1) "----" A
                }
}' tomcatlogcheck

# ./hen 
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options:
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
        -m[fr] val
        -O                      --optimize
        -W compat               --compat
        -W copyleft             --copyleft
        -W copyright            --copyright
        -W dump-variables[=file]        --dump-variables[=file]
        -W exec=file            --exec=file
        -W gen-po               --gen-po
        -W help                 --help
        -W lint[=fatal]         --lint[=fatal]
        -W lint-old             --lint-old
        -W non-decimal-data     --non-decimal-data
        -W profile[=file]       --profile[=file]
        -W posix                --posix
        -W re-interval          --re-interval
        -W source=program-text  --source=program-text
        -W traditional          --traditional
        -W usage                --usage
        -W use-lc-numeric       --use-lc-numeric
        -W version              --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
        gawk '{ sum += $1 }; END { print sum }' file
        gawk -F: '{ print $1 }' /etc/passwd
./hen: line 4: BEGIN: command not found
./hen: line 5: D[11]: command not found
./hen: line 6: syntax error near unexpected token `}'
./hen: line 6: `                }'

RudiC · January 12, 2017, 8:56am

The single quote at the end of the first line was lost in transfer, sorry for that. Add one and remove the '{ after BEGIN. As said, this was untested.
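For reference, here is a sketch of how ./hen might look with both corrections applied, wrapped in a shell function so the values can be passed in. It is an untested-against-real-data reading of the advice in this thread, with a few deliberate deviations: the `}` just before the closing quote in ./hen has no matching opening brace once '{ is removed, so it is left out; the ADDISTR test is dropped because of the redundancy noted in the EDIT above; the "----" A suffix is omitted because the original pipeline stripped it again; and `_` is written as "". The function name tomcat_totals and its argument order are made up for illustration. strftime() is not part of POSIX awk, so this assumes gawk or another awk that provides it.

```shell
# Usage: tomcat_totals VALFOUND SEARCHPATT FILE
# tomcat_totals is a hypothetical wrapper, not a command from the thread.
tomcat_totals() {
    awk -F, -v VF="$1" -v SEARCHPATT="$2" '
    BEGIN           { D[10] = D[13] = 1      # offset of the summary field from NF
                      D[11] = D[14] = 2
                    }
    $0 ~ "," VF "," { L = 1 }                # start output at the first VALFOUND line
    !L              { next }                 # skip every line before it
                    { A = strftime("%a %b %d %T %Y,%s", $3) }
    A ~ SEARCHPATT && (NF in D) {
                      TOT = 0
                      for (n = split($(NF - D[NF]), T, "="); n > 1; n--) {
                          sub(/_.*/, "", T[n])   # keep only the count before the next name
                          TOT += T[n]
                      }
                      print TOT "-" $3 "_" $(NF - D[NF] + 1)
                    }
    ' "$3"
}

# Example call, using the names from the thread:
# tomcat_totals "$VALFOUND" "(Wed|Tue)" datafile.txt
```

Each pattern {action} pair sits at the top level of the awk program, which is the structure the earlier attempt broke by nesting them inside one outer action. The day names that strftime() produces depend on the time zone and locale in effect, so which lines match SEARCHPATT can vary between systems.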