Help fixing awk code

SkySmart · January 18, 2017, 11:44am

Can someone please help me spot and fix the issue with the following code?

awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND"
"BEGIN{           {D[10] = D[13] = 1
                 D[11] = D[14] = 2
                }

$0 ~ "," VF "," {L = 1                                  # start output only if VALFOUND is matched
                }

!L              {next                                   # skip line if NOT VALFOUND
                }

                {A = strftime("%a %b %d %T %Y,%s",$3)
                }

A ~ SEARCHPATT &&
NF in D         {TOT = 0
                 for (n = split ($(NF-D[NF]), T, "="); n>1; n--)    {sub (/_.*/, _, T[n]); TOT += T[n]}

                 if (NF == 11 || A ~ ADDISTR)   print TOT "-" $3 "_" $(NF-D[NF]+1) "----" A
                }
}" datafile.txt

It's supposed to be an optimized, better version of the following code, which I slapped together:

gawk -v SEARCHPATT="${SEARCHPATT}" -v ADDISTR="${INCEXCSTR}" -F, '/,'"${VALFOUND}"',/,0 {A=strftime("%a %b %d %T %Y,%s",$3);if((NF == 13) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $12"-"$3"_0""-" $13"----"A} else if ((NF == 14) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $12"-"$3"_0""-" $13"----"A} else if ((NF == 10) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $9"-"$3"_"$10"----"A} else if ((NF == 11) && (A ~ SEARCHPATT)) {print $9"-"$3"_"$10"----"A} }' datafile.txt | awk -F"----" '{print $1}'
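The one-liner above selects its lines with the range pattern /,VALFOUND,/,0. A range starts at the first line that matches, and because the end condition 0 is never true, it runs to the end of the file. The L = 1 and !L { next } rules in the first script aim for the same effect with an explicit flag. A small illustration on made-up input (a to d):

```shell
#!/bin/sh
# A range pattern "start,end" whose end condition (0) is never true:
# selects b, c and d.
printf '%s\n' a b c d | awk '/b/,0'

# The same selection with an explicit flag, as in the L = 1 and !L { next } rules.
printf '%s\n' a b c d | awk '/b/ { L = 1 } !L { next } { print }'
```

The trailing | awk -F"----" '{print $1}' in the one-liner then drops the "----" separator and the formatted date, keeping only the part before it.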

The content of the data file being read could look like this:

typeA

0,greenscreen_pc10,1484711626,335086,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25638056,0,333183--335086,-1
0,greenscreen_pc10,1484711922,337099,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25796338,0,335086--337099,0
0,greenscreen_pc10,1484712222,338253,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25887414,0,337099--338253,2

Or like this:

typeB

0,plm_tomcat_logcheck,1484756597,12685,/opt/apps/plm/logs/catalina.out,964K,plm_tomcat_logcheck,985770,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,11713--12685,2
0,plm_tomcat_logcheck,1484756898,12865,/opt/apps/plm/logs/catalina.out,980K,plm_tomcat_logcheck,999773,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,12685--12865,8
0,plm_tomcat_logcheck,1484757197,13076,/opt/apps/plm/logs/catalina.out,996K,plm_tomcat_logcheck,1017418,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,12865--13076,0

Or like this:

typeC

0,plm_tomcat_logcheck,1424392034,81033,/opt/apps/plm/logs/catalina.out,6.3M,plm_tomcat_logcheck,6539198,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0,80801--81033
0,plm_tomcat_logcheck,1424392334,81307,/opt/apps/plm/logs/catalina.out,6.3M,plm_tomcat_logcheck,6561051,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0,81033--81307
0,plm_tomcat_logcheck,1424392634,81367,/opt/apps/plm/logs/catalina.out,6.3M,plm_tomcat_logcheck,6565967,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0,81307--81367

In the case of typeB and typeC, notice that field 9 does not contain just a number; it contains strings, each followed by an equals sign and a number. What I want is to account for all three scenarios (typeA, typeB, and typeC), meaning: add up the numbers in field 9, so that the resulting output looks similar to this:

0-1424534260_8--8
0-1424534560_8--8
0-1424534860_8--8
0-1413868231_12043--12043

RudiC · January 18, 2017, 12:33pm

The output you present doesn't match any of the input files or types. Please specify exactly how to tell type[ABC] apart.

SkySmart · January 18, 2017, 1:50pm

Lines of type[ABC] can all be in the same data file.

So, when running the awk code on the data file, here are the ways to tell apart the line being read:

1) How many fields does the line have? If there are 11 fields on a line, do xyz; if there are 10 fields, do abc; if there are 13 fields, do cbd; if there are 14 fields, do this and that. The lines are all comma-delimited.

2) Does the 9th field of each line contain just a number, or does it contain some text? If there is text (alphanumeric characters) in the 9th field, get the number directly after the equals sign for each pattern, and add up all those numbers to get a total. If there is just a number in the 9th field, no adding is needed.

When the above is done, grab the epoch time from the 3rd field, and also the second-to-last field. So, after each line is processed, the result should be:

${totalcountFrom9thfield}-${thirdfield}_${second-to-last-field}

HOWEVER, if the second-to-last field does not contain the string "--", grab the last field instead, because that one will have the "--".
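The field-9 and output rules above can be sketched in portable awk as follows. This is a minimal sketch under stated assumptions, not a confirmed solution from the thread: total_field9 is a hypothetical helper name, it always reads field 9 (the 13- and 14-field layouts are not shown here, so they get no special handling), and it relies on awk's string-to-number conversion taking only the leading number of a piece such as 0_njava_lang_...

```shell
#!/bin/sh
# total_field9 is a hypothetical helper; pass datafile.txt or pipe lines in.
# Portable awk only (no gawk extensions).
total_field9() {
    awk -F, '
    {
        if ($9 ~ /^-?[0-9]+$/) {
            tot = $9                    # plain number: nothing to add up
        } else {
            tot = 0
            n = split($9, parts, "=")   # parts[2..n] each start with a number
            for (i = 2; i <= n; i++)
                tot += parts[i]         # uses the leading number: "3_nfoo" -> 3, "0 java_x" -> 0
        }
        # Second-to-last field, or the last one if it lacks "--".
        range = ($(NF - 1) ~ /--/) ? $(NF - 1) : $NF
        print tot "-" $3 "_" range
    }' "$@"
}
```

Usage: total_field9 datafile.txt. For the first typeA line above this prints 0-1484711626_333183--335086, and for the first typeC line 0-1424392034_80801--81033. Unlike the loop in post #1, a plain number in field 9 is passed through as-is instead of being counted as 0.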

RudiC · January 18, 2017, 2:18pm

Now THIS is a precise specification! Do you really want generic pseudocode as a proposed solution?

If you look at the code in post #1, it already reacts to the field count, and it calculates the numbers from $9, albeit with an algorithm different from what you describe. So, WHAT is the problem? Describe it exactly and in detail: supply a sample input line, its output, and where it doesn't fulfill your needs.

BTW, how about trying to understand how the script works, and applying some corrections yourself?

SkySmart · January 18, 2017, 2:36pm

I added the text from typeA, typeB, and typeC to one data file.

But when I run the code from post #1 on the combined data file, I get the following:

 ./check
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options: (standard)
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
Short options:          GNU long options: (extensions)
        -b                      --characters-as-bytes
        -c                      --traditional
        -C                      --copyright
        -d[file]                --dump-variables[=file]
        -D[file]                --debug[=file]
        -e 'program-text'       --source='program-text'
        -E file                 --exec=file
        -g                      --gen-pot
        -h                      --help
        -i includefile          --include=includefile
        -l library              --load=library
        -L [fatal]              --lint[=fatal]
        -n                      --non-decimal-data
        -M                      --bignum
        -N                      --use-lc-numeric
        -o[file]                --pretty-print[=file]
        -O                      --optimize
        -p[file]                --profile[=file]
        -P                      --posix
        -r                      --re-interval
        -S                      --sandbox
        -t                      --lint-old
        -V                      --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
        gawk '{ sum += $1 }; END { print sum }' file
        gawk -F: '{ print $1 }' /etc/passwd
./check: 19: ./check: NF-D[NF]: not found
./check: 21: ./check: NF-D[NF]+1: not found
./check: 14: ./check: BEGIN{           {D[10] = D[13] = 1
                 D[11] = D[14] = 2
                }

./check ~ , VF , {L = 1                                  # start output only if VALFOUND is matched
                }

!L              {next                                   # skip line if NOT VALFOUND
                }

                {A = strftime(%a: not found

I have tried playing with this in many different ways. I suspect the issue may be related to the quotes, but I've tried escaping them, adding them, and removing them, and the issue remains unresolved.

Sorry if I'm making this complicated.

Corona688 · January 18, 2017, 2:38pm

Show your code, and then maybe we can tell you why it's not working.

Don't, and we'll never know.

vgersh99 · January 18, 2017, 2:55pm

These are problems:

"BEGIN{
....
}"

Use single quotes (for starters):

awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND" '
BEGIN {
....

}' datafile.txt
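Why double quotes fail here: inside "...", the shell expands every $ before awk runs. $3 becomes the shell's third positional parameter, $0 becomes the script name (hence the ./check ~ , VF , line in the error output above), and $(NF-D[NF]) is run as a command substitution, which is where the NF-D[NF]: not found messages come from. Also, the first line of the command in post #1 has no trailing backslash, so awk ran with no program at all (hence the usage text), and the shell then tried to run the double-quoted "BEGIN{ ..." text as a command of its own. A minimal illustration of the quoting point, using a made-up line a,b,c:

```shell
#!/bin/sh
set -- first second third   # give the script some positional parameters

# Double quotes: the shell substitutes its own $3 ("third") first, so awk
# receives { print third }, an uninitialised awk variable: prints an empty line.
printf '%s\n' 'a,b,c' | awk -F, "{ print $3 }"

# Single quotes: awk receives $3 itself and prints the third field, c.
printf '%s\n' 'a,b,c' | awk -F, '{ print $3 }'
```

Values from the shell belong in -v assignments (as with SEARCHPATT, ADDISTR and VF), so the program text itself can stay single-quoted.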

RudiC · January 18, 2017, 3:35pm

You copied the errors from your previous thread, where they were already highlighted and a correction was proposed, into this one. Use vgersh99's proposal.

SkySmart · January 18, 2017, 3:53pm

So I just modified the code using the suggestions; below is what I have, which seems to be a huge improvement:

VALFOUND=1484711626
awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND" '
BEGIN {           
        {
                D[10] = D[13] = 1
                D[11] = D[14] = 2
        }
$0 ~ "," VF "," 
        {
                L = 1                                  # start output only if VALFOUND is matched
        }

!L
        {
                next                                   # skip line if NOT VALFOUND
        }
        {
                A = strftime("%a %b %d %T %Y,%s",$3)
        }
A ~ SEARCHPATT && NF in D 
        {
                TOT = 0
                for (n = split ($(NF-D[NF]), T, "="); n>1; n--)    {sub (/_.*/, _, T[n]); TOT += T[n]}
                 if (NF == 11 || A ~ ADDISTR) print TOT "-" $3 "_" $(NF-D[NF]+1) "----" A
        }
}' datafile.txt

Which results in:

awk: cmd. line:14: error: `next' used in BEGIN action

What is wrong with the next statement above?
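Separately from the brace problem, this version also puts each pattern on its own line with its { on the next line. awk ends a rule at a newline, so a pattern alone becomes a complete rule (default action: print the line), and the following { ... } becomes a second rule that runs for every line. Inside an action, as happened here within BEGIN, the same layout leaves !L as a statement of its own and { next } as an unconditional block. A minimal illustration on made-up input:

```shell
#!/bin/sh
# Pattern and { on different lines: two separate rules.
# Prints "a" (from the bare /a/ rule), then "action: a" and "action: b".
printf '%s\n' a b | awk '
/a/
{ print "action:", $0 }'

# Pattern and { on the same line: one rule. Prints only "action: a".
printf '%s\n' a b | awk '
/a/ { print "action:", $0 }'
```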

vgersh99 · January 18, 2017, 4:06pm

You have unbalanced curly braces...

BEGIN {           
      {

Wait... why do you have EVERYTHING in the BEGIN block?
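For reference, here is a sketch of the same program with BEGIN holding only the D[] setup and every pattern on the same line as its {. It keeps the logic of the script in post #1 unchanged. run_report is a hypothetical wrapper name, and the sketch assumes awk is GNU awk (as the usage text above shows), since strftime() is a gawk extension.

```shell
#!/bin/sh
# Hypothetical wrapper: run_report datafile.txt (or pipe lines in on stdin).
# Assumes awk is GNU awk: strftime() is a gawk extension.
VALFOUND=1484711626

run_report() {
    awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" \
        -v VF="$VALFOUND" '
    BEGIN {
        D[10] = D[13] = 1             # counts sit in $(NF-1) for 10- and 13-field lines
        D[11] = D[14] = 2             # and in $(NF-2) for 11- and 14-field lines
    }
    $0 ~ "," VF "," { L = 1 }         # start output only once VALFOUND is matched
    !L              { next }          # skip lines until then
                    { A = strftime("%a %b %d %T %Y,%s", $3) }
    A ~ SEARCHPATT && NF in D {
        TOT = 0
        for (n = split($(NF - D[NF]), T, "="); n > 1; n--) {
            sub(/_.*/, "", T[n])      # keep the number after each "=", drop "_n..."
            TOT += T[n]
        }
        if (NF == 11 || A ~ ADDISTR)
            print TOT "-" $3 "_" $(NF - D[NF] + 1) "----" A
    }' "$@"
}
```

Usage: run_report datafile.txt. Note that when the counts field holds a plain number (typeA), split() returns 1 and the loop never runs, so TOT stays 0 rather than that number; the specification posted earlier asks for the number itself. strftime() also formats in the local time zone, so whether a line matches (Wed|Tue) can depend on TZ.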