Help fixing awk code

Can someone please help me spot and fix the issue with the following code?

awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND"
"BEGIN{           {D[10] = D[13] = 1
                 D[11] = D[14] = 2
                }

$0 ~ "," VF "," {L = 1                                  # start output only if VALFOUND is matched
                }

!L              {next                                   # skip line if NOT VALFOUND
                }

                {A = strftime("%a %b %d %T %Y,%s",$3)
                }

A ~ SEARCHPATT &&
NF in D         {TOT = 0
                 for (n = split ($(NF-D[NF]), T, "="); n>1; n--)    {sub (/_.*/, _, T[n]); TOT += T[n]}

                 if (NF == 11 || A ~ ADDISTR)   print TOT "-" $3 "_" $(NF-D[NF]+1) "----" A
                }
}" datafile.txt

It's supposed to be an optimized, better version of the following code, which I slapped together:

gawk -v SEARCHPATT="${SEARCHPATT}" -v ADDISTR="${INCEXCSTR}" -F, '/,'"${VALFOUND}"',/,0 {A=strftime("%a %b %d %T %Y,%s",$3);if((NF == 13) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $12"-"$3"_0""-" $13"----"A} else if ((NF == 14) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $12"-"$3"_0""-" $13"----"A} else if ((NF == 10) && (A ~ ADDISTR) && (A ~ SEARCHPATT)) {print $9"-"$3"_"$10"----"A} else if ((NF == 11) && (A ~ SEARCHPATT)) {print $9"-"$3"_"$10"----"A} }' datafile.txt | awk -F"----" '{print $1}'

the content of the datafile being read here could look like this:

typeA

0,greenscreen_pc10,1484711626,335086,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25638056,0,333183--335086,-1
0,greenscreen_pc10,1484711922,337099,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25796338,0,335086--337099,0
0,greenscreen_pc10,1484712222,338253,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25887414,0,337099--338253,2

or like this:

typeB

0,plm_tomcat_logcheck,1484756597,12685,/opt/apps/plm/logs/catalina.out,964K,plm_tomcat_logcheck,985770,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,11713--12685,2
0,plm_tomcat_logcheck,1484756898,12865,/opt/apps/plm/logs/catalina.out,980K,plm_tomcat_logcheck,999773,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,12685--12865,8
0,plm_tomcat_logcheck,1484757197,13076,/opt/apps/plm/logs/catalina.out,996K,plm_tomcat_logcheck,1017418,Master_Item_Service_is_down=0_njava_lang_NoClassDefFoundError=0_njava_lang_OutOfMemoryError=0_nemxCommonAppInitialization__Error_while_initializing=0_nINFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0_nThe_file_or_directory_is_corrupted_and_unreadable=0_n,12865--13076,0

or like this:

typeC

0,plm_tomcat_logcheck,1424392034,81033,/opt/apps/plm/logs/catalina.out,6.3M,plm_tomcat_logcheck,6539198,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0,80801--81033
0,plm_tomcat_logcheck,1424392334,81307,/opt/apps/plm/logs/catalina.out,6.3M,plm_tomcat_logcheck,6561051,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0,81033--81307
0,plm_tomcat_logcheck,1424392634,81367,/opt/apps/plm/logs/catalina.out,6.3M,plm_tomcat_logcheck,6565967,Master_Item_Service_is_down=0 java_lang_NoClassDefFoundError=0 java_lang_OutOfMemoryError=0 emxCommonAppInitialization__Error_while_initializing=0 INFO__Stopping_Coyote_HTTP_1_1_on_http_8080=0 The_file_or_directory_is_corrupted_and_unreadable=0,81307--81367

In the case of typeB and typeC, notice that field 9 does not contain just a number. It contains several strings, each followed by an equal sign and a number. What I want to do is handle typeA, typeB and typeC alike, meaning: add up the numbers in field 9, so that the resulting output looks similar to this:

0-1424534260_8--8
0-1424534560_8--8
0-1424534860_8--8
0-1413868231_12043--12043

The output you present doesn't match any of the input files or types. Please specify exactly how to tell the type[ABC] lines apart.

Lines of type[ABC] can all be in the same datafile.

so when running the awk code on the data file, here are the different ways to differentiate a line that is being read:

1). How many fields does the line have? If there are 11 fields on a line, do xyz; if there are 10 fields, do abc; if there are 13 fields, do cbd; if there are 14 fields, do this and that. The lines are all comma delimited.

2). Does the 9th field of the line contain just a number, or does it contain text? If it contains text (alphanumeric characters), get the number right after the equal sign for each pattern and add up all the numbers to get a total. If there is just a number in the 9th field, no adding is needed.

when the above is done, grab the epoch time from the 3rd field, and also the second to last field. so after each line is processed, the result should be:

${totalcountFrom9thfield}-${thirdfield}_${the-secondto-last-field-HOWEVER-if-the-second to last field does not contain the string "--", then grab the last field instead because that will have the "--"}
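
The rule above could be sketched roughly like this. It is only an illustration, assuming the number to sum is the one that follows each equal sign, as in the sample data. The sample lines are shortened versions of the typeA/B/C lines with made-up counts (3, 5, 1, 2) so the sums are visible, and the names tot, part and range are hypothetical.

```shell
# Shortened, hypothetical sample lines modelled on typeA, typeB and typeC.
sample='0,greenscreen_pc10,1484711626,335086,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25638056,0,333183--335086,-1
0,plm_tomcat_logcheck,1484756898,12865,/opt/apps/plm/logs/catalina.out,980K,plm_tomcat_logcheck,999773,Master_Item_Service_is_down=3_njava_lang_OutOfMemoryError=5_n,12685--12865,8
0,plm_tomcat_logcheck,1424392034,81033,/opt/apps/plm/logs/catalina.out,6.3M,plm_tomcat_logcheck,6539198,Master_Item_Service_is_down=1 java_lang_OutOfMemoryError=2,80801--81033'

result=$(printf '%s\n' "$sample" | awk -F, '
{
    tot = 0
    if ($9 ~ /=/) {
        # Split field 9 on "=": every piece after the first starts with a count.
        n = split($9, part, "=")
        for (i = 2; i <= n; i++)
            tot += part[i]        # numeric conversion uses only the leading digits
    } else {
        tot = $9 + 0              # plain number, nothing to add up
    }
    # Use the second-to-last field if it holds "--", otherwise the last field.
    range = ($(NF-1) ~ /--/) ? $(NF-1) : $NF
    print tot "-" $3 "_" range
}')
printf '%s\n' "$result"
```

Because the "--" check is done on the field itself, this sketch does not need a per-field-count lookup table; whether that is acceptable depends on what the 13- and 14-field lines look like, which are not shown in the thread.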

Now THIS is a precise specification! Do you really want generic pseudo code as a solution proposal?

If you look into the code in post#1, it already reacts to the field count, and it calculates the numbers from $9, albeit with an algorithm different from what you describe. So, WHAT is the problem? Describe it exactly and in detail: supply a sample input line, its output, and where it doesn't fulfill your needs.

BTW, how about trying to understand how the script works, and applying some corrections yourself?

I added the text from typeA B and C to one datafile.

but when i run the code from post one on the combined datafile, i get the following:

 ./check
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options: (standard)
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
Short options:          GNU long options: (extensions)
        -b                      --characters-as-bytes
        -c                      --traditional
        -C                      --copyright
        -d[file]                --dump-variables[=file]
        -D[file]                --debug[=file]
        -e 'program-text'       --source='program-text'
        -E file                 --exec=file
        -g                      --gen-pot
        -h                      --help
        -i includefile          --include=includefile
        -l library              --load=library
        -L [fatal]              --lint[=fatal]
        -n                      --non-decimal-data
        -M                      --bignum
        -N                      --use-lc-numeric
        -o[file]                --pretty-print[=file]
        -O                      --optimize
        -p[file]                --profile[=file]
        -P                      --posix
        -r                      --re-interval
        -S                      --sandbox
        -t                      --lint-old
        -V                      --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
        gawk '{ sum += $1 }; END { print sum }' file
        gawk -F: '{ print $1 }' /etc/passwd
./check: 19: ./check: NF-D[NF]: not found
./check: 21: ./check: NF-D[NF]+1: not found
./check: 14: ./check: BEGIN{           {D[10] = D[13] = 1
                 D[11] = D[14] = 2
                }

./check ~ , VF , {L = 1                                  # start output only if VALFOUND is matched
                }

!L              {next                                   # skip line if NOT VALFOUND
                }

                {A = strftime(%a: not found

I have tried playing with this many different ways. I suspect the issue may be related to the quotes, but I've tried escaping them, adding them, and removing them, and the issue remains unresolved.

Sorry if I'm making this complicated.

Show your code, and then maybe we can tell you why it's not working.

Don't, and we'll never know.

these are problems:

"BEGIN{
....
}"

use single quotes (for starters):

awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND" '
BEGIN {
....

}' datafile.txt
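
Why the quotes matter, as a small sketch: inside double quotes the shell expands $0, $1, $(...) and so on before awk ever sees the program, which would explain the "NF-D[NF]: not found" messages and the "./check ~ , VF ," text in the error output above. The `set -- oops` line and the variable names dq and sq are only there for illustration.

```shell
# Give the shell its own $1 so the effect is visible (illustration only).
set -- oops

# Double quotes: the shell replaces $1 with "oops" before awk starts,
# so awk is handed the program { print "oops" }.
dq=$(echo 'a,b' | awk -F, "{ print \"$1\" }")

# Single quotes: the program text reaches awk untouched,
# so $1 is awk's first field of the input line.
sq=$(echo 'a,b' | awk -F, '{ print $1 }')

printf 'double quotes: %s\nsingle quotes: %s\n' "$dq" "$sq"
```

In the same way, a double-quoted $(NF-D[NF]) is treated by the shell as command substitution, so it tries to run a command named NF-D[NF].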

You copied the errors from your previous thread, where they were already highlighted and a correction was proposed, into this one. Use vgersh99's proposal.

So I just modified the code again using the suggestions, and below is what I have, which seems to be a huge improvement:

VALFOUND=1484711626
awk -F, -v SEARCHPATT="(Wed|Tue)" -v ADDISTR="Mon|Tue|Wed|Thu|Fri|Sat|Sun" -vVF="$VALFOUND" '
BEGIN {           
        {
                D[10] = D[13] = 1
                D[11] = D[14] = 2
        }
$0 ~ "," VF "," 
        {
                L = 1                                  # start output only if VALFOUND is matched
        }

!L
        {
                next                                   # skip line if NOT VALFOUND
        }
        {
                A = strftime("%a %b %d %T %Y,%s",$3)
        }
A ~ SEARCHPATT && NF in D 
        {
                TOT = 0
                for (n = split ($(NF-D[NF]), T, "="); n>1; n--)    {sub (/_.*/, _, T[n]); TOT += T[n]}
                 if (NF == 11 || A ~ ADDISTR) print TOT "-" $3 "_" $(NF-D[NF]+1) "----" A
        }
}' datafile.txt

which results in:

awk: cmd. line:14: error: `next' used in BEGIN action

what is wrong with the above next?

you have misbalanced curlies...

BEGIN {           
      {

wait.... why do you have EVERYTHING in the BEGIN block?
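
A reduced sketch of how the layout might look once the braces are balanced: BEGIN holds only the setup and is closed right away, and every pattern sits on the same line as its opening brace. If a pattern is on a line by itself, awk treats it as a complete rule that prints the line, and the block below it then runs for every record. The strftime() day filter and the TOT sum are left out here so the sketch runs without gawk (strftime is a gawk extension); the first sample line is made up to show the skipping.

```shell
# First line is hypothetical; the other two are typeA lines from the thread.
sample='0,skipme,1484711000,1,/x,1M,skipme,1,0,1--2,0
0,greenscreen_pc10,1484711626,335086,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25638056,0,333183--335086,-1
0,greenscreen_pc10,1484711922,337099,/PROD/NOA/cicsmrch/sys/unikixmain.log,25M,greenscreen_pc10,25796338,0,335086--337099,0'

VALFOUND=1484711626
out=$(printf '%s\n' "$sample" | awk -F, -v VF="$VALFOUND" '
BEGIN {                        # setup only, closed before the first pattern
    D[10] = D[13] = 1
    D[11] = D[14] = 2
}
$0 ~ "," VF "," { L = 1 }      # start output once VALFOUND is matched
!L              { next }       # skip lines until then
NF in D         { print $3 "_" $(NF - D[NF] + 1) }
')
printf '%s\n' "$out"
```

With this layout `next` is no longer inside BEGIN, so the "`next' used in BEGIN action" error should go away.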
