awk - Print column number that return value comes from

ncwxpanther · February 23, 2016, 7:58am

I have the following awk script that I am using to find the max value in the file and print results.

awk 'BEGIN  {MAX=-1E100} {for (x=2; x<=NF; x++) if ($x>MAX) {MAX = $x;  C1 = $1}} END {print substr(C1,1,11), substr(C1,13,4), substr(C1,18,2), MAX}' ABC*

Input (ABC*)

ABC00053862Y2001M08    3    4    5    6    7    0    1    2    3    4    0   -9   -9   -9   -9   -9    1    2    3    4   -9   -9   -9   -9   -9   -9   -9   -9   -9   -9   -9

Output

ABC00053862 2001 08 7

I need to also return the column (-1) that the "7" (MAX) came from. In this case it would be Column 5 (Column 6-1). Columns 2-32 are days so that is why I need to subtract 1 from the actual column number.

Is there a way to incorporate that function in with the existing script?

Expected output

ABC00053862 2001 08 5 7

RavinderSingh13 · February 23, 2016, 8:45am

Hello ncwxpanther,

Could you please try following and let me know if this helps you.

awk '{split($1, array,"Y|M");A=array[1] OFS array[2] OFS array[3];for(i=2;i<=NF;i++){MAX=MAX>$i?MAX:$i};print A OFS MAX;MAX=""}'  Input_file

Output will be as follows.

ABC00053862 2001 08 7

NOTE: Assuming your all data(specially first field) is of same pattern like sample Input_file which you have shown to us into your post above.

Thanks,
R. Singh

ncwxpanther · February 23, 2016, 9:08am

The output is not what I expected.

Your Output

USW00096407 1998 04PRCPEQ00 7
USW00096407 1998 05PRCPEQ00 6
USW00096407 1998 06PRCPEQ00 9
USW00096407 1998 07PRCPEQ00 12
USW00096407 1998 08PRCPEQ00 3
USW00096407 1998 09PRCPEQ00 11

Expected Output

USW00096407 2015 09 8 11

Keep in mind that there are thousands of input files (ABC*). Inside each input file there are thousands of lines.

Example of input file names

USW00096407
USW00096406
USW00096404

Example of contents of one of the input files.

USW00096407Y2015M08PRCPEQ00   -9   -9   -9   -9   -9   -9   -9   -9   -9   -9   -9   -9   -9    1    2   0    0    0    0    0    0    1    2    0    0    0    1    0    1    2    3
USW00096407Y2015M09PRCPEQ00    4    5    6    7    8    9   10   11    0    0    0    0    0    1    2    0    1    2    0    1    2    3    4    0    0    0    0    1    2    0   -9
USW00096407Y2015M10PRCPEQ00    0    0    0    0    0    1    2    3    4    5    0    1    0    1    0    1    2    0    0    1    2    3    4    0    0   -9   -9   -9    1    2    0

RavinderSingh13 · February 23, 2016, 9:18am

Hello ncwxpanther,

Request you to please show us complete Input_file(sample file) as in your very first post you haven't showed us the complete data which you have done in 3rd post, following may helpful for you.

awk '{if($0 !~ /^$/){split($1, array,"Y|M|PRCPEQ");A=array[1] OFS array[2] OFS array[3] OFS array[3]-1;for(i=2;i<=NF;i++){MAX=MAX>$i?MAX:$i};print A OFS MAX;MAX=""}}'  Input_file

Here as you have mentioned there are so may files you could use following then.

for file in ABC*
do
     awk '{if($0 !~ /^$/){split($1, array,"Y|M|PRCPEQ");A=array[1] OFS array[2] OFS array[3] OFS array[3]-1;for(i=2;i<=NF;i++){MAX=MAX>$i?MAX:$i};print A OFS MAX;MAX=""}}'  $file
done

If you wanted to add file's name to output then you could use following too.

for file in ABC*
do
     awk 'BEGIN{print "File which is getting executed is: " FILENAME} {if($0 !~ /^$/){split($1, array,"Y|M|PRCPEQ");A=array[1] OFS array[2] OFS array[3] OFS array[3]-1;for(i=2;i<=NF;i++){MAX=MAX>$i?MAX:$i};print A OFS MAX;MAX=""}}'  $file
done

If you have any queries or above solutions doesn't meet your requirement then request you to please show us more sample inputs with more specifically sample outputs with complete terms and conditions etc in detailed manner.

Thanks,
R. Singh

ncwxpanther · February 23, 2016, 9:45am

My original code prints the max in each unique file.

USW00003013 2009 03 43
USW00003016 2003 07 21
USW00003017 2006 01 27
USW00003024 2000 09 42

Your code appears to print the max of each line within each file. It is also not printing the correct column numbers.

USW00003048 2015 12 11 25
USW00003048 2016 01 0 13
USW00003048 2016 02 1 19
File which is getting executed is:
USW00003054 2004 02 1 0
USW00003054 2004 03 2 10
USW00003054 2004 04 3 5

RudiC · February 23, 2016, 10:13am

How about

awk '
BEGIN   {MAX=-1E100
        }
        {for (x=2; x<=NF; x++) if ($x>MAX)      {MAX = $x
                                                 C1  = $1
                                                 COL = x
                                                }
        }
END     {print substr(C1,1,11), substr(C1,13,4), substr(C1,18,2), MAX, COL-1
        }
' file

ncwxpanther · February 23, 2016, 10:24am

Thanks Rudi. I was able to get that to work as expected.