Get min and max value in column

Gents,

I have a big file file like this.

5100010002
5100010004
5100010006
5100010008
5100010010
5100010012
5102010002
5102010004
5102010006
5102010008
5102010010
5102010012

The file is sorted and I would like to find the min and max value, taking in the consideration key1 (substr$0,1,5) and key2 substr($0,6,5)

I would like to get something like this

51000 10002 10012
51020 10002 10012

Thanks for your help

Hello jiam912,

Could you please try following and let me know if this helps.

wk '{A[substr($0,1,5)+0]=A[substr($0,1,5)+0]>=substr($0,6)+0?A[substr($0,1,5)+0]:substr($0,6)+0;B[substr($0,1,5)+0]=B[substr($0,1,5)+0]<=substr($0,6)+0 && B[substr($0,1,5)+0]?B[substr($0,1,5)+0]:substr($0,6)} END{print "MAX values:";for(i in A){print i OFS A};print "MIN values:";for(j in B){print j OFS B[j]}}' Input_file

Output is as follows.

MAX values:
51000 10012
51020 10012
MIN values:
51000 10002
51020 10002

EDIT: You can try following also to get output in requested form.

awk '{A[substr($0,1,5)+0]=A[substr($0,1,5)+0]>=substr($0,6)+0?A[substr($0,1,5)+0]:substr($0,6)+0;B[substr($0,1,5)+0]=B[substr($0,1,5)+0]<=substr($0,6)+0 && B[substr($0,1,5)+0]?B[substr($0,1,5)+0]:substr($0,6)} END{print "VALUE MAX MIN";for(i in A){print i OFS A OFS B}}' OFS="\t"  Input_file

EDIT: Adding a nonone liner form of solution.

awk '{
          A[substr($0,1,5)+0]=A[substr($0,1,5)+0]>=substr($0,6)+0?A[substr($0,1,5)+0]:substr($0,6)+0;
          B[substr($0,1,5)+0]=B[substr($0,1,5)+0]<=substr($0,6)+0 && B[substr($0,1,5)+0]?B[substr($0,1,5)+0]:substr($0,6)
       }
END {
          print "VALUE MAX MIN";
          for(i in A)
       {
          print i OFS A OFS B
       }
       }
    ' OFS="\t"  Input_file

Output will be as follows.

VALUE MAX MIN
51000 10012 10002
51020 10012 10002

Thanks,
R. Singh

1 Like

Any attempts from your side?

---------- Post updated at 14:18 ---------- Previous update was at 14:15 ----------

However, try

awk     '               {D1=substr($0, 1, 5)
                         D2=substr($0, 6, 5)+0
                        }
         !(D1 in MIN)   {MIN[D1]=D2
                         MAX[D1]=D2
                         next
                        }
          D2 < MIN[D1]  {MIN[D1]=D2}
          D2 > MAX[D1]  {MAX[D1]=D2}
          END           {for (m in MIN) print m, MIN[m], MAX[m]}
        ' file
51000 10002 10012
51020 10002 10012
1 Like

Dear All.

Both codes works fine.. Thanks for your help