finding max size

Hi

I have a list of 2000 records in which each entry appears multiple times, and I want to get the max size for each entry:

ABC   1
ABC   2
ABC   3
ABC   4
DEF   1
DEF   2
DEF   2
DEF   2
DEF   2
DEF   3
DEF   4
XYZ   1
XYZ   2
XYZ   3
XYZ   3
XYZ   3
XYZ   4
XYZ   4
XYZ   4
XYZ   5

and so on.

I have presented three different cases here.

In the first case, ABC, every value occurs only once, so the max size is 1.

In the second case, DEF, "2" occurs four times, so the max size is 4.

In the third case, XYZ, both "3" and "4" occur three times, so the max size is 3.

Expected output:

ABC   1
DEF   4
XYZ   3

Thanks,

awk '{ if(A[$1] < $2) A[$1]=$2; }
END { for(k in A) { print k, A[k]; }' < file
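As posted, the END block is missing its closing brace. With that brace restored, the one-liner runs, but it keeps the largest value seen for each key rather than how often a value repeats, which is the mismatch described in the follow-up below. A minimal sketch wrapping the repaired one-liner in a shell function (the name max_value_per_key is hypothetical) makes that easy to check:

```shell
# The first reply's one-liner with the missing closing "}" restored,
# wrapped in a function (max_value_per_key is an illustrative name).
# It keeps the largest VALUE seen for each key, not a repeat count.
max_value_per_key() {
    awk '{ if (A[$1] < $2) A[$1] = $2 }
         END { for (k in A) print k, A[k] }' "$@"
}

printf 'ABC 1\nABC 2\nABC 3\nABC 4\nABC 4\nABC 5\n' | max_value_per_key
# prints "ABC 5", not the repeat count 2
```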

Hi,

Thanks for the reply.

The code does not do what I need. It outputs the last (largest) number in the entry. For example, with:

ABC 1
ABC 2
ABC 3
ABC 4
ABC 4
ABC 5

Instead of outputting 2, it outputs 5.

What I need is the highest number of times any single number repeats for a particular entry. In the example above, "4" repeats twice, more than any other number, so the output should be "2".

Thanks,

Whatever you were running, it wasn't what I posted: it had a syntax error and didn't run at all.

[edit] Ah, I see... Hmm... Working on it.

Hi,

I had already fixed the error in the code before running it. Only after that did I get the wrong output.

Thanks,

Diya

That's what I get for answering too fast... Here's a solution that does what you want:

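$ awk '{ A[$1 "#" $2]++ }                # count each (key, value) pair
END {
        for (K in A) {
                split(K, L, "#")         # L[1] = key, L[2] = value
                STR = L[1]; VAL = L[2]

                if (C[STR] <= A[K]) {    # new most frequent value for this key
                        C[STR] = A[K]
                        T[STR] = VAL
                }
        }

        for (K in T) print K, T[K]
}' < data
ABC 4
XYZ 3
DEF 2
$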

There's an inconsistency in your example though. If we get a pattern like

A 1
A 1
A 2
A 2

which should be chosen, 1 or 2? Your example has ABC choosing the first max and DEF choosing the last max...

To choose the first instead of the last, change

if(C[STR] <= A[K])

to

if(C[STR] < A[K])
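One caveat about "first" and "last" here: awk's for (K in A) loop visits keys in an unspecified order, so which tied value wins depends on the awk implementation, not on the order of the input. If a reproducible choice matters, the following sketch (mode_first_seen is a hypothetical name) walks the (key, value) pairs in the order they first appear, keeps the earliest value among ties, and prints keys in input order:

```shell
# Hypothetical helper (mode_first_seen is an illustrative name): most frequent
# value per key, breaking ties in favour of the value seen first in the input,
# and printing keys in the order they first appear.
mode_first_seen() {
    awk '
    NF < 2 { next }                          # skip blank or malformed lines
    {
        pair = $1 SUBSEP $2
        if (!(pair in cnt)) {                # first occurrence of this pair
            order[++np] = pair
            if (!($1 in keyseen)) { keyseen[$1] = 1; keys[++nk] = $1 }
        }
        cnt[pair]++
    }
    END {
        for (i = 1; i <= np; i++) {          # visit pairs in input order
            split(order[i], kv, SUBSEP)
            if (!(kv[1] in best) || cnt[order[i]] > bestn[kv[1]]) {
                best[kv[1]] = kv[2]          # strict ">" keeps the earlier value on ties
                bestn[kv[1]] = cnt[order[i]]
            }
        }
        for (i = 1; i <= nk; i++) print keys[i], best[keys[i]]
    }' "$@"
}

printf 'A 1\nA 1\nA 2\nA 2\n' | mode_first_seen   # prints "A 1"
```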

Sorry if I was unclear about my question.

The code outputs the number that repeats the most times. What I want is how many times it repeats.

For instance:

ABC   1
ABC   2
ABC   3
ABC   4
ABC   5
ABC   5
ABC   5
ABC   6
ABC   6
ABC   7
ABC   7
ABC   7
ABC   7
ABC   7
ABC   8
ABC   8
ABC   9
ABC   10

In this example, 7 repeats the most: it occurs five times, so the output should be 5. The code you sent earlier outputs "7" instead of "5".

As for the other example you mentioned:

A 1
A 1
A 2
A 2

In this scenario, the most times any number (either 1 or 2) repeats is 2, so the output is 2.

Thanks,

Okay.

awk '{ A[$1 "#" $2]++ }                  # count each (key, value) pair
END {
        for (K in A) {
                split(K, L, "#")         # L[1] = key, L[2] = value
                STR = L[1]

                if (C[STR] <= A[K])      # keep the largest count seen for this key
                        C[STR] = A[K]
        }

        for (K in C) print K, C[K]
}' < data

Thank you. It worked :)
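Since the sample data is already sorted by key and value, the same count can also be taken in a single pass without storing every pair: the largest repeat count for a key is just its longest run of equal consecutive values. A minimal sketch under that assumption (max_run_per_key is a hypothetical name); unsorted input would need to be sorted first:

```shell
# Hypothetical one-pass variant (max_run_per_key is an illustrative name).
# Assumes each key's lines are contiguous and equal values are adjacent,
# as in the sorted samples in this thread; otherwise sort the input first,
# for example: sort -k1,1 -k2,2n data | max_run_per_key
max_run_per_key() {
    awk '
    function flush() { if (key != "") print key, best }
    NF < 2 { next }                  # skip blank or malformed lines
    $1 != key {                      # a new key starts: report the previous one
        flush(); key = $1; prev = ""; run = 0; best = 0
    }
    {
        run = ($2 == prev) ? run + 1 : 1   # length of the current run of equal values
        prev = $2
        if (run > best) best = run
    }
    END { flush() }' "$@"
}

printf 'ABC 1\nABC 2\nDEF 2\nDEF 2\nDEF 3\n' | max_run_per_key
# prints "ABC 1" and then "DEF 2"
```

Unlike the array-based version, this prints keys in input order and uses memory independent of the number of distinct pairs, at the cost of requiring sorted input.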

A mix of commands:

$ awk '{print $1}' inputfile | sort -u | while read -r word; do sort inputfile | uniq -c | sort -r -n -k1 -k2 | grep -w "$word" | head -1; done
   1 ABC   4
   4 DEF   2
   3 XYZ   4
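The loop above re-sorts the whole file once per key and prints count, key and value. A sketch of a loop-free variant that prints just the key and its largest repeat count, in the format of the expected output, could look like this (max_repeat_per_key is a hypothetical name; it assumes two whitespace-separated columns):

```shell
# Hypothetical loop-free variant of the pipeline above (max_repeat_per_key is
# an illustrative name). Assumes two whitespace-separated columns.
max_repeat_per_key() {
    awk 'NF >= 2 { print $1, $2 }' "$@" |   # normalise spacing so identical pairs compare equal
        sort | uniq -c |                     # one line per distinct pair: "count key value"
        sort -k2,2b -k1,1nr |                # group by key, highest count first
        awk '!seen[$2]++ { print $2, $1 }'   # first line per key: "key count"
}

printf 'XYZ 3\nXYZ 3\nXYZ 4\nDEF 2\nDEF 2\nDEF 2\nABC 1\n' | max_repeat_per_key
# prints "ABC 1", "DEF 3" and "XYZ 2" on separate lines
```

Normalising the columns first matters because uniq compares whole lines, so "ABC 1" and "ABC   1" would otherwise be counted separately.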