finding max size

Diya123 · September 14, 2011, 3:11pm

Hi

I have a list of 2000 records with multiple entries and I want to get the max size for each entry

so on..

I have presented here 3 different cases.

In the first case, ABC, every entry occurs only once, so the max size is 1.

In the second case, DEF has "2" occurring four times, so the max size is 4.

In the third case, XYZ has both "3" and "4" occurring three times, so the max size is 3.

output:

ABC   1
DEF   4
XYZ   3

Thanks,

Corona688 · September 14, 2011, 3:31pm

awk '{ if(A[$1] < $2) A[$1]=$2; }
END { for(k in A) { print k, A[k]; }' < file

Diya123 · September 14, 2011, 5:27pm

Hi,

Thanks for the reply..

The code does not work as per my requirement. It's outputting the last number in the entry ( example

Instead of outputting 2, it's outputting 5.

What I need is the highest number of times any number repeats for a particular entry. In the above example, "4" repeats the most, two times, so the output should be "2".

Thanks,

Corona688 · September 14, 2011, 5:30pm

Whatever you were running, it wasn't what I posted: It had a syntax error and didn't run at all :wall:

[edit] Ah, I see... Hmm... Working on it.

Diya123 · September 14, 2011, 5:32pm

Hi,

I have resolved the error in the code and then used it. Only after that I got the error.

Thanks,

Diya

Corona688 · September 14, 2011, 5:48pm

That's what I get for answering too fast... Here's a solution that does what you want:

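$ awk '{       A[ $1 "#" $2 ]++;       }   # count each (key, value) pair
END {   for(K in A)
        {
                split(K, L, "#");       # recover key and value from "key#value"
                STR=L[1]        ;       VAL=L[2]

                # remember the value with the highest count for this key
                if(C[STR] <= A[K])
                {
                        C[STR]=A[K];
                        T[STR]=VAL
                }
        }

        for(K in T)     print K, T[K];
}' < data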
ABC 4
XYZ 3
DEF 2
$

There's an inconsistency in your example though. If we get a pattern like

A 1
A 1
A 2
A 2

which should be chosen, 1 or 2? Your example has ABC choosing the first max and DEF choosing the last max...

To choose the first instead of the last, change

if(C[STR] <= A[K])

to

if(C[STR] < A[K])
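
One caveat about "first" and "last" here: awk does not define the order in which for(K in A) visits array elements, so which tied value wins can differ between awk implementations. Below is a minimal sketch of a deterministic variant, assuming whitespace-separated key and value columns as in the examples above. The function name most_frequent is made up for illustration. It tracks the leader while reading the input, so with ">" the value that reaches the top count first wins a tie, and with ">=" the value that reaches it last wins.

```shell
# Hypothetical helper, not from the thread: print each key with its most
# frequent value, resolving ties by input order instead of array order.
most_frequent() {
    awk '
    {
        n = ++count[$1, $2]          # occurrences of this (key, value) so far
        if (n > best[$1]) {          # strict ">": earliest value to reach the top count wins
            best[$1] = n
            value[$1] = $2
        }
    }
    END { for (k in best) print k, value[k] }
    ' "$@"
}

printf 'A 1\nA 1\nA 2\nA 2\n' | most_frequent    # prints: A 1
```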

Diya123 · September 14, 2011, 6:03pm

Sorry if I was unclear about my question.

The code is outputting the number that repeats the most times. What I want is to output the maximum number of times it repeats.

for instance:

ABC   1
ABC   2
ABC   3
ABC   4
ABC   5
ABC   5
ABC   5
ABC   6
ABC   6
ABC   7
ABC   7
ABC   7
ABC   7
ABC   7
ABC   8
ABC   8
ABC   9
ABC   10

In this example, 7 repeats the maximum number of times. It repeats five times, so the output should be 5. The code you sent earlier outputs "7" instead of "5".

The other example which you mentioned

A 1
A 1
A 2
A 2

In this scenario, 2 is the maximum number of times a number (either 1 or 2) repeats, so the output is 2.

Thanks,

Corona688 · September 14, 2011, 6:07pm

Okay.

awk '{       A[ $1 "#" $2 ]++;       }   # count each (key, value) pair
END {   for(K in A)
        {
                split(K, L, "#");
                STR=L[1]        # key part of "key#value"

                # keep the largest count seen for this key
                if(C[STR] <= A[K])      C[STR]=A[K];
        }

        for(K in C)     print K, C[K];
}' < data
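
As a quick check, the script above can be run against the two samples discussed in this thread. This is only a sketch: the wrapper name max_repeat is invented for illustration, the awk body follows the same logic as the posted script, and it assumes neither column contains a "#", since that character is used to join key and value.

```shell
# Hypothetical wrapper around the posted script (same counting logic).
max_repeat() {
    awk '{ A[ $1 "#" $2 ]++ }            # count each (key, value) pair
    END {
        for (K in A) {
            split(K, L, "#")             # L[1] is the key
            if (C[L[1]] <= A[K]) C[L[1]] = A[K]
        }
        for (K in C) print K, C[K]
    }' "$@"
}

# The ABC sample from the thread (7 appears five times) plus the A sample.
{
    for v in 1 2 3 4 5 5 5 6 6 7 7 7 7 7 8 8 9 10; do echo "ABC   $v"; done
    printf 'A 1\nA 1\nA 2\nA 2\n'
} | max_repeat | LC_ALL=C sort
# prints:
# A 2
# ABC 5
```

The trailing sort is there only because awk prints the keys in no particular order.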

Diya123 · September 14, 2011, 9:54pm

Thank you.. It worked

itkamaraj · September 14, 2011, 10:25pm

Mixing commands:

$ awk '{print $1}' inputfile | sort -u | while read -r word; do sort inputfile | uniq -c | sort -r -n -k1 -k2 | grep "$word" | head -1; done
   1 ABC   4
   4 DEF   2
   3 XYZ   4
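
The same count can also come from a single pass through sort and uniq, rather than re-sorting the file once per key. Here is a rough sketch, assuming two whitespace-separated columns. The function name max_repeat_pipeline is hypothetical. The first awk normalizes spacing so that uniq -c treats identical pairs as identical lines, and the second keeps the largest count per key.

```shell
# Hypothetical helper: highest repeat count per key using sort | uniq -c.
max_repeat_pipeline() {
    awk '{ print $1, $2 }' "$@" |        # normalize spacing between columns
        sort |
        uniq -c |                        # lines now look like: count key value
        awk '$1 + 0 > max[$2] { max[$2] = $1 + 0 }
             END { for (k in max) print k, max[k] }' |
        LC_ALL=C sort
}

printf 'A 1\nA 1\nA 2\nA 2\nB 7\n' | max_repeat_pipeline
# prints:
# A 2
# B 1
```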