Getting the most repeated column

Hi all,

I want to get the most repeated column value in my file.
File:

name,ID 
adam,12345  ----1
adam,12345  ----2
adam,934
adam,12345  ----3
john,14
john,13
john,25 ----1 
john,25 ----2
tom,1  -----1
tom,2  -----1

So my output should be:

adam,12345,4    ----[4] means adam appears 4 times
john,25,4
tom,1,2 ----as it appears first, or if possible tom,1,2,2 --- listing both IDs (1) and (2) with the 2 appearances

Thanks a lot in advance.

What have you tried?

I have tried:

cat file | uniq -c | /usr/xpg4/bin/awk -F"," '!a[$1,$2]++'

to get the first or most repeated entry, but in the case of tom I want the 2 cases from 2 different records combined into one record.

It's not that simple, though:

name,ID,ID1,ID2
adam,12345,1,2  ----1
adam,12345,1,1  ----2
adam,934,1,2
adam,12345,2,2  ----3
john,14
john,13
john,25 ----1 
john,25 ----2
tom,1  -----1
tom,2  -----1

to get, for example:

adam,12345,1,2 --- the most repeated fields combined into one record

I am not sure I understood what you want, since the "adam,12345" lines in your 2nd post differ from the example in post 1: there are 2 new fields that weren't there in post 1.
Seems there is some inconsistency between examples of input and output.

Anyway, here is a blind shot, taking the 1st example as input without the ----[n] annotations:

$ awk 'NR > 1 {_[$1]++} END {for (a in _) print a "," _[a]}' infile | sort -nt, -k3 | tail -1
adam,12345,3

The cat in your code is not needed.
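
In case it is closer to what you asked for in post 1 (the per-name total plus the most repeated ID, and both IDs for tom when they tie), here is a rough, untested sketch along the same lines; it assumes the 1st example without the ----[n] markers, comma-separated, in a file called infile:

awk -F, 'NR > 1 {
    total[$1]++                      # how often the name occurs in total
    cnt[$1,$2]++                     # how often this name/ID pair occurs
    if (cnt[$1,$2] > best[$1]) {     # new most frequent ID for this name
        best[$1] = cnt[$1,$2]
        id[$1] = $2
    } else if (cnt[$1,$2] == best[$1]) {
        id[$1] = id[$1] "," $2       # tie: keep both IDs, e.g. tom gets 1,2
    }
} END {
    for (n in total) print n "," id[n] "," total[n]
}' infile

With your 1st example that should print adam,12345,4, john,25,4 and tom,1,2,2, though the output order of the names is not guaranteed.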


I have tried it and it works, but I think you can make it

NR>=1 instead of NR>1

But thanks. Can you also look at the second example?

Nope, that's not correct.
NR>=1 means equal to or greater than 1. Since any file you want to parse has a 1st line, that condition matches every line, so it makes no sense; you could leave it out entirely if you want to count the header in.
If you want to skip the header line, you have to use NR>1.
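
For illustration, with the same infile (name,ID header on line 1):

awk 'NR > 1' infile     # prints every line except the header; NR is the current line number
awk 'NR >= 1' infile    # prints all lines, header included; same as giving no condition at all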

So if the 2nd example is just a new or altered request, you should be able to alter the given code to achieve the same. If you do not understand the code, that is no problem, but you have to let us know.
This is not a script drive-in :wink:
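
That said, if my guess at the 2nd example is right and you want, per name, the most frequent value of each ID column glued back into one record (so adam gives adam,12345,1,2), here is one untested way the counting idea above could be extended; the name,ID,ID1,ID2 header and the file name infile are assumptions on my part:

awk -F, 'NR == 1 { nf = NF; next }      # remember the number of columns from the header
{
    names[$1]++                         # remember every name we see
    for (i = 2; i <= NF; i++)
        cnt[$1, i, $i]++                # count each name/column/value combination
}
END {
    for (n in names) {
        line = n
        for (i = 2; i <= nf; i++) {     # for every column, pick its most frequent value
            best = 0; val = ""
            for (k in cnt) {
                split(k, p, SUBSEP)
                if (p[1] == n && p[2] == i && cnt[k] > best) {
                    best = cnt[k]; val = p[3]
                }
            }
            line = line "," val
        }
        print line
    }
}' infile

Rows with fewer fields (the john and tom lines in your 2nd example) simply leave the missing columns empty, and ties are resolved arbitrarily, so adjust to taste.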