I have a file with 3 columns in it that are comma separated and it has about 5000 lines. What I want to do is find the most common value in column 3 using awk or a shell script or whatever works! I'm totally stuck on how to do this.
Hi,
This one should work for you too. Performance does matter here, since your file has thousands of lines, so different approaches can behave quite differently.
To be honest, I only know how to get the result; I don't know how to make it truly high-performance, so you may want to ask an expert about that.
Here comes my code:
awk 'BEGIN {
    FS = ","
    n = 0
}
{
    sum[$3]++
    if (sum[$3] > n) {
        n = sum[$3]
        m = $3
    }
}
END {
    print m
}' filename
I got both of the above to work but my CPU usage hit 100% lol! Any ideas on either making this more efficient or limiting the amount of CPU that this awk script can hog?
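The awk script above is already a single pass over the file, so for 5000 lines it should finish almost instantly; the 100% CPU is just the process running flat out for a moment. If you want it to yield to other processes, you can run it under nice. A minimal sketch, using a hypothetical sample file in /tmp:

```shell
# Build a small sample file (hypothetical data) to run against.
printf 'a,b,bob\na,b,bob\na,b,dave\n' > /tmp/data_demo

# Run the same single-pass awk counter at the lowest scheduling
# priority so it gives way to other work on the box.
nice -n 19 awk -F, '
    { c[$3]++; if (c[$3] > max) { max = c[$3]; most = $3 } }
    END { print most }
' /tmp/data_demo
# prints: bob
```

nice does not cap total CPU use; it only lowers the process's priority so interactive work is not starved.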
#!/usr/bin/env sh
# @(#) s1 Demonstrate determination of maximum string occurrence.
set -o nounset
echo
debug=":"
debug="echo"
## Use local command version for the commands in this demonstration.
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version bash cut sort uniq sed
echo
FILE=${1-data1}
echo
echo " Input file:"
cat "$FILE"
echo
echo " Results from pipeline ( extract, sort, count, isolate ):"
cut -d, -f3 "$FILE" |
sort |
uniq -c |
sort -nr |
sed -n -e '1s/^ *[0-9][0-9]* *//p;q'
exit 0
Producing:
% ./s1
(Versions displayed with local utility "version")
GNU bash 2.05b.0
cut (coreutils) 5.2.1
sort (coreutils) 5.2.1
uniq (coreutils) 5.2.1
GNU sed version 4.1.2
Input file:
value1,value2,bob
value1,value2,bob
value1,value2,bob
value1,value2,dave
value1,value2,james
Results from pipeline ( extract, sort, count, isolate ):
bob
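For reference, the same extract / sort / count / isolate pipeline can be exercised without a data file by feeding it a here-document, and awk can stand in for the sed step to pull out just the value. A sketch with made-up sample lines:

```shell
# Extract column 3, count occurrences, and print the most common
# value. The here-document below is hypothetical sample data.
cut -d, -f3 <<'EOF' | sort | uniq -c | sort -rn | awk 'NR == 1 { print $2; exit }'
value1,value2,bob
value1,value2,bob
value1,value2,dave
EOF
# prints: bob
```

The `sort | uniq -c` pair is O(n log n) versus awk's O(n) hash, but both are effectively instantaneous at 5000 lines.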