I'm not sure if it's called "group by", but what I want to do is this:
I have a file like this:
192.168.1.10
192.168.1.10
192.168.1.10
192.168.1.11
192.168.1.15
192.168.1.15
192.168.1.20
192.168.1.22
Then I hope to get a result like this:
192.168.1.10 : 3
192.168.1.11 : 1
192.168.1.15 : 2
192.168.1.20 : 1
192.168.1.22 : 1
The number in the second column is how many times the IP appears in the file. My current approach is:
- use sort and uniq to get how many unique records are in the file
- use grep with wc -l to count how many times each record appears
Is there a better way to do this? Any advice?
Thanks.
Something like this?
$ awk '{count[$1]++}END{for(j in count) print j":"count[j]}' file.txt
Jaduk's awk solution does not sort the output as the OP requested. This can be remedied by piping the output to sort.
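For example (a sketch combining the awk one-liner above with a plain lexicographic sort, which is sufficient here because all the last octets have two digits):

```shell
# count each IP, then sort the "ip : count" lines by IP
awk '{count[$1]++} END {for (j in count) print j " : " count[j]}' file.txt | sort
```

This prints the IPs in ascending order with their counts, matching the format in the original post.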
You can do it entirely within the shell. For example, in ksh93 the following script
#!/bin/ksh93
# create an associative array of counts
typeset -A count
while read ip
do
    (( count[$ip]++ ))
done < infile
# sort and print the associative array: repeatedly pick the key
# with the largest count, print it, and remove it
while (( 1 ))
do
    (( !${#count[@]} )) && break
    k=(${!count[@]})    # start with an arbitrary remaining key
    for j in ${!count[@]}
    do
        (( ${count[$j]} > ${count[$k]} )) && k=$j
    done
    echo "$k : ${count[$k]}"
    unset count[$k]
done
produces the following list, sorted in descending order of count:
192.168.1.10 : 3
192.168.1.15 : 2
192.168.1.20 : 1
192.168.1.11 : 1
192.168.1.22 : 1
Associative arrays are also supported in bash v4, but with slightly different syntax.
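A minimal bash (v4+) sketch of the counting part, assuming the same infile as above; here the output is left unsorted:

```shell
#!/bin/bash
# bash 4+ uses declare -A where ksh93 uses typeset -A
declare -A count
while read -r ip
do
    (( count[$ip]++ ))
done < infile
# print the counts (in arbitrary order; pipe to sort if needed)
for j in "${!count[@]}"
do
    echo "$j : ${count[$j]}"
done
```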
Here is another simple solution:
for i in $(cat file.txt)
do
    # -x matches the whole line, -F treats $i as a fixed string
    # (a bare "grep -c $i" would also count substring matches,
    # and the dots in the IP would act as regex wildcards)
    echo "$i: $(grep -cxF "$i" file.txt)"
done | sort -u
sort file|uniq -c
The output format is different, but to get the desired format you could do:
sort file|uniq -c|awk '{print $2 " : " $1}'
To sort within awk itself:
awk '{count[$1]++}END{for(j in count) print j":"count[j] |"sort -t: -k2r"}' urfile
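Note that -k2r sorts the counts as strings, which works while they are single digits. A variant (not from the original post) using a numeric reverse sort is safer once counts reach double digits:

```shell
# sort numerically (n) and in reverse (r) on the count field after the ":"
awk '{count[$1]++} END {for (j in count) print j":"count[j] | "sort -t: -k2,2nr"}' urfile
```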
I like shahhe's offering for quickest and most likely to become a one-liner, except that it needs one or two last tweaks to get the sort requested by the OP:
for i in $(<file.txt)
do
    echo "$i: $(grep -cxF "$i" file.txt)"
done | sort -u -t. -k4n