awk to count duplicated lines

ux4me · September 24, 2010, 4:56pm

We have an input file as follows:

2010-09-15-12.41.15
2010-09-15-12.41.15
2010-09-15-12.41.24
2010-09-15-12.41.24
2010-09-15-12.41.24
2010-09-15-12.41.24
2010-09-15-12.41.25
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.28
2010-09-15-12.41.28
2010-09-15-12.41.28
2010-09-15-12.41.28
2010-09-15-12.41.41

And we have this loop, which works fine to count and print how many times each line occurs:

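for i in $(uniq infile)
do
    # -c counts matches; -x matches whole lines only; -F treats the
    # timestamp as a fixed string, so "." is not a regex wildcard
    num=$(grep -c -x -F -- "$i" infile)
    echo "$i $num"
done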
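Because uniq only merges adjacent identical lines, a loop over uniq's output gives one count per value only when equal lines are already grouped together, as they are in the sample. A minimal sketch with a small hypothetical unsorted input (the variable unsorted is illustrative only) showing the difference; sort -u is one way to get each distinct value exactly once:

```shell
#!/bin/sh
# Hypothetical unsorted input: "a" occurs twice, but not on adjacent lines.
unsorted='a
b
a'

# uniq only merges adjacent duplicates, so "a" is still listed twice (3 lines).
printf '%s\n' "$unsorted" | uniq | wc -l

# sort -u sorts first and then drops duplicates, listing each value once (2 lines).
printf '%s\n' "$unsorted" | sort -u | wc -l
```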

However, we would like to use awk to perform the same logic. Please assist if possible; thank you in advance.

jim_mcnamara · September 24, 2010, 5:05pm

awk '{arr[$0]++} END {for (i in arr) if (arr[i] > 1) print arr[i], i}' inputfile | sort -n

This produces a list of the lines that occur more than once, each preceded by the number of times it occurs, sorted by that count.
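A minimal sketch, assuming a POSIX shell, that wraps the corrected one-liner in a function (dup_counts is a hypothetical name) and runs it on the sample data from the question. Lines with equal counts are ordered by sort's fallback comparison of the whole line:

```shell
#!/bin/sh
# dup_counts (hypothetical name): print "count line" for every line that
# occurs more than once, sorted numerically by count.
dup_counts() {
  # In END, the loop variable i holds each distinct line. $0 does not change
  # with the loop (it typically still holds the last record read), so the
  # key i is printed rather than $0.
  awk '{ arr[$0]++ } END { for (i in arr) if (arr[i] > 1) print arr[i], i }' "$@" |
    sort -n
}

# Run it on the sample timestamps from the question.
dup_counts <<'EOF'
2010-09-15-12.41.15
2010-09-15-12.41.15
2010-09-15-12.41.24
2010-09-15-12.41.24
2010-09-15-12.41.24
2010-09-15-12.41.24
2010-09-15-12.41.25
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.26
2010-09-15-12.41.28
2010-09-15-12.41.28
2010-09-15-12.41.28
2010-09-15-12.41.28
2010-09-15-12.41.41
EOF
```

On the sample this should list the four repeated timestamps: .15 (2), .24 (4), .28 (4) and .26 (5); .25 and .41 occur once and are skipped.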

anbu23 · September 24, 2010, 5:12pm

Or

sort file | uniq -c
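uniq -c counts runs of adjacent identical lines, which is why sort comes first. For input that is already grouped, like the sample timestamps, the same run-length counting can be done in a single awk pass that keeps only the current line in memory. A minimal sketch (run_counts is a hypothetical name); unlike uniq -c, it does not pad the count with leading spaces:

```shell
#!/bin/sh
# run_counts (hypothetical name): count runs of identical adjacent lines,
# like uniq -c. Assumes equal lines are already grouped together.
run_counts() {
  awk '
    # A new run starts on the first line or whenever the line changes.
    $0 != prev || NR == 1 { if (NR > 1) print n, prev; prev = $0; n = 0 }
    { n++ }
    # Flush the final run (skipped for empty input).
    END { if (NR > 0) print n, prev }
  ' "$@"
}

# A few lines taken from the sample in the question.
run_counts <<'EOF'
2010-09-15-12.41.15
2010-09-15-12.41.15
2010-09-15-12.41.24
2010-09-15-12.41.25
EOF
```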

danmero · September 24, 2010, 5:16pm

Should be something like:

awk '{a[$0]++}END{for(i in a){print i, a[i]}}' file
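The order in which for (i in a) visits the keys is unspecified in awk, so the output need not follow the input order. A minimal sketch that sorts the result by line text (count_lines is a hypothetical function name), used as count_lines file:

```shell
#!/bin/sh
# count_lines (hypothetical name): print "line count" for every distinct line.
# for (i in a) visits keys in an unspecified order, so the result is sorted.
count_lines() {
  awk '{ a[$0]++ } END { for (i in a) print i, a[i] }' "$@" | sort
}
```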

---------- Post updated at 05:16 PM ---------- Previous update was at 05:13 PM ----------

Or, if you need only the duplicated lines with their counts:

awk '{a[$0]++}END{for(i in a){if(a[i]-1)print i,a[i]}}' file
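If the duplicates should be listed in the order they first appear in the file (as the loop in the question does), one option is to record each new line in a numbered array and walk that array in END instead of using for (i in a). A minimal sketch, printing only lines seen more than once (dups_in_order is a hypothetical name), used as dups_in_order file:

```shell
#!/bin/sh
# dups_in_order (hypothetical name): print "line count" for each line that
# occurs more than once, in order of first appearance.
dups_in_order() {
  awk '
    # Checked before the count is incremented, so this fires on first sight.
    !($0 in count) { order[++n] = $0 }
    { count[$0]++ }
    END {
      for (k = 1; k <= n; k++)
        if (count[order[k]] > 1) print order[k], count[order[k]]
    }
  ' "$@"
}
```

This keeps every distinct line in memory, like the one-liners above, but avoids the extra sort.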