Counting duplicate entries in a file using awk

Hi,

I have a very big text file (around 1 million entries) containing IPv4 addresses in the standard dotted format, i.e. a.b.c.d

The file looks like

10.1.1.1
10.1.1.1
10.1.1.1
10.1.2.4
10.1.2.4
12.1.5.6
.
.
.
.

and so on....

There are duplicate/multiple entries for some IP addresses. I want an awk/sed script (since the file is too big) to count the number of times each IP is repeated and write the result to an output file in the following format:

10.1.1.1 3
10.1.2.4 2
12.1.5.6 1
.
.
.

and so on...

Any help would be highly appreciated.

Thanks !

Is file sorted? Have you considered "uniq -c"?
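For reference, here is a sketch of the "uniq -c" approach. Note that uniq -c only merges *adjacent* duplicates, so the file must be sorted first; the filename ips.txt below is just a stand-in for the real file:

```shell
# Sample data standing in for the real file (name "ips.txt" is assumed):
printf '10.1.1.1\n10.1.1.1\n10.1.2.4\n10.1.1.1\n' > ips.txt

# uniq -c prints "count IP"; awk swaps the columns to the requested "IP count".
sort ips.txt | uniq -c | awk '{print $2, $1}'
# prints:
# 10.1.1.1 3
# 10.1.2.4 1
```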

Using awk

awk 'NF{a[$NF]++}END{for(i in a)print i,a[i]}' file | sort

No, the file is not sorted!

Thanks !!
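That's fine: the awk approach above does not need sorted input, because it accumulates the counts in an associative array; the trailing sort only orders the final report. A sketch with deliberately unsorted sample data (the filenames ips.txt and counts.txt are assumptions):

```shell
# Sample unsorted input standing in for the real million-line file:
printf '12.1.5.6\n10.1.1.1\n10.1.2.4\n10.1.1.1\n10.1.1.1\n10.1.2.4\n' > ips.txt

# awk tallies each IP in an associative array, so input order is irrelevant;
# NF skips blank lines, and sort only orders the output. Result in counts.txt.
awk 'NF { count[$1]++ } END { for (ip in count) print ip, count[ip] }' ips.txt \
    | sort > counts.txt

cat counts.txt
# 10.1.1.1 3
# 10.1.2.4 2
# 12.1.5.6 1
```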