Unique count from flat file

Pratik4891 · December 30, 2011, 1:17am

Hello guys,

I have a flat file delimited with '|~|'.
When I count the records using the command below

awk -FS"[|~|]+" ' {print $colno}' filename | wc -l

the count is fine

But when I try to find the number of unique records, the output is always 1:

awk -FS"[|~|]+" ' {print $colno}' filename |sort|uniq|wc -l

Please let me know how to find the unique record count.

balajesuri · December 30, 2011, 1:36am

Field separator doesn't matter while finding the count of unique lines, does it?

Try:

sort filename | uniq | wc -l

Pratik4891 · December 30, 2011, 1:51am

Thanks, but I need the unique record count on a particular column.

balajesuri · December 30, 2011, 2:09am

Please provide a sample input and expected output.

Pratik4891 · December 30, 2011, 4:57am

Input file:

9961881|~|20111229|~|000000218311635|~|1015104|~|000192170510|~|1|~|1|~||~|1|~|3755593|~|3755593|~|218311635
9961881|~|20111229|~|000000218311636|~|1015104|~|000192170510|~|1|~|1|~||~|1|~|3755593|~|3755593|~|218311636
9961881|~|20111229|~|000000218312203|~|1014486|~|000192174061|~|1021|~|1|~||~|1|~|90875|~|90875|~|218312203
9961881|~|20111229|~|000000218312204|~|1014486|~|000192174061|~|1267|~|1|~||~|1|~|90875|~|90875|~|218312204
9961881|~|20111229|~|000000218478637|~|1023353|~|000192465057|~|253|~|1|~||~|1|~|3755593|~|3755593|~|218478637
9961881|~|20111229|~|000000218478639|~|1023353|~|000192465057|~|801|~|1|~||~|1|~|3755593|~|3755593|~|218478639
9961881|~|20111229|~|000000218478640|~|1023353|~|000192465057|~|802|~|1|~||~|1|~|3755593|~|3755593|~|218478640
9961881|~|20111229|~|000000218478641|~|1023353|~|000192465057|~|253|~|1|~||~|1|~|3755593|~|3755593|~|218478641
9961881|~|20111229|~|000000218478642|~|1023353|~|000192465057|~|801|~|1|~||~|1|~|3755593|~|3755593|~|218478642
9961881|~|20111229|~|000000218478643|~|1023353|~|000192465057|~|802|~|1|~||~|1|~|3755593|~|3755593|~|218478643

I need the unique record count on the 4th field:

awk -FS"[|~|]+" ' {print $4}' test.dat|sort|uniq|wc -l

The output is 1, but it should be 3.

If I change the delimiter in the source file to |, the output is 3.

Klashxx · December 30, 2011, 5:07am

awk -F\~ '{print $4}' test.dat|sort|uniq|wc -l

balajesuri · December 30, 2011, 5:09am

It's awk -F, not awk -FS:

$ awk -F"[|~|]+" '{print $4}' filename | sort | uniq | wc -l
3
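
For anyone wondering why the -FS version printed 1: as far as I can tell, awk reads -FS"[|~|]+" as -F followed by the value S[|~|]+, so the separator regex needs a literal S, never matches these records, and $4 is empty on every line. A minimal sketch of the difference, assuming a scratch file from mktemp and four shortened copies of the records above (first five columns only; the variable names are just for illustration):

```shell
# Scratch file holding shortened records (first five columns of the thread's data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
9961881|~|20111229|~|000000218311635|~|1015104|~|000192170510
9961881|~|20111229|~|000000218311636|~|1015104|~|000192170510
9961881|~|20111229|~|000000218312203|~|1014486|~|000192174061
9961881|~|20111229|~|000000218478637|~|1023353|~|000192465057
EOF

# -FS"..." is -F with the value "S[|~|]+": nothing is split, $4 is always empty,
# so sort -u sees only identical blank lines.
broken=$(awk -FS"[|~|]+" '{print $4}' "$sample" | sort -u | wc -l)

# -F"..." splits on runs of | and ~ characters, so $4 is the fourth column.
fixed=$(awk -F"[|~|]+" '{print $4}' "$sample" | sort -u | wc -l)

echo "broken=$broken fixed=$fixed"
rm -f "$sample"
```

On this sample the broken form should report 1 and the fixed form 3.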

ahamed101 · December 30, 2011, 5:09am

awk -F\~ '{print $4}' test.dat | sort -u | wc -l

--ahamed

Klashxx · December 30, 2011, 5:15am

A pure awk solution:

awk -F"[|~|]+" 'a[$4]==""{a[$4]=1;b++}END{print b}' test.dat

Or:

awk -F\~ '!a[$4]{a[$4]=1;b++}END{print b}' test.dat
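
One caveat, in case a later column is needed: "[|~|]+" matches any run of | and ~ characters, so an empty column (the |~||~| in the sample) gets swallowed and the columns after it shift left. It makes no difference for $4 here. A small sketch of the effect, assuming the bracketed form '[|]~[|]' as one way to match the literal three-character delimiter:

```shell
# First record from the thread; its 8th column is empty.
line='9961881|~|20111229|~|000000218311635|~|1015104|~|000192170510|~|1|~|1|~||~|1|~|3755593|~|3755593|~|218311635'

# Run-of-characters separator: the empty column is merged into one separator.
nf_run=$(printf '%s\n' "$line" | awk -F'[|~|]+' '{print NF}')

# Literal |~| separator: the empty column is kept as its own field.
nf_literal=$(printf '%s\n' "$line" | awk -F'[|]~[|]' '{print NF}')

echo "run=$nf_run literal=$nf_literal"
```

The first form should count 11 fields and the second 12.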

ahamed101 · December 30, 2011, 5:18am

Yet another one...

awk -F'~' '{a[$4]++}END{print length(a)}' infile

Use nawk if you are on Solaris!

--ahamed

itkamaraj · December 30, 2011, 5:56am

$ awk -F"\|~\|" '{a[$4]++;next}END{for(i in a){print a[i],i}}' input.txt
6 1023353
2 1014486
2 1015104
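
Note that awk does not guarantee the order in which for (i in a) visits the keys, so the lines above may come out in a different order on another system. A sketch that pipes the counts through sort for a stable report, assuming a scratch file from mktemp with shortened copies of the records (first five columns only) and the bracketed '[|]~[|]' form of the delimiter:

```shell
# Scratch file with the ten records from the thread, cut down to five columns.
data=$(mktemp)
cat > "$data" <<'EOF'
9961881|~|20111229|~|000000218311635|~|1015104|~|000192170510
9961881|~|20111229|~|000000218311636|~|1015104|~|000192170510
9961881|~|20111229|~|000000218312203|~|1014486|~|000192174061
9961881|~|20111229|~|000000218312204|~|1014486|~|000192174061
9961881|~|20111229|~|000000218478637|~|1023353|~|000192465057
9961881|~|20111229|~|000000218478639|~|1023353|~|000192465057
9961881|~|20111229|~|000000218478640|~|1023353|~|000192465057
9961881|~|20111229|~|000000218478641|~|1023353|~|000192465057
9961881|~|20111229|~|000000218478642|~|1023353|~|000192465057
9961881|~|20111229|~|000000218478643|~|1023353|~|000192465057
EOF

# Count occurrences of each 4th-column value, then sort by count (descending)
# and by value (ascending) so the output order does not depend on the awk in use.
report=$(awk -F'[|]~[|]' '{count[$4]++} END {for (v in count) print count[v], v}' "$data" |
         sort -k1,1nr -k2,2n)

echo "$report"
rm -f "$data"
```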

Pratik4891 · January 6, 2012, 4:35am

Thanks a lot, guys, for all the answers and for your valuable time.