How to count unique values in a file

Prega · January 13, 2011, 4:12am

Hi

I have the following info in a file -

      <Cell id="25D"/>
      <Cell id="26A"/>
      <Cell id="26B"/>
      <Cell id="26C"/>
      <Cell id="27A"/>
      <Cell id="27B"/>
      <Cell id="27C"/>
      <Cell id="28A"/>

I would like to know how you would go about counting all the unique values in this file. I have limited knowledge, but I think I should follow
these steps: search the file for occurrences of "Cell id", then obtain the value (e.g. "25D"), then check whether the value has been seen before, add it to a counter if not, and display the count.

I don't know where to start.

Any help or explanation would be highly appreciated.

Regards

anurag.singh · January 13, 2011, 4:15am

The following should help:

Prega · January 13, 2011, 4:32am

Hi Anurag.

Many thanks for the speedy response. I have seen this thread, but I'm not sure how the loop works. Could you please help me understand it? I have three more questions of a similar type, which I could then tackle myself.

awk '/^cell id/{if(!a[$NF]) cnt++;a[$NF]++;next}END{print cnt}' inputFile

Franklin52 · January 13, 2011, 4:41am

This should be sufficient:

awk '{a[$0]++}END{for(i in a)if(a[i]==1)print i}' file
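
Note that a one-liner of this shape prints the lines that occur exactly once; it does not print a count. As a rough sketch of the difference (the three-line sample and the variable names data, once and distinct are made up for illustration, not taken from the original file):

```shell
# Hypothetical sample: the 25D line appears twice, the 26A line once.
data=$(printf '%s\n' '<Cell id="25D"/>' '<Cell id="26A"/>' '<Cell id="25D"/>')

# Lines that occur exactly once in the input.
once=$(printf '%s\n' "$data" | awk '{a[$0]++} END{for(i in a) if(a[i]==1) print i}')

# Number of distinct lines, however often each one occurs.
distinct=$(printf '%s\n' "$data" | awk '!a[$0]++{n++} END{print n+0}')

echo "$once"
echo "$distinct"
```

Which of the two is wanted depends on whether "unique" means "appears only once" or "distinct".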

anurag.singh · January 13, 2011, 4:56am

Use Franklin52's solution if the input file looks exactly like

<Cell id="Some_Value"/>

In case your input differs a little, like

<Cell id="Value1"/>
<Cell id="Value1" dfd="ddfd" dfdsfdsf dff"/>

Then use the solution in post #4 of the above link, which is:

awk -F\" '/Cell id/{if(!a[$2]) cnt++;a[$2]++;next}END{print cnt}' inputFile

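-F\" makes the double quote the field separator, so on a matching line $2 is the Cell id value. The script increments cnt whenever a new Cell id value is found: every value is stored in the array a once seen, so if a value is not yet in the array (its first occurrence), cnt is incremented. END prints the final count.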
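
To make the loop easier to follow, here is the same logic spread over several lines with comments. This is only a sketch: the temporary file stands in for inputFile, the array is renamed to seen for readability, and the last sample line is a duplicate added for illustration so that the de-duplication is visible.

```shell
# Sample input from the first post, plus one repeated line (assumption, for illustration).
input=$(mktemp)
cat > "$input" <<'EOF'
      <Cell id="25D"/>
      <Cell id="26A"/>
      <Cell id="26B"/>
      <Cell id="26C"/>
      <Cell id="27A"/>
      <Cell id="27B"/>
      <Cell id="27C"/>
      <Cell id="28A"/>
      <Cell id="25D"/>
EOF

count=$(awk -F'"' '
  /Cell id/ {                # only look at lines containing "Cell id"
      if (!seen[$2]) cnt++   # first time this id appears: count it
      seen[$2]++             # remember the id so repeats are skipped
  }
  END { print cnt + 0 }      # + 0 prints 0 rather than an empty line when nothing matched
' "$input")

echo "$count"
rm -f "$input"
```

With this sample the script prints 8: nine lines, of which 25D appears twice.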