Count distinct second-column values for each unique first-column value

angshuman · August 2, 2016, 7:50am

Hello Team,

I need your help on the following:

My input file a.txt is as below:

3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471

Here rows 1, 2 and 5 have the same column 1 value, but the corresponding column 2 values differ (373846 and 373847). I am trying to get the following output:

2 3330690
1 0640829

Following is what I tried:

awk -F'|' '{print $1}' a.txt | sort -n | uniq | grep -F -f - a.txt | awk -F'|' '{print $2}' | sort | uniq

This gives me following result:

373846
459725
373847

But it does not tell me how many distinct column 2 values occur for each distinct column 1 value.

Your help is highly appreciated.

Thanks
Angsuman

RavinderSingh13 · August 2, 2016, 8:24am

Hello angshuman,

I am not sure I understand your Input_file and expected output. If columns 1 and 2 are both supposed to match, the counts should not be the ones you have posted.

awk -F"|" '{A[$1 FS $2]++} END{for(i in A){print A[i] FS i}}'  Input_file

Output will be as follows.

2|0640829|459725
2|3330690|373846
1|3330690|373847

The above uses fields 1 and 2 together as the index into the array. If your requirements are different, please post the complete conditions with expected results.

Thanks,
R. Singh

RudiC · August 2, 2016, 8:40am

Your specification is far from clear. Would this do what you request:

awk -F"|" '!T[$1,$2]++ {C[$1]++} END {for (c in C) print C[c], c}' file
1 0640829
2 3330690

angshuman · August 2, 2016, 11:18am

Thank you Ravinder for your response. Sorry if my question is not clear.

Condition 1 - The unique values of column 1 are 3330690 and 0640829.
Condition 2 - The column 1 value 3330690 is associated with 2 distinct column 2 values, 373846 and 373847. The column 1 value 0640829 is associated with a single column 2 value, 459725.

Hence output is expected as below

2 3330690
1 0640829

Hope this clarifies.

---------- Post updated at 08:48 PM ---------- Previous update was at 06:11 PM ----------

Thank you RudiC. This worked perfectly. Now I am trying to understand this piece of code.
Could you please help explain it?

Thanks
Angsuman

RudiC · August 2, 2016, 11:27am

If the index constructed from $1 and $2 does not exist in the temp array T, it's a new combination, and the counter for $1 is incremented. When the input file ends, all these counters and the corresponding $1 values are printed.

In more detail:
For the first occurrence of a $1,$2 combination, T[$1,$2] doesn't exist, so !T[$1,$2] is true and the counter C[$1] is incremented. Because the ++ then increments T[$1,$2], nothing happens the next time that combination is encountered. C[$1] thus counts the different $2 values for every single $1. In the end, the count for every single $1 is printed.
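
For anyone reading along, the explanation above can be spelled out as a longer script. This is only a sketch of the same logic, not a replacement for the one-liner: the names seen and count are illustrative stand-ins for T and C, the sample data is the a.txt from the first post written to a temporary file, and the trailing sort is added only because awk does not guarantee the order of a for (k in array) loop.

```shell
# Recreate the sample a.txt from the first post in a temporary file.
tmp=$(mktemp)
printf '%s\n' \
    '3330690|373846|108471' \
    '3330690|373846|108471' \
    '0640829|459725|100001' \
    '0640829|459725|100001' \
    '3330690|373847|108471' > "$tmp"

# Same logic as the one-liner, written out step by step.
# "seen" plays the role of T and "count" the role of C (illustrative names).
result=$(awk -F'|' '
{
    key = $1 SUBSEP $2          # the same index that T[$1,$2] builds
    if (!(key in seen)) {       # first time this $1,$2 pair appears
        seen[key] = 1           # remember the pair
        count[$1]++             # one more distinct $2 for this $1
    }
}
END {
    for (k in count)            # traversal order is unspecified in awk
        print count[k], k
}' "$tmp" | sort)               # sort only to make the order stable

printf '%s\n' "$result"
rm -f "$tmp"
```

With the sample data this prints 1 0640829 followed by 2 3330690. The explicit (key in seen) test behaves like !T[$1,$2]++ here; the one-liner just folds the test and the "remember it" step into a single expression.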