Search and count a unique string

Hi Guys,
I have a tab-delimited file and need help with the following:
For each row, the string in the 5th column needs to be searched for in the 5th column of the other rows, and counted whenever the 1st column of the matching row differs from that of the current row. BTW, only the unique strings of the 1st column need to be considered. Sorry if this is too complicated. Let me clarify it with an example. Here is my file (tab delimited):

A1          1          15231          15232          ESR1
A1          1          15235          15236          ESR1
A2          1          15231          15232          ESR1
A3          1          15235          15236          BTW
A4          1          15235          15236          FKH
A5          1          15235          15236          FKH
A6          1          15235          15236          FKH

The counts should be reported in a new column:

A1          1          15231          15232          ESR1          2
A1          1          15235          15236          ESR1          2
A2          1          15231          15232          ESR1          2
A3          1          15235          15236          BTW          1
A4          1          15235          15236          FKH          3
A5          1          15235          15236          FKH          3
A6          1          15235          15236          FKH          3

Thanks a lot in advance!

  1. Your 1st column is not unique (A1 appears in row #1 and row #2)
  2. From my understanding of your requirement, shouldn't the output be:
A3          1          15235          15236          BTW          0
A4          1          15235          15236          FKH          3
A5          1          15235          15236          FKH          3
A6          1          15235          15236          FKH          3

Yes, the first column is not unique, but the search needs to be done on the unique strings of the first column. For example, ESR1 is repeated in the first three rows. However, it should be reported as 2 at the end, since only 2 unique strings in the first column occur with it. The same applies to the other strings in the 5th column. Please let me know if this still doesn't make sense.
(my first post was edited)

Read input file twice:

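awk -F'\t' '
        # First pass: count distinct column-1 values for each column-5 string
        NR == FNR {
                v = $1 FS $5
                if ( ! ( v in A ) )
                        C[$5]++
                A[v]            # mark this column-1/column-5 pair as seen
                next
        }
        # Second pass: print each line with its count appended
        {
                print $0 FS C[$5]
        }
' file file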
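
If the input comes from a pipe and cannot be read twice, a single-pass variant is also possible. Here is a minimal sketch that buffers all lines in memory and prints them at the end; the function name count_unique and the array names are only illustrative, and it assumes the data fits in memory:

```shell
# count_unique (hypothetical name): single-pass variant of the two-pass script.
# Reads tab-delimited input from the given files, or from stdin if none are given.
count_unique() {
    awk -F'\t' '
        {
            line[NR] = $0              # remember every line in input order
            key[NR]  = $5              # and the column-5 string it carries
            k = $1 FS $5               # column-1/column-5 pair
            if (!(k in seen)) {        # count each distinct pair only once
                seen[k]
                count[$5]++
            }
        }
        END {
            for (i = 1; i <= NR; i++)
                print line[i] FS count[key[i]]
        }
    ' "$@"
}

# Sample data from the first post, fed through a pipe.
printf '%s\t1\t%s\t%s\t%s\n' \
    A1 15231 15232 ESR1 \
    A1 15235 15236 ESR1 \
    A2 15231 15232 ESR1 \
    A3 15235 15236 BTW \
    A4 15235 15236 FKH \
    A5 15235 15236 FKH \
    A6 15235 15236 FKH | count_unique
```

The trade-off is memory: the two-pass version keeps only the counts, while this one keeps every line until the end.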

Thanks Yoda, it worked

---------- Post updated 01-29-14 at 03:58 PM ---------- Previous update was 01-28-14 at 07:29 PM ----------

Hey Yoda,
Sorry again, but I have another issue now. For each line, I want the count of the lines that have the same values in the 1st and 2nd columns. Let's say I have a file like this:

2	131
2	131
3	131
4	150	
4	160
x	200
x	200

I need it to be reported as follows:

2	131	2
2	131	2
3	131	1
4	150	1	
4	160	1
x	200	2
x	200	2

I would really appreciate it if you could solve this for me.

awk 'NR==FNR{A[$1,$2]++;next}{print $0,A[$1,$2]}' file file
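
One thing to be aware of: print $0, ... joins its arguments with OFS, which is a single space by default, so the count is appended after a space rather than a tab. If tab-separated output is wanted, as in the example, a sketch along these lines should do it; the wrapper name count_pairs and the temporary sample file are only for illustration:

```shell
# count_pairs (hypothetical name): same two-pass idea, with a tab before the count.
count_pairs() {
    awk '
        BEGIN { OFS = "\t" }               # join output fields with a tab
        NR == FNR { n[$1, $2]++; next }    # pass 1: count each col1/col2 pair
        { print $0, n[$1, $2] }            # pass 2: append the count to each line
    ' "$1" "$1"
}

# Sample file with the data from the post.
f=$(mktemp)
printf '%s\t%s\n' 2 131 2 131 3 131 4 150 4 160 x 200 x 200 > "$f"
count_pairs "$f"
rm -f "$f"
```

The file name is passed twice on purpose: NR==FNR is true only while the first copy is being read, so the counts are complete before any line is printed.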

Awesome, thanks Yoda!