Description of data:
NC_002737.1 4 F1VI4M001A3IAU F1VI4M001A3IAU F1VI4M001A3IAU F1VI4M001A3IAU
NC_006372.1 5 F1VI4M001BH0HY F1VI4M001BH0HY F1VI4M001C0ZC5 F1VI4M001DOF2X F1VI4M001AYNTS
Every field in every record is tab-separated.
A record can have any number ("n") of columns.
Problem:
What I want to achieve is the following:
NC_002737.1 4 F1VI4M001A3IAU 4
NC_006372.1 5 F1VI4M001BH0HY 2 F1VI4M001C0ZC5 1 F1VI4M001DOF2X 1 F1VI4M001AYNTS 1
This is what I have so far:
awk 'BEGIN{OFS="\t";cnt=0}{if (NF>3) {for (i=3;i<=NF;i++) {if ($(i)==$(i+1)) {cnt = cnt+1} print $1,$(i),cnt}}cnt=1}'
Output:
NC_002737.1 F1VI4M001A3IAU 2
NC_002737.1 F1VI4M001A3IAU 3
NC_002737.1 F1VI4M001A3IAU 4
NC_002737.1 F1VI4M001A3IAU 4
NC_006372.1 F1VI4M001BH0HY 2
NC_006372.1 F1VI4M001C0ZC5 1
NC_006372.1 F1VI4M001DOF2X 1
NC_006372.1 F1VI4M001AYNTS 1
I know the problem: the for loop goes through the fields one by one and prints a line on every iteration. At this point I am not able to come up with a way to get the desired output (one line per record) as shown above.
Can anyone suggest a way? I could write another awk script to post-process this, but I am wondering if there is a way to fix it all in one go.
Thanks
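One possible single-pass approach, offered as a sketch rather than a definitive answer: count the IDs of each record in an array, remember the order in which they first appear, and print only once at the end of the record. The function name count_ids is made up for illustration. It assumes that repeated IDs should be merged even when they are not adjacent, and that records with fewer than three fields should pass through unchanged (the original attempt skipped short records via NF>3 instead).

```shell
# Hypothetical helper: collapse the repeated IDs of each record into "ID count" pairs.
count_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
  NF < 3 { print; next }            # nothing to count, pass the record through
  {
    n = 0
    split("", count)                # portable way to empty an array
    split("", order)
    for (i = 3; i <= NF; i++) {
      if (!($i in count))           # first time this ID is seen in the record
        order[++n] = $i
      count[$i]++
    }
    line = $1 OFS $2                # keep the first two columns as they are
    for (j = 1; j <= n; j++)
      line = line OFS order[j] OFS count[order[j]]
    print line                      # one output line per input record
  }' "$@"
}
```

It can be called as count_ids input.tsv (input.tsv being a placeholder file name) or fed from a pipe. Building the whole line in a variable and printing it after the loop is what keeps each record on a single line.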