Help with formatting a file using awk

I have a file like the one below:

position1                   0          7802            7802            0          client1                                                 -                              -

position1                   8          8032            8032            0          client1                                                 -                              -

position1                   4          7761            7761            0          client1                                                                                            -                              -

position1                   9          7766            7766            0          client2                                                 -                              -

position2                   1          7796            7796            0          client3                                                 -                              -

position2                   3          8032            8032            0          client4                                                 -                              -

position2                   2          8123            8123            0          client4                                                 -                              -

position3                   6          7804            7804            0          client5                                                 -                              -

position3                   7          7890            7890            0          client5                                                 -                              -

position3                   5          7801            7801            0          -                                                       -                              -

How could I format this file like below?

unique(column1), count(column2) -->  position1, 4

                                                                     position2, 3

                                                                     position3, 3
unique(column1), count(column6) -->  position1, 4

                                                                     position2, 3

                                                                     position3, 2
unique(column1), unique(column6)), count(unique(column6))) -->         position1,client1, 3

                                                                                                                                position1,client1, 1

                                                                                                                                position2,client3, 1

                                                                                                                                position2,client3, 2

                                                                                                                                position3,client5, 2

                                                                                                                                position3,-, 1

You know this already: Any attempts / ideas / thoughts from your side?

How are the data produced? Can you influence / modify the creator (of those data, of course!)?

Thanks RudiC. I tried this to count the columns but just couldn't work out how to combine multiple columns/rows at once.

awk -F"[ :\t]+" 'NR > 2 {A[$1]++}END{for(i in A)print i,A[i]}'

Nope, I can't modify or influence the input data.

It REALLY would be helpful if you described your request carefully and in detail, in words, instead of leaving the people in here to guess what you want from inconsistent samples! There's no position2 with a "-" in field 6, and nowhere is it stated that "-" in $6 must not be counted the way "client5" is. And, with NR > 2 applied to your sample, the "position1" count is one too low.
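To make the NR > 2 remark concrete, here is a minimal sketch on a made-up, double-spaced three-record sample (the sample() helper and the function names are only for illustration, not part of the original data). It suggests that NR > 2 drops the first record and still counts the empty lines under an empty key, while NF simply skips the empty lines.

```shell
# Hypothetical three-record sample, double-spaced like the posted file.
sample() {
  printf 'position1 0 7802 7802 0 client1 - -\n\n'
  printf 'position1 8 8032 8032 0 client1 - -\n\n'
  printf 'position2 1 7796 7796 0 client3 - -\n'
}

# NR > 2 skips line 1 (a real record) and line 2 (empty), then counts
# every later line, including the empty one, whose $1 is "".
count_nr() {
  sample | awk -F"[ :\t]+" 'NR > 2 {A[$1]++} END {for (i in A) print i "=" A[i]}' | sort
}

# NF is 0 on an empty line, so only real records are counted.
count_nf() {
  sample | awk -F"[ :\t]+" 'NF {A[$1]++} END {for (i in A) print i "=" A[i]}' | sort
}

count_nr
count_nf
```

The sort is only there because the order of for (i in A) is not defined.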

Try these variants, all based on and adapted from your attempt above, on your problems and report back:

awk -F"[ :\t]+" 'NF {A[$1]++}END{for(i in A)print i,A[i]}' OFS=, file4
position1,4
position2,3
position3,3
awk -F"[ :\t]+" 'NF && $6 != "-" {A[$1]++}END{for(i in A)print i,A[i]}' OFS=, file4
position1,4
position2,3
position3,2
awk -F"[ :\t]+" 'NF {A[$1,$6]++}END{for(i in A)print i,A[i]}' OFS=, SUBSEP=, file4
position3,-,1
position3,client5,2
position2,client4,2
position2,client3,1
position1,client2,1
position1,client1,3
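Since the question was how to combine several counts at once, the three commands above could probably be folded into a single pass. This is only a sketch: the function name report, the array names and the rows/clients/pairs labels are made up here to tell the three reports apart.

```shell
# report FILE: one pass over FILE, three reports printed from END.
# usage: report file4
report() {
  awk -F"[ :\t]+" '
    NF {                                # skip the empty lines
      rows[$1]++                        # report 1: records per $1
      if ($6 != "-") named[$1]++        # report 2: records per $1 with a real client
      pair[$1 "," $6]++                 # report 3: records per ($1,$6) pair
    }
    END {
      for (k in rows)  print "rows",    k, rows[k]
      for (k in named) print "clients", k, named[k]
      for (k in pair)  print "pairs",   k, pair[k]
    }
  ' OFS=, "$1" | sort
}
```

Building the pair key with an explicit "," avoids having to override SUBSEP, and piping through sort gives a repeatable order, because for (k in ...) returns the keys in no particular order.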

Thanks, it was a typo that there is no position2 with a "-" in field 6. Let me try it out first, and next time I will take more care.

---------- Post updated at 07:30 AM ---------- Previous update was at 02:58 AM ----------

I have tried to change the output of the command below to something like

awk -F"[ :\t]+" 'NF {A[$1,$6]++}END{for(i in A)print i,A[i]}' OFS=, SUBSEP=, file4
position3,-,1
position3,client5,2
position2,client4,2
position2,client3,1
position1,client2,1
position1,client1,3 

this

positiondetails,position=position3,client=- <space> count=1
positiondetails,position=position3,client=client5<space> count=2
positiondetails,position=position2,client=client4<space> count=2
positiondetails,position=position2,client=client3<space> count=1
positiondetails,position=position1,client=client2<space> count=1
positiondetails,position=position1,client=client1<space> count=3 
awk -F"[ :\t]+" 'NF {A[$1,$6]++}END{for(i in A){count=A[i]; split(i, w, ",") ; for (j in w) printf("positiondetails,position=%s count=%d", w[j], count)}}' OFS=, SUBSEP=, file4

This prints everything one after another, which is not what I would like to print. Can anyone guide me through this? I hope this makes sense.

Try adding a \n to printf's format string.

It's not working after adding \n to printf.

awk -F"[ :\t]+" 'NF {A[$1,$6]++}END{for(i in A){count=A[i]; split(i, w, ",") ; for (j in w) printf("positiondetails,position=%s,client=%s count=%d", w[j], count)}}' OFS=, SUBSEP=, file4

PS: I missed the client in my previous statement. Sorry again!

So in order to print like this

positiondetails,position=position3,client=client5<space> count=2 

the awk below is not enough

awk -F"[ :\t]+" 'NF {A[$1,$6]++}END{for(i in A){count=A[i]; split(i, w, ",") ; for (j in w) printf("positiondetails,position=%s,client=%s count=%d\n", w[j], count)}}' OFS=, SUBSEP=, file4
 --> it complains that it ran out of arguments for this part: client=%s count=%d

If I change it to

awk -F"[ :\t]+" 'NF {A[$1,$6]++}END{for(i in A){count=A[i]; split(i, w, ",") ; for (j in w) printf("positiondetails,position=%s count=%d", w[j], count)}}' OFS=, SUBSEP=, file4

then the output is like this

positiondetails,position=position3,client=client5<space> count=2 
positiondetails,position=client5<space> count=2 

while I wanted to print

positiondetails,position=position3,client=client5<space> count=2 

Any direction?

Try

awk -F"[ :\t]+" 'NF {A[$1,$6]++}END{for(i in A){count=A[i]; split(i, w, ","); printf("positiondetails,position=%s,client=%s count=%d\n", w[1], w[2], count)}}' OFS=, SUBSEP=, file
positiondetails,position=position3,client=- count=1
positiondetails,position=position3,client=client5 count=2
positiondetails,position=position2,client=client4 count=2
positiondetails,position=position2,client=client3 count=1
positiondetails,position=position1,client=client2 count=1
positiondetails,position=position1,client=client1 count=3
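The earlier attempts looped over w and called printf once per element, so each call only ever had one of the two values available. Addressing w[1] and w[2] directly gives printf both values in a single call. A variation on the same idea, sketched below under the assumption that a comma might one day appear in the data, leaves SUBSEP at its default and splits the key on SUBSEP itself. The function name details is made up for this example.

```shell
# details FILE: one line per (position, client) pair, with its count.
# usage: details file4
details() {
  awk -F"[ :\t]+" '
    NF { A[$1,$6]++ }                   # the key is $1 SUBSEP $6
    END {
      for (i in A) {
        split(i, w, SUBSEP)             # w[1] = position, w[2] = client
        printf("positiondetails,position=%s,client=%s count=%d\n", w[1], w[2], A[i])
      }
    }
  ' "$1" | sort
}
```

The trailing sort is optional. It is there because awk does not promise any particular order for for (i in A), so the line order shown above may differ between awk implementations.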