Awk: output lines with common field to separate files

beca123456 · September 14, 2018, 7:56am

Hi,

A beginner question.

my input.tab (tab-separated):

h1	h2	h3	h4	h5
item1	grpA	2	3	customer1
item2	grpB	4	6	customer1
item3	grpA	5	9	customer1
item4	grpA	0	0	customer2
item5	grpA	9	1	customer2

objective:
output a file for each customer ($5) with the item number ($1) only if $2 matches 'grpA'.

outputs:
in 'customer1.tab'

item1
item3

in 'customer2.tab'

item4
item5

my command:

gawk '
BEGIN{
   FS=OFS="\t"
}
NR>1{
   if($2 ~ /grpA/){
      a[$5]=$1
   }
}
END{
   for(i in a){
      print i FS a[i] >> i".tab"
   }
}' input.tab

What I get is only the last occurrence of the array value for each customer, even though I loop over the array:
in 'customer1.tab'

item3

in 'customer2.tab'

item5

There is clearly a problem with how I populate the array, but I don't see it.

RudiC · September 14, 2018, 8:10am

You are overwriting the array element for a customer every time that customer occurs, so only the last item survives. Try

awk -F"\t" '
NR == 1         {next
                }

$2 ~ /grpA/     {a[$5]  = a[$5] DL[$5] $1
                 DL[$5] = ORS
                }

END             {for (i in a)   print a[i] > (i ".tab")
                }
' file
cf cu*

---------- customer1.tab: ----------

item1
item3

---------- customer2.tab: ----------

item4
item5

As you don't need OFS, the field separator is set with the -F option.
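
If the items don't need to be collected in an array first, a simpler variant might be to print each one straight to its customer's file as the line is read. This is a minimal sketch, not the command posted above. It assumes the sample input.tab from the first post, recreated here with printf so the snippet runs on its own:

```shell
# Recreate the sample input from the question (tab-separated).
printf 'h1\th2\th3\th4\th5\n'                  > input.tab
printf 'item1\tgrpA\t2\t3\tcustomer1\n'       >> input.tab
printf 'item2\tgrpB\t4\t6\tcustomer1\n'       >> input.tab
printf 'item3\tgrpA\t5\t9\tcustomer1\n'       >> input.tab
printf 'item4\tgrpA\t0\t0\tcustomer2\n'       >> input.tab
printf 'item5\tgrpA\t9\t1\tcustomer2\n'       >> input.tab

# Skip the header line; for every grpA row, write the item to <customer>.tab.
awk -F'\t' '
NR > 1 && $2 ~ /grpA/ {
    out = $5 ".tab"     # output file named after the customer field
    print $1 > out      # ">" truncates only when the file is first opened;
                        # later prints to the same open file are appended
}
' input.tab

cat customer1.tab customer2.tab
```

With a large number of distinct customers, some awk implementations may run out of open files; the usual workaround is to append with >> and call close(out) after each print.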

beca123456 · September 14, 2018, 9:07am

Works great! Thanks!