Hello all,
I want to consolidate my data group by cities into 4 rolled up code categories (first 2 characters of 2nd col), namely PR, GR, TY and others.
CHICAGO PR1
CHICAGO GR3
CHICAGO GR4
CHICAGO HT5
CHICAGO HT6
CHICAGO TY5
HOUSTON PR3
HOUSTON PR9
HOUSTON TY4
HOUSTON HJ5
DESMOINES PR5
DESMOINES PR6
DESMOINES GR7
DESMOINES HT7
Now I want to have
PR GR TY OTHERS
CHICAGO PR1 GR3|GR4 TY5 HT5|HT6
HOUSTON PR3|PR9 TY4 HJ5
DESMOINES PR5|PR6 GR7 HT7
I tried to achieve this by the following code
awk -F"\t" '{if(a[$1]){a[$1]=a[$1]"|"$2} else { a[$1]=$2}} END {for (i in a) {print i"\t"a}}'
but I`m not sure how to include PR, GR , TY and all others separately. Please help.
You could try something like:
awk '
BEGIN { print "\tPR\tGR\tTY\tOTHERS"
grp["PR"]; grp["GR"]; grp["TY"]
}
{ if(!((g = substr($2, 1, 2)) in grp))
g = "OTHERS"
if(!($1 in city))
citylist[++cc] = city[$1] = $1
data[$1, g] = data[$1, g] == "" ? $2 : data[$1, g] "|" $2
}
END { for(i = 1; i <= cc; i++)
printf("%s\t%s\t%s\t%s\t%s\n", citylist,
data[citylist, "PR"], data[citylist, "GR"],
data[citylist, "TY"], data[citylist, "OTHERS"])
}' file
which will work even if the input lines for a given city are not all contiguous. If the data for all cities are grouped on contiguous lines (as in your sample data), this script could be simplified.
This code produces the output you said you wanted except it does not put a space in the heading line after the GR
before the tab field separator.
As always, if you want to try this on a Solaris/SunOS system, change awk
to /usr/xpg4/bin/awk
or nawk
.
2 Likes