Cross Tab

radius · April 18, 2016, 1:14pm

I need help to do cross tab

file input

20160101|ASIA|CHINA|2000
20160101|ASIA|INDIA|3000
20160102|ASIA|CHINA|4000
20160103|ASIA|CHINA|2000
20160103|AFRIKA|ZAMBIA|2000
20160104|ASIA|CHINA|5000

expected output

CONTINENT|NATION|20160101|20160102|20160103|20160104
ASIA|CHINA|2000|4000|2000|5000
ASIA|INDIA|3000|0|0|0
AFRIKA|ZAMBIA|0|0|2000|0

I did this

awk '
{
	for(i=1;i<=NF;i++)
		print $1,head[i-1],$i
}' input | awk -F'|' '{print ""CONTINENT""|""NATION""|"$0

Don_Cragun · April 18, 2016, 5:28pm

You might want to try something more like:

awk '
BEGIN {	FS = OFS = "|"
}
{	# Gather data:
	if(!($1 in dates))
		out_field[dates[$1] = ++nd] = $1
	continent_nation[$2 OFS $3]
	data[$2 OFS $3, $1] += $4
}
END {	# Print header line:
	printf("%s", "CONTINENT" OFS "NATION")
	for(i = 1; i <= nd; i++)
		printf("%s", OFS out_field)
	print ""

	# Print accumualted data lines:
	for(loc in continent_nation) {
		printf("%s", loc OFS)
		for(i = 1; i <= nd; i++)
			printf("%s%s", data[loc, out_field] + 0,
			    (i < nd) ? OFS : ORS)
	}
}' input

which produces output similar to:

CONTINENT|NATION|20160101|20160102|20160103|20160104
ASIA|CHINA|2000|4000|2000|5000
AFRIKA|ZAMBIA|0|0|2000|0
ASIA|INDIA|3000|0|0|0

with your sample input file (although the order of the output lines after the header may vary with different versions of awk ).

If you want to use this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .