Pivoting values from column to rows

Booo · April 13, 2017, 6:27pm

I/P:
I/P:

2017/01/01 a 10
2017/01/01 b 20
2017/01/01 c 40
2017/02/01 a 10
2017/02/01 b 20
2017/02/01 c 30

O/P:

                     a      b    c
2017/01/01     10   20   40
2017/02/01     10    20   30

Don_Cragun · April 13, 2017, 7:18pm

What operating system and shell are you using?

What have you tried to solve this problem on your own?

Booo · April 13, 2017, 8:01pm

I am using Linux and I don't know how to accomplish this one.

Don_Cragun · April 14, 2017, 12:11am

You didn't answer the question about what shell you're using. The following was written and tested using a Korn shell, but should work with any shell that is based on Bourne shell syntax (e.g., ash , bash , dash , ksh , and zsh ; but not csh and its derivatives):

#!/bin/ksh
awk -v debug=$# '
BEGIN {	OFS = "\t"
}
!($1 in d) {
	# We have a new date; save it for use in output headings.
	date[++nd] = $1
	d[$1] = nd
	if(debug)
		printf("date[%d]=%s & d[%s]=%d added\n", nd, $1, $1, nd)
}
!($2 in h) {
	# We have a new output column heaer; save it for use in output headings.
	head[++nh] = $2
	h[$2] = nh
	if(debug)
		printf("head[%d]=%s & h[%s]=%d added\n", nh, $2, $2, nh)
}
{	# Save the data from this input row for output later.
	row[$1, $2] = $3
	if(debug)
		printf("row[%s, %s]=%s added\n", $1, $2, $3)
}
END {	# Print header line.
	printf(OFS OFS)
	for(i = 1; i <= nh; i++)
		printf("%s%s", head, (i == nh) ? ORS : OFS)
	# Print accumulated data rows.
	for(i = 1; i <= nd; i++) {
		printf("%s%s", date, OFS)
		for(j = 1; j <= nh; j++)
			printf("%s%s", row[date, head[j]],
			    (j == nh) ? ORS : OFS)
	}
}' file

If you invoke this script without operands, it produces the output:

		a	b	c
2017/01/01	10	20	40
2017/02/01	10	20	30

if the file named file contains the sample input you provided in post #1 in this thread. If you invoke it with one or more operands, it provides debugging information showing how the arrays used to control the output are loaded from the input file in addition to producing the desired results:

date[1]=2017/01/01 & d[2017/01/01]=1 added
head[1]=a & h[a]=1 added
row[2017/01/01, a]=10 added
head[2]=b & h=2 added
row[2017/01/01, b]=20 added
head[3]=c & h[c]=3 added
row[2017/01/01, c]=40 added
date[2]=2017/02/01 & d[2017/02/01]=2 added
row[2017/02/01, a]=10 added
row[2017/02/01, b]=20 added
row[2017/02/01, c]=30 added
		a	b	c
2017/01/01	10	20	40
2017/02/01	10	20	30

If someone else wants to try this on a Solaris/SunOS system, change awk in this script to /usr/xpg4/bin/awk or nawk .

RudiC · April 14, 2017, 2:34am

Try also

awk     '
        {LN[$1]; HD[$2]; MX[$1,$2]=$3}

END     {               printf "%10s", ""; for (i in HD) printf "%10s", i; print "";
         for (j in LN) {printf "%10s",j;   for (i in HD) printf "%10s", MX[j,i]; print ""}
        }
' file
                   a         b         c
2017/02/01        10        20        30
2017/01/01        10        20        40

Booo · April 14, 2017, 2:19pm

Thanks and It worked fine as long as the rows and columns are same.

if the seventh line contains "2017/02/01 d 70" then it fails.

Don_Cragun · April 14, 2017, 6:30pm

In what way did my suggestion fail? If I change the last line of your sample file as suggested in post #6, I get the following output from the code I suggested in post #4:

		a	b	c	d
2017/01/01	10	20	40	
2017/02/01	10	20		30

What output were you expecting?

PS: Note that the code I suggested prints out columns in the order in which input field 2 values were first seen and prints out rows in the order in which input field 1 values were first seen.

The simpler and faster code RudiC suggested prints rows and columns in random order (but the values in each column should be consistent).

RudiC · April 15, 2017, 1:17am

Output from "my" proposal:

2017/02/01        10        20        30        70
2017/01/01        10        20        40

So - please explain in detail WHAT goes wrong in WHICH way. "then it fails" doesn't really help.

Don_Cragun · April 15, 2017, 1:46am

don cragun:

In what way did my suggestion fail? If I change the last line of your sample file as suggested in post #6, I get the following output from the code I suggested in post #4:
		a	b	c	d
2017/01/01	10	20	40	
2017/02/01	10	20		30
What output were you expecting?

PS: Note that the code I suggested prints out columns in the order in which input field 2 values were first seen and prints out rows in the order in which input field 1 values were first seen.

The simpler and faster code RudiC suggested prints rows and columns in random order (but the values in each column should be consistent).

Booo,
I apologize... The output above was produced by changing the 6th row in your sample input file to the line you provided in post #6, instead of by adding that line as a new 7th row. When I add that new line to your original sample data, the output produced by my suggestion is:

		a	b	c	d
2017/01/01	10	20	40	
2017/02/01	10	20	30	30

with my suggested code, with that input file, the output will always be in this order.

When I feed that input to the code RudiC provided, I get the output:

                   d         a         b         c
2017/01/01                  10        20        40
2017/02/01        30        10        20        30

but the order of the last four columns and the order of the rows may vary with different versions of awk (because the order in which items in an array are processed by for(index in array_name) is not specified by the standards). Note that the output RudiC got using the version of awk on his system produced output with the order of the last four columns AND the order of the rows printed was different than what I got on my system, but the numeric values in the table of output values displayed was identical using both my suggestion and his suggestion (on both of our systems). Although it may be hard to tell looking at the output he provided in post #8 since he didn't include the header line in the output.

As RudiC said, just saying that "it fails" without specifying whose suggested code you're running and without specifying what is wrong with the output produced is not at all helpful.

RudiC · April 15, 2017, 1:49am

I apologize - just didn't capture the entire output, which reads

                   a         b         c         d
2017/02/01        10        20        30        70
2017/01/01        10        20        40

Booo · April 15, 2017, 4:34am

Can we replace the empty spaces in the column to zero. If its empty while sending as an excel format the empty column is copied by adjacent column data.

Don_Cragun · April 15, 2017, 5:27am

Yes. You can do that by changing 1 character in my suggestion or by adding 2 characters (4 if you add spaces around arithmetic operators) to either of our suggestions.

Can we know which of our suggestions you're using?

Can we know what fails in the suggestions that have been provided for you? (In other words, please explain in what way the suggestions you were given did not do what you requested in the requirements you stated before adding your new requirement in post #11?)

Can you try to modify the code we have supplied instead of expecting us to change our code for you every time you change your requirements? If you can't get it to work, show us what you have a tried an we'll be glad to help you finish it.

Booo · April 15, 2017, 5:28pm

Thanks Don. I tried to change the OFS value but it didn't work. I am not sure how to fix this one.

Don_Cragun · April 15, 2017, 5:51pm

I asked three questions in post #12 in this thread. You answered one of them (although I have no idea why you thought changing the output field separator would change the value of a field from an empty string into a numeric string with value 0). We know that you understand how both of our suggestions work (since you're asking questions here to learn how to solve your problem and you haven't asked any questions about how the suggested awk scripts work). At least we hope that you understand that this forum is here to help you learn how to use BSD, Linux, and UNIX system shell scripting to solve your own problems; not to act as your unpaid programming staff.

If you're unwilling to tell us how our suggestions failed to meet your stated requirements and you aren't even willing to tell us which of our suggestions you're trying to get to work, why should we waste any more time trying to help you?

Please help us help you by answering the questions we have asked!

Booo · April 15, 2017, 6:51pm

Never mind and I figure it out easily by using sed replace command.

drl · April 15, 2017, 7:15pm

Hi, Booo.

Please post your solution to help others that may encounter a similar problem.

We try to be a community so that we can help one another.

Thanks ... cheers, drl

Booo · April 15, 2017, 7:19pm

I used

sed 's/          /         0/g' filename

Don_Cragun · April 15, 2017, 8:39pm

Hi Booo,
Thank you for sharing your solution with us. If that solution works, we now know that you're using the code RudiC suggested. Instead of adding your sed command to post-process the output produced by RudiC's suggested awk script, you might consider changing the print statement in his suggestion:

printf "%10s", MX[j,i]

to:

printf "%10s", MX[j,i] + 0

(which will convert any empty elements in the array to the numeric value 0 instead before printf see the string to be printed).

Alternatively, you could change it to:

printf "%10d", MX[j,i]

(which tells awk 's printf function to print a decimal value instead of a string and let's printf do the conversion from an empty string to a decimal value for you).

Booo · April 17, 2017, 1:40pm

Thank Don and Ron for your help. Solution you suggested worked fine without any issues.