Awk/sed script for transposing any number of rows with header row

tntelle · April 23, 2013, 7:07pm

Greetings!

I have been trying to find a way to take a CSV file with a large number of rows and a very large number of columns (in the thousands) and turn each row into a column of name,value lines, where the name comes from the header row and the value comes from the row being converted.

For instance:

In my CSV:

Row1: (header) Letter, Weight, Color, Cost
Row2: A, 20, Blue, 5
Row3: DD, 200, Orange, 100
...  (and so forth)

I am trying to get the output to be:

Letter,A
Weight,20
Color,Blue
Cost,5

Letter,DD
Weight,200
Color,Orange
Cost,100

I found this awk code, which is useful:

BEGIN { FS = OFS = "," }

{
    for (i = 1; i <= NF; i++) {
        arr[NR, i] = $i
        if (big <= NF)
            big = NF
    }
}

END {
    for (i = 1; i <= big; i++) {
        for (j = 1; j <= NR; j++)
            printf("%s%s", arr[j, i], (j == NR ? "" : OFS))
        print ""
    }
}

but this combines the values:

Letter,A,DD
Weight,20,200
Color,Blue,Orange
Cost,5,100

So in the end, I want to keep the records completely separate. I am sure the answer is simple, but I am new to awk/sed and am having some difficulty figuring out what the trouble is. If anybody can help me out, I would learn tremendously from the example. Thank you in advance!!

Yoda · April 23, 2013, 11:17pm

An awk program:

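awk -F, '
        NR == 1 {
                        # Strip spaces from the header, then keep the column names in H
                        gsub ( / /, "" )
                        split ( $0, H, "," )
        }
        NR > 1 {
                        # One "name,value" line per field, then a blank line between records
                        for ( i = 1; i <= NF; i++ )
                                print H[i], $i
                        printf "\n"
        }
' OFS=, file.csv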
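
Note that the gsub only strips spaces from the header line. If the data rows also carry a space after each comma, as in the sample in the question, one possible variation is to make the field separator a regular expression that swallows the spaces around each comma. This is only a sketch: the temporary file and the shell variables tmp and out are made up for the demonstration, and trailing spaces at the end of a line are not handled.

```shell
# Sample input copied from the question, including the spaces after commas.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Letter, Weight, Color, Cost
A, 20, Blue, 5
DD, 200, Orange, 100
EOF

# FS is a regular expression: a comma with any surrounding spaces.
out=$(awk -F' *, *' -v OFS=, '
    NR == 1 {
        for (i = 1; i <= NF; i++)
            H[i] = $i                 # remember the column names
        next
    }
    {
        for (i = 1; i <= NF; i++)
            print H[i], $i            # one "name,value" line per field
        print ""                      # blank line between records
    }
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

On the three sample lines this should print the Letter/Weight/Color/Cost block for A, a blank line, and then the same block for DD.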

tntelle · April 24, 2013, 4:50pm

Thank you Yoda! This worked like a charm. There was something I forgot to mention that you'll probably know how to address: what if one of the columns in the header is something like "PLANET", and I want to tack that row's planet onto the front of every line? So for instance:

Row1: (header) Letter, Weight, Color, Cost, PLANET
Row2: A, 20, Blue, 5, MARS
Row3: DD, 200, Orange, 100, REMULAK
... (and so forth)

The output should then be:

MARS, Letter,A
MARS, Weight,20
MARS, Color,Blue
MARS, Cost,5

REMULAK, Letter,DD
REMULAK, Weight,200
REMULAK, Color,Orange
REMULAK, Cost,100

Thank you Master...

Yoda · April 24, 2013, 5:01pm

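awk -F, '
        NR == 1 {
                        # Strip spaces from the header, then keep the column names in H
                        gsub ( / /, "" )
                        split ( $0, H, "," )
        }
        NR > 1 {
                        # PLANET is assumed to be the last field ($NF); prefix it to every line
                        for ( i = 1; i <= (NF-1); i++ )
                                print $NF, H[i], $i
                        printf "\n"
        }
' OFS=, file.csv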
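
The script above relies on PLANET being the last column. If it could sit anywhere in the header, a possible variation is to look the column up by name first. This is a rough sketch under a few assumptions: the header has no spaces after the commas, the variable name key and the temporary file are made up for the example, and the error message goes through "cat 1>&2" only as a portable way to reach stderr.

```shell
# Sample input from the follow-up question, written without spaces after commas.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Letter,Weight,Color,Cost,PLANET
A,20,Blue,5,MARS
DD,200,Orange,100,REMULAK
EOF

out=$(awk -F, -v OFS=, -v key=PLANET '
    NR == 1 {
        for (i = 1; i <= NF; i++) {
            H[i] = $i                     # remember the column names
            if ($i == key) k = i          # position of the key column
        }
        if (!k) {                         # key column not found: give up
            print "no column named " key | "cat 1>&2"
            exit 1
        }
        next
    }
    {
        for (i = 1; i <= NF; i++)
            if (i != k)                   # skip the key column itself
                print $k, H[i], $i
        print ""                          # blank line between records
    }
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

For the sample data this gives the same MARS and REMULAK blocks as before, and it should keep working if the PLANET column is moved elsewhere in the file.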