converting rows into columns

svajhala · January 2, 2011, 6:40pm

Hi,

I am trying to fetch some values from db and spooling the output to a file.

when i query the db for the values, i get the values in following format.

PC_1 wf_test1 Test
PC_2 wf_test2 Test
PC_3 wf_test3 Test

But my spool file was created in following format.

PC_1
wf_test1
Test

PC_2 
wf_test2
Test

PC_3
wf_test3
Test

Now i want to convert the format back to my db format in my spool file. Please let me know how to do it?

-Sam

DGPickett · January 2, 2011, 7:16pm

Not really rows to columns = twirling a matrix on the diagonal, so sed can pile them back up:

sed '
:loop
$b
N
s/\n$//
t
s/\n/ /
t loop

Narrative: sed:

set a branch target named loop,
if at EOF branch (to end = print and exit),
get next line onto end of buffer,
if it is a blank field, remove it,
if we removed it, branch (to end = print, delete and start new line),
make the line feed a space
go back to loop (clearing the sustitute test flag at the same time).

Leo_Gutierrez · January 2, 2011, 7:28pm

#!/usr/bin/perl
open(my $file, q[<], "file.txt") or die("$!");
while(my $line = <$file>)
{
	$line =~ s/\s+/\n/g;
	print $line, "\n";
}
close($file);

Ygor · January 2, 2011, 8:49pm

Try...

paste - - - - < file1

svajhala · January 2, 2011, 11:56pm

dgpickett:

Not really rows to columns = twirling a matrix on the diagonal, so sed can pile them back up:
sed '
:loop
$b
N
s/\n$//
t
s/\n/ /
t loop
Narrative: sed:

set a branch target named loop,

if at EOF branch (to end = print and exit),

get next line onto end of buffer,

if it is a blank field, remove it,

if we removed it, branch (to end = print, delete and start new line),

make the line feed a space

go back to loop (clearing the sustitute test flag at the same time).

Thanks for the explanation. suppose my spool filename is spool.out, where do I need to define this filename in the above code?

Scrutinizer · January 3, 2011, 12:27am

awk '$1=$1' RS= file

DGPickett · January 3, 2011, 12:18pm

It's a transform, just add on the end: < input_file > output_file

The paste bit is cute, too. I try to use sed rather than know the varying options of many one-purpose commands, so my sed knowledge and expertise grows. The sed skills support command line and vi editing as well as grep and the other regex tools. Similar arguments apply to PERL, which is more a language than a scripting tool.

ahmad.diab · January 3, 2011, 1:06pm

use below:

form line to paragraphs:

perl -wlne '$_ =~ s/\s+/\n/g;print $_,"\n";' infile.txt > outfile.txt

for reverse:
from paragraphs to lines:

perl -00 -wne 's/\n/\t/g;print $_,"\n"'  infile.txt > outfile.txt

Franklin52 · January 3, 2011, 1:44pm

xargs -n3 < file

DGPickett · January 3, 2011, 8:58pm

A bit fork/exec intensive, but yes, that is another way to collect N lines into one:

xargs -n4 echo < file

svajhala · January 4, 2011, 4:40pm

when i try your command, it is giving me the following output.

PC_1
wf_test1
Test
PC_2
wf_test2
Test
PC_3
wf_test3
Test

but what i want is

PC_1 wf_test1 Test
PC_2 wf_test2 Test
PC_3 wf_test3 Test

---------- Post updated at 04:40 PM ---------- Previous update was at 04:31 PM ----------

It gives the following output.

PC_1 wf_test1 Test PC_2
wf_test2 Test PC_3 wf_test3
Test

But I want to start output in next line after encountering the word 'Test'

ex:

PC_1 wf_test1 Test 
PC_2 wf_test2 Test 
PC_3 wf_test3 Test

Scrutinizer · January 4, 2011, 5:00pm

svajhala:

when i try your command, it is giving me the following output
PC_1
wf_test1
Test
PC_2
wf_test2
Test
PC_3
wf_test3
Test
but what i want is

[CODE]
PC_1 wf_test1 Test
PC_2 wf_test2 Test
PC_3 wf_test3 Test

It only works when there are empty lines between the records, as is the case in the sample input file.

DGPickett · January 4, 2011, 5:10pm

Back in #1 there were blank lines so 4 was a good number.

Chubler_XL · January 4, 2011, 5:35pm

svajhala:

when i try your command, it is giving me the following output.
PC_1
wf_test1
Test
PC_2
wf_test2
Test
PC_3
wf_test3
Test
but what i want is
PC_1 wf_test1 Test
PC_2 wf_test2 Test
PC_3 wf_test3 Test

The supplied code only works with GNU awk for plain awk you might need something like this:

awk '!$0;$0{printf $0" "}END{print}' infile

Another version with extra space on end of each line removed:

awk '!$0{print; f=0}$0{printf(f++?" ":"")$0}END{if(f)print}' infile

Scrutinizer · January 4, 2011, 5:46pm

Really?

$ mawk '$1=$1' RS= infile685
PC_1 wf_test1 Test
PC_2 wf_test2 Test
PC_3 wf_test3 Test

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html\#tag\_20\_06\_13_03

Chubler_XL · January 4, 2011, 5:56pm

Scrutinizer,

It's not the null RS that's the issue it's the stripping of \n from $0 when assigning fields, here is my results using awk on AIX:

$ awk '$1=$1' RS= infile

PC_1
wf_test1
Test
PC_2 
wf_test2
Test
PC_3
wf_test3
Test

[LEFT]Additional debug:

$ awk '$1=$1{print "<"$0">"}' RS= infile

[/LEFT]

<PC_1

wf_test1
Test>
<PC_2 
wf_test2
Test>
<PC_3
wf_test3

Test>

===Update===

Funny enough on AIX 5.3 this does work:

awk '$1=$1""' RS= infile

[/LEFT]

Scrutinizer · January 4, 2011, 6:18pm

Then IMO that awk does not work according to the POSIX Specification because newline should turn into a field separator no matter what the value of FS, as per the previous quote in #15. and further:

awk

This accomplished by assigning $1 to $1. And by re-evaluating $0, the field separators are changed to OFS which is a single space.
Is there a nawk on AIX?

---------- Post updated at 00:18 ---------- Previous update was at 00:05 ----------

Hmm forcing into string context makes a difference with that AIX implementation. Perhaps that implementation has an 'optimization' that if you assign the same value is is wrongly assumed that nothing needs to be done and by adding the double quotes it is regarded as different? A bit fishy, no?

Chubler_XL · January 4, 2011, 6:18pm

Yep, it looks like AIX awk is incorrectly optimising this particular assignment, there is a /usr/bin/nawk but it's a hard link to awk and has the same issue.

Interesting, but I doubt this is going to be the OPs problem, unless he is also on AIX 5.3.

DGPickett · January 5, 2011, 8:18pm

When I was real serious about rotating I wrote this C tool:

colrot.c:

#include <stdio.h>
#include <string.h>

static char    buf[200000000];
static char    **bufs = NULL ;
static char    *cp, *cp2, *bcp ;
static char    *isep = "|\n" ;
static char    *osep = "\t" ;
static char    *bsep = "" ;
static int    rows = 1 ;
static int    cols ;
static int    col ;
static int    row ;
static int    bct = 0 ;

static int p_gets( char *b, size_t s )
{
    if ( !fgets( b, s, stdin ) )
    {
        if ( ferror( stdin ) )
        {
            perror( "stdin" );
            exit( 1 );
        }

        *b = NULL ;
        return 1 ;
    }

    return 0 ;
}

static void p_puts( char *b )
{
    if ( EOF == fputs( b, stdout ) )
    {
        if ( ferror( stdout ) )
        {
            perror( "stdout" );
            exit( 1 );
        }

        exit( 0 );
    }
}

main( int argc, char **argv )
{
    /* get $1 */
    for ( col = 1 ; col < argc ; col++ )
    {
        if ( !strcmp( argv[col], "-is" )
          && ++col < argc )
        {
            *isep = argv[col][0] ;
            continue ;
        }

        if ( !strcmp( argv[col], "-os" )
          && ++col < argc )
        {
            osep = argv[col] ;
            continue ;
        }

        if ( !strcmp( argv[col], "-bs" )
          && ++col < argc )
        {
            bsep = argv[col] ;
            continue ;
        }

        if ( ( rows = atoi( argv[col] ) ) > 0 )
        {
            continue ;
        }

        fputs(
"\n"
"Usage: colrot [ -is <i_sepc> ] [ -os <o_seps> ] [ -bs <b_seps> ] [ <lpb> ]\n"
"\n"
"Reads and saves a header line, and then reads <lpb> (default one) line\n"
"blocks, and writes the header and each block rotated, so each column is\n"
"on a line.  The character <i_sepc> (default pipe=|) defines input fields.\n"
"String <o_seps> (default a tab character) separates output fields.  A line\n"
"with string <b_seps> (default empty) separates output blocks.\n"
"\n"
            , stderr );
        exit( 1 );
    }

    /* get column name line */
    if ( p_gets( buf, sizeof( buf ) ) )
        exit( 0 );

    /* count columns */
    for ( cp = buf, cols = 0 ; *cp ; cols++ )
    {
        cp = strpbrk( cp, isep );
        cp++ ;
    }

    /* malloc a matrix of char pointers */
    if ( !( bufs = (char**)malloc( sizeof(char*) * cols * ( rows + 1 ))))
    {
        perror( "malloc()" );
        exit( 1 );
    }

    /* find and null terminate column names */
    for ( cp = buf, col = 0 ; col < cols ; col++ )
    {
        bufs[col] = cp ;
        cp = strpbrk( cp, isep );
        *cp = NULL ;
        cp++ ;
    }

    bcp = cp ;

    loop:

    /* read in N rows above header; rows + 1 total */
    for ( row = 1 ; row <= rows ; row++ )
    {
        /* ignore EOF until starting a new block */
        if ( p_gets( cp, sizeof(buf) - ( cp - buf ) ) )
        {
            if ( row == 1 ) /* nothing buffered */
            {
                exit( 0 );
            }
            /* shrink the last block and continue */
            /* we will read EOF again on row 1 of the next pass */
            rows = row - 1 ;
            break ;
        }

        /* locate and null terminate all columns, fake missing cols */
        for ( col = 0 ; col < cols ; col++ )
        {
            bufs[ col + ( row * cols ) ] = cp ;
            cp += strcspn( cp, isep );

            /* if sep found, null term field and step into next */
            switch ( *cp )
            {
            case 0:
                break ;
            case '\n':
                *cp = NULL ;
                break ;
            default:
                *cp = NULL ;
                cp++ ;
            }
        }

        /* write next line just past last field null */
        cp++ ;
    }

    /* go through matrix column and then reverse row,
        marking empty fields for supression */
    for ( col = 0 ; col < cols ; col++ )
    {
        for ( row = rows ; row > 0 ; row-- )
        {
            if ( *( bufs[ col + ( row * cols ) ] ) )
            {
                break ;
            }
    
            /* NULL pointer means nothing here or beyond */
            bufs[ col + ( row * cols ) ] = NULL ;
        }
    }

    if ( bct++ )
    {
        /* put a line between blocks */
        p_puts( bsep );
        p_puts( "\n" );
    }

    /* go through matrix column and then row, printing */
    for ( col = 0 ; col < cols ; col++ )
    {
        for ( row = 0 ; row <= rows ; row++ )
        {
            /* NULL pointer means nothing here or beyond */
            if ( !bufs[ col + ( row * cols ) ] )
            {
                break ;
            }

            if ( row )
            {
                p_puts( osep );
            }

            p_puts( bufs[ col + ( row * cols ) ] );
        }

        p_puts( "\n" );
    }

    /* start another block, reading just above header */
    cp = bcp ;
    goto loop ;
}

Scrutinizer · January 5, 2011, 11:55pm

Hi DGPickett, that looks like something nice to try with a future case of transposition, but despite the title of this thread, there is no converting columns into rows here, is there?