Converting to matrix-like file using AWK

Hi,
Needs for statistics, doing converting

Here is a sample file
Input :

           
1|A|17,94
1|B|22,59
1|C|56,93
2|A|63,71
2|C|23,92
5|B|19,49
5|C|67,58

expecting something like that
Output :

1|A|17,94|B|22,59|C|56,93
2|A|63,71|B|0|C|23,92
5|A|0|B|19,49|C|67,58

I couldn't figure out how to do it.
So if there's a way using awk,ksh,bash or perl, any help will be appreciated.

Thank's

If your input file is sorted such that the records with the same first field are always together, then this should work:

awk -F \| '
    $1 != last {
        if( hold )
            print hold;
        hold = $1;
    }
    {
        hold = hold "|" $2 "|" $3;
        last = $1;
    }
    END { print hold; }
' input-file >output-file
1 Like

Thank's agama,
I can sort the file even it's not (using 'sort' command)
i tried your solution, but i got this
Output :

1|A|17,94|B|22,59|C|56,93
2|A|63,71|C|23,92
5|B|19,49|C|67,58

it only misses the zeros when one of the column value (A,B,C) is missing

This will fill in the missing zeros. This doesn't require the input to be sorted, but it does require all entries in the input with the same value for the first field to be adjacent:

awk -F "|" 'BEGIN{reset()}
function doprint(){
        printf("%s|A|%s|B|%s|C|%s\n", last, val["A"], val["B"], val["C"])
        reset()
}
function reset(){
        val["A"] = val["B"] = val["C"] = 0
}
{       if($1 != last) if(last != "") doprint()
        last = $1
        val[$2] = $3
}
END{    if(last != "") doprint()
}' in
1 Like

My fault -- failed to see the zeros.

Have a go with this:

awk -F \|  '
    BEGIN { lastc = "C";  }      # change if there are more than three rows/type
    function print_row(     c, cv )
    {
        cv = 65;
        c = "A";
        printf( "%s|", last );
        while( c <= lastc )
        {
            if( last " " c in data )
                printf( "|%s|%s", c, data[last " " c] );
            else
                printf( "|%s|0", c );
            cv++;
            c = sprintf( "%c", cv );
        }
        printf( "\n" );
    }
    {
        if( $1 != last )
        {
            print_row();
            delete( data );
        }

        data[$1" "$2] = $3;
        last = $1
    }
    END { print_row(); }
' input >output

It assumes that the second field is a single character in the range of A-Z.

i really appreciate your help, this forum is really the best

sorry Agama, this one also doesn't work,
Output :

||A|0|B|0|C|0
1||A|17,94|B|22,59|C|56,93
2||A|63,71|B|0|C|23,92
5||A|0|B|19,49|C|67,58

Don Cragun's solution worked like a charm

i just wondered if we can make it work for a 2nd field having more than 3 values other than A,B or C .

A slight modification to Don's solution to meet your requirements:

awk -F "|" 'BEGIN{reset()}
function doprint(    c,i){
        printf("%s",last)
        for(i=65;i<=90;i++)
	{
	 c=sprintf("%c",i)
	 printf("|%s|%s",c,val[c])
	}
	printf "\n"
        #printf("%s|A|%s|B|%s|C|%s\n", last, val["A"], val["B"], val["C"])
        reset()
}
function reset(    i){
        for(i=65;i<=90;i++)
	 val[sprintf("%c",i)]=0
        #val["A"] = val["B"] = val["C"] = 0
}
{       if($1 != last) if(last != "") doprint()
        last = $1
        val[$2] = $3
}
END{    if(last != "") doprint()
}' in

This is assuming that only the alphabets A-Z (single character) will occur in the second field.
Change the loop condition according to your requirements.

that's perfect
just one more thing before closing this thread, can someone explain me how you choose the values 65 and 90

65 is the ASCII code for A and 90 for Z .
So, 65-90 corresponds to A-Z .

sorry didn't make a clear example
2nd column could have alphabetic values [A-Za-z] but also numeric [0-9] and even both alphanumeric,
2nd column have a variable length eg :Ag, Gde, vi3.... etc
however the list of 2nd column's values is limited and knowable before processing the file

Here is one way to do it. This script doesn't require sorted input. The required format for the file specifying the names of the columns in the output matrix is described in the comments:

#!/bin/ksh
# tester -- use awk script to produce matrix of processed input records
#
# Usage tester [ input [ choices ]]
#       This utility will ready a list of entries from the file named by
#       "input" and print those entries as a vertical-bar character ('|') 
#       separated matrix.  The order of rows in the matrix is determined by the
#       first field in the "input" file.  The order of columns in the matrix is 
#       determined by the order of entries in "choices".  Entries in the matrix
#       not found in "input" will be displayed as "0".
#
# Operands:
#       choices Name of file containing a list of expected values for
#               "column-ID" values in the file named by input.  Each line is
#               assumed to a value-ID.  The order of entries in this file
#               determines the order in which columns in the output matrix
#               appear.  If this operand is not present or if is is an empty
#               string, a file named "choices" will be used by default.
#       input   Name of file containing matrix entry input.  Each entry is
#               assumed to be in the form:
#                       row-ID"|"column-ID"|"data
#               The order of entries in this file does not matter, except that
#               "row-ID" values in the output will be in the same order as the
#               first entry for each different "row-ID" value found in this
#               file.  If this operand is not present or if is is an empty
#               string, a file named "in" will be used by default.
#       All whitespace characters in entries in both files are significant.
#
# Exit Codes:
#       0 - successful completion
#       1 - one or more entries in "input" contained a "column-ID" value not
#           found in "choices".
awk 'BEGIN{     FS = SUBSEP = "|"}
FNR==NR{# Get choices for 2nd file 2nd column input from 1st file.
        s2[++s2c] = $1
        cfn = FILENAME
        next
}
{       # Add an entry from the 2nd file...
        val[$1,$2] = $3
        if(!($1 in list)) {
                # This is a new field 1 value.  Store cross reference recording
                # the values and the order in which they first appear in the
                # 2nd file.
                s1[++s1c] = $1
                list[$1] = s1c
        }
}
END{    for(i = 1; i <= s1c; i++) {
                printf("%s", s1)
                for(j = 1; j <= s2c; j++) {
                        # Print the saved values and delete them.
                        printf("%s%s%s%s", FS, s2[j], FS,
                                val[s1,s2[j]] == "" ? 0 : val[s1,s2[j]])
                        delete val[s1,s2[j]]
                }
                printf("\n")
        }
        # Any entries left in val[i,j] at this point come from lines in the 2nd
        # file with a 2nd field that was not listed in the 1st file...
        for(i in val) {
                errors++
                printf("*** Input: \"%s|%s\" has 2nd field not found in %s\n",
                        i, val, cfn)
        }
        exit errors > 0
}' ${2:-choices} ${1:-in}
1 Like

Very Impressive, always thought that 'awk' was powerful but didn't how much it was true.
Tested with the example but also with an other file.
It worked perfectly
Thank's Don Cragun