This will fill in the missing zeros. This doesn't require the input to be sorted, but it does require all entries in the input with the same value for the first field to be adjacent:
awk -F "|" 'BEGIN{reset()}
function doprint(){
printf("%s|A|%s|B|%s|C|%s\n", last, val["A"], val["B"], val["C"])
reset()
}
function reset(){
val["A"] = val["B"] = val["C"] = 0
}
{ if($1 != last) if(last != "") doprint()
last = $1
val[$2] = $3
}
END{ if(last != "") doprint()
}' in
awk -F \| '
BEGIN { lastc = "C"; } # change if there are more than three rows/type
function print_row( c, cv )
{
cv = 65;
c = "A";
printf( "%s|", last );
while( c <= lastc )
{
if( last " " c in data )
printf( "|%s|%s", c, data[last " " c] );
else
printf( "|%s|0", c );
cv++;
c = sprintf( "%c", cv );
}
printf( "\n" );
}
{
if( $1 != last )
{
print_row();
delete( data );
}
data[$1" "$2] = $3;
last = $1
}
END { print_row(); }
' input >output
It assumes that the second field is a single character in the range of A-Z.
This is assuming that only the alphabets A-Z (single character) will occur in the second field.
Change the loop condition according to your requirements.
sorry didn't make a clear example
2nd column could have alphabetic values [A-Za-z] but also numeric [0-9] and even both alphanumeric,
2nd column have a variable length eg :Ag, Gde, vi3.... etc
however the list of 2nd column's values is limited and knowable before processing the file
Here is one way to do it. This script doesn't require sorted input. The required format for the file specifying the names of the columns in the output matrix is described in the comments:
#!/bin/ksh
# tester -- use awk script to produce matrix of processed input records
#
# Usage tester [ input [ choices ]]
# This utility will ready a list of entries from the file named by
# "input" and print those entries as a vertical-bar character ('|')
# separated matrix. The order of rows in the matrix is determined by the
# first field in the "input" file. The order of columns in the matrix is
# determined by the order of entries in "choices". Entries in the matrix
# not found in "input" will be displayed as "0".
#
# Operands:
# choices Name of file containing a list of expected values for
# "column-ID" values in the file named by input. Each line is
# assumed to a value-ID. The order of entries in this file
# determines the order in which columns in the output matrix
# appear. If this operand is not present or if is is an empty
# string, a file named "choices" will be used by default.
# input Name of file containing matrix entry input. Each entry is
# assumed to be in the form:
# row-ID"|"column-ID"|"data
# The order of entries in this file does not matter, except that
# "row-ID" values in the output will be in the same order as the
# first entry for each different "row-ID" value found in this
# file. If this operand is not present or if is is an empty
# string, a file named "in" will be used by default.
# All whitespace characters in entries in both files are significant.
#
# Exit Codes:
# 0 - successful completion
# 1 - one or more entries in "input" contained a "column-ID" value not
# found in "choices".
awk 'BEGIN{ FS = SUBSEP = "|"}
FNR==NR{# Get choices for 2nd file 2nd column input from 1st file.
s2[++s2c] = $1
cfn = FILENAME
next
}
{ # Add an entry from the 2nd file...
val[$1,$2] = $3
if(!($1 in list)) {
# This is a new field 1 value. Store cross reference recording
# the values and the order in which they first appear in the
# 2nd file.
s1[++s1c] = $1
list[$1] = s1c
}
}
END{ for(i = 1; i <= s1c; i++) {
printf("%s", s1)
for(j = 1; j <= s2c; j++) {
# Print the saved values and delete them.
printf("%s%s%s%s", FS, s2[j], FS,
val[s1,s2[j]] == "" ? 0 : val[s1,s2[j]])
delete val[s1,s2[j]]
}
printf("\n")
}
# Any entries left in val[i,j] at this point come from lines in the 2nd
# file with a 2nd field that was not listed in the 1st file...
for(i in val) {
errors++
printf("*** Input: \"%s|%s\" has 2nd field not found in %s\n",
i, val, cfn)
}
exit errors > 0
}' ${2:-choices} ${1:-in}
Very Impressive, always thought that 'awk' was powerful but didn't how much it was true.
Tested with the example but also with an other file.
It worked perfectly
Thank's Don Cragun