Pivoting the data

karumudi7 · November 27, 2012, 7:35am

Hello Unix guys,

I have the following 4 column data, which is a output of db2 select query:

Need to pivot the data.

sample Input:
Year Month Country Counts

       2012 Aug    Canada        114
       2012 Aug    USA           92
       2012 Aug    Mexico        3
       2012 Aug    Aus           20
       2012 Sep    India         158
       2012 Sep    Canada        106
       2012 Sep    USA           89
       2012 Sep    Mexico        3
       2012 Sep    Aus           13
       2012 Oct    India         1571953
       2012 Oct    Canada        1387428
       2012 Oct    USA           1330861
       2012 Oct    Mexico        75663
       2012 Oct    Aus           143765
       2012 Oct    Alpha         1
       2012 Nov    India         391
       2012 Nov    Canada        657
       2012 Nov    USA           894
       2012 Nov    Mexico        360
       2012 Nov    Aus           1034

Desired Output:

        2012 Aug      2012 Sep    2012 Oct     2012 Nov
India      178           158        1571953         391
Canada  114            106        1387428         657
USA       92              89        1330861          894
Mexico    3               3          75663            360
Aus        22              13        143765         1034
Alpha      0                0         1                    0

Can you please help in this.

Thanks.

pamu · November 27, 2012, 7:58am

Try sth like this

awk '{if(!X[$1,$2]++){FIR=FIR?FIR"_"$1" "$2:$1" "$2}
{Y[$1,$2,$3]=$4;Z[$3]++}}END{n=split(FIR,FCL,"_");for(i=1;i<=n;i++){s=s?s"\t"FCL:"\t"FCL}
print s;
for(j in Z){s=j;for(i=1;i<=n;i++){split(FCL,PK," ");p=Y[PK[1],PK[2],j]?Y[PK[1],PK[2],j]:0;s=s"\t\t"p};print s}}' file

karumudi7 · November 27, 2012, 8:41am

pamu:

Try sth like this

awk '{if(!X[$1,$2]++){FIR=FIR?FIR"_"$1" "$2:$1" "$2}
{Y[$1,$2,$3]=$4;Z[$3]++}}END{n=split(FIR,FCL,"_");for(i=1;i<=n;i++){s=s?s"\t"FCL:"\t"FCL}
print s;
for(j in Z){s=j;for(i=1;i<=n;i++){split(FCL,PK," ");p=Y[PK[1],PK[2],j]?Y[PK[1],PK[2],j]:0;s=s"\t\t"p};print s}}' file

Excellent it worked.

If possible can you please explain this code.

pamu · November 28, 2012, 12:53am

awk '{if(!X[$1,$2]++){ # Here we remove duplicates of date as $1 and $2. So it will take only single entry for any month.

FIR=FIR?FIR"_"$1" "$2:$1" "$2}  # Here we append date $1 and $2 to variable FIR using FS as _.


{Y[$1,$2,$3]=$4;Z[$3]++}} # Here we Assign $4 to array using index as $1,$2,$3 and add country's to array Z. In array Z only unique countries will be added.

END{n=split(FIR,FCL,"_"); # Here we split FIR var(all unique dates) to variable FCL using FS as _ and n = number dates present in FIR variable.

for(i=1;i<=n;i++){s=s?s"\t"FCL:"\t"FCL} # Here we just append all the unique months to the variable s.

print s;  # print variable s on first line.

for(j in Z){s=j;  # Get values from array Z and assign it to s. array Z contains unique country names

for(i=1;i<=n;i++){  # Use for loop using variable n(which we got from spiting variable FIR)

split(FCL,PK," ") # Split variable FCL(which is has year and month) to array PK.

p=Y[PK[1],PK[2],j]?Y[PK[1],PK[2],j]:0   # Here we check array Y presents or not. PK[1] = Year, PK[2] = Month and j = Country. If for that particular field we have array value then assign it to variable s and if not then assign it to 0

s=s"\t\t"p};  # then append it variable s

print s   # Print variable s

}}' file # Read file

I hope this helps:)

pamu