awk - rearrange data

creamcheese · February 17, 2010, 8:20am

Dear All,

once again I am encountering a problem with awk.

The file looks like this:

BroadLeaves 43.6 clc2006 37.6 Conifers 8.3 edge100 5.1 dem30sec 3.9 aspect 1.5 slope 0
dem30sec 58.3 Conifers 28.5 clc2006 7.3 edge100 3 slope 2.4 BroadLeaves 0.4 aspect 0.1
....
...
..

My desired output would be a file with 7 columns, one per category, each holding the corresponding value:

aspect BroadLeaves clc2006 Conifers dem30sec edge100 slope
1.5 43.6 37.6 8.3 3.9 5.1 0
0.1 0.4 7.3 28.5 58.3 3 2.4
...
..
..

Is there any way to do this efficiently? The only way I can think of at the moment is a series of if statements.

Any guidance is greatly appreciated, thanks in advance!

radoulov · February 17, 2010, 8:37am

Could you please provide more example input data (a bigger sample)?

creamcheese · February 17, 2010, 8:47am

Hi,

Thanks for your reply. The sample stays more or less the same; only the variables change position.

BroadLeaves 43.6 clc2006 37.6 Conifers 8.3 edge100 5.1 dem30sec 3.9 aspect 1.5 slope 0
dem30sec 58.3 Conifers 28.5 clc2006 7.3 edge100 3 slope 2.4 BroadLeaves 0.4 aspect 0.1
dem30sec 50.2 Conifers 43.2 clc2006 3 edge100 2.1 slope 0.8 aspect 0.5 BroadLeaves 0.3
dem30sec 47.4 Conifers 42 clc2006 5.2 edge100 2.8 slope 1.3 BroadLeaves 0.7 aspect 0.5
Conifers 44.8 dem30sec 36.8 clc2006 8.6 edge100 7.2 BroadLeaves 1.2 slope 0.9 aspect 0.4
dem30sec 43.5 Conifers 37.8 edge100 12.1 clc2006 2 BroadLeaves 2 aspect 1.6 slope 1.1
Conifers 43.2 dem30sec 40.6 BroadLeaves 7.1 edge100 3.5 clc2006 3 slope 2.1 aspect 0.5
dem30sec 46.6 Conifers 40.8 edge100 5.2 aspect 3.2 clc2006 2.9 BroadLeaves 1 slope 0.3
dem30sec 52.5 Conifers 36.7 edge100 5.5 clc2006 2.2 aspect 1.5 BroadLeaves 1 slope 0.6
dem30sec 49 Conifers 33.8 edge100 5.8 slope 4.5 clc2006 3 BroadLeaves 2.5 aspect 1.3
dem30sec 47.1 Conifers 40.9 clc2006 4.8 slope 4.1 edge100 1.4 BroadLeaves 1.2 aspect 0.5
dem30sec 44.7 Conifers 38.6 clc2006 7 edge100 6.4 slope 1.6 BroadLeaves 1.4 aspect 0.2
Conifers 28.2 dem30sec 16.7 edge100 15.9 clc2006 14.2 BroadLeaves 12.6 aspect 6.7 slope 5.8
dem30sec 46.5 Conifers 42.6 clc2006 4.8 edge100 4.4 BroadLeaves 0.8 aspect 0.5 slope 0.4
dem30sec 53.8 Conifers 35 edge100 4.1 clc2006 4 aspect 1.9 slope 0.8 BroadLeaves 0.4
Conifers 40.7 dem30sec 39.6 edge100 7.6 clc2006 6.4 BroadLeaves 2.5 aspect 2 slope 1.3
dem30sec 44.7 Conifers 43 clc2006 5.2 edge100 3.4 slope 1.6 aspect 1.3 BroadLeaves 0.8
Conifers 51 dem30sec 36.8 edge100 5.1 clc2006 3.5 BroadLeaves 1.4 aspect 1.1 slope 0.9
dem30sec 47.3 Conifers 42.8 edge100 5.3 clc2006 2 slope 1.8 BroadLeaves 0.4 aspect 0.4
dem30sec 45.8 Conifers 37.3 edge100 7.2 clc2006 6.1 aspect 1.6 slope 1.3 BroadLeaves 0.6
dem30sec 52.4 Conifers 37.7 clc2006 4.5 edge100 3.1 BroadLeaves 1.1 slope 0.8 aspect 0.5
dem30sec 43.8 Conifers 42.5 edge100 7.6 clc2006 3.6 slope 1.7 BroadLeaves 0.5 aspect 0.4
dem30sec 43.8 Conifers 43.7 clc2006 5.9 edge100 4.3 aspect 1.2 BroadLeaves 0.6 slope 0.6
Conifers 49.2 dem30sec 27.6 edge100 6.9 aspect 6.9 BroadLeaves 4 clc2006 3.5 slope 1.9
dem30sec 56.5 Conifers 35.6 edge100 3.9 clc2006 2.3 BroadLeaves 0.9 slope 0.5 aspect 0.2
dem30sec 51.4 Conifers 37.9 edge100 5.1 clc2006 3.6 BroadLeaves 1.1 slope 0.8 aspect 0.1

thanks

sharadpisal · February 17, 2010, 9:26am

Well, I'm not sure about awk, but it would be very easy with Perl.
Are you interested in seeing the solution?

creamcheese · February 17, 2010, 9:30am

That would be great. I haven't worked with Perl so far, but I am familiar with other programming languages.

Thanks :)

sharadpisal · February 17, 2010, 9:51am

Create a file, say converter.pl, with the following content:

#!/usr/bin/perl -w
use strict;

my @keyList = qw (edge100 BroadLeaves Conifers clc2006 slope aspect dem30sec);

print join (",", @keyList) . "\n";
while (my $line = <>) {
        my %arr = split (' ', $line);  # split on runs of whitespace, ignoring leading blanks
        my @vals;
        foreach my $key (@keyList) {
                push (@vals, $arr{$key});
        }
        print join (",", @vals) . "\n";
}

Run it like this:

cat data | ./converter.pl
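
Since the original question asked about awk, the same fixed-key idea can also be sketched there. This is only a minimal sketch, assuming every line consists of whitespace-separated name/value pairs; the file name sample.txt and the shell variable out are made up for illustration, and only the first two lines of the posted sample are used.

```shell
# Hypothetical sample file holding the first two lines of the posted data.
cat > sample.txt <<'EOF'
BroadLeaves 43.6 clc2006 37.6 Conifers 8.3 edge100 5.1 dem30sec 3.9 aspect 1.5 slope 0
dem30sec 58.3 Conifers 28.5 clc2006 7.3 edge100 3 slope 2.4 BroadLeaves 0.4 aspect 0.1
EOF

out=$(awk '
BEGIN {
    # Fixed output order, one column per category.
    n = split("aspect BroadLeaves clc2006 Conifers dem30sec edge100 slope", key, " ")
    for (i = 1; i <= n; i++) printf "%s%s", key[i], (i < n ? " " : "\n")
}
{
    # Fields come in name/value pairs: odd fields are names, even fields are values.
    split("", val)                      # clear the lookup table for this line
    for (i = 1; i < NF; i += 2) val[$i] = $(i + 1)
    for (i = 1; i <= n; i++) printf "%s%s", val[key[i]], (i < n ? " " : "\n")
}' sample.txt)

printf '%s\n' "$out"
```

A category that is missing from a line simply yields an empty column here, so no chain of if statements is needed; the array lookup by name does the work.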

creamcheese · February 17, 2010, 10:19am

Great, thanks a lot! It works perfectly and does exactly what I was looking for.

radoulov · February 17, 2010, 10:30am

An attempt to handle a variable number of unknown keys:

perl -lne'
push @{ $_{$1} }, $2 while /(\S+)\s+(\S+)/g;

END {
    $max = ( sort { $#$b <=> $#$a } values %_ )[0];
    @keys = sort { lc $a cmp lc $b } keys %_;
    $mask = join "\t", map "%" . ( length $_ ) . "s", @keys;
    printf $mask. "\n", @keys;
    for ( $i = 0 ; $i < @$max ; $i++ ) {
        printf $mask. "\n", map $_{$_}->[$i], @keys;
    }

}' infile
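
For comparison, a rough awk sketch for the same situation (keys not known in advance, possibly missing from some lines). It is not a translation of the Perl above: it keeps values aligned by input line and prints "-" for a key that a line lacks, which is a placeholder chosen here. The file name sample2.txt, its shortened contents and the variable out2 are made up for illustration.

```shell
# Hypothetical input in which the two lines do not share all keys.
cat > sample2.txt <<'EOF'
BroadLeaves 43.6 clc2006 37.6 aspect 1.5
dem30sec 58.3 aspect 0.1 slope 2.4
EOF

out2=$(awk '
{
    # Remember each new key once, and store the value under (row, key).
    for (i = 1; i < NF; i += 2) {
        if (!($i in seen)) { seen[$i] = 1; key[++n] = $i }
        val[NR, $i] = $(i + 1)
    }
}
END {
    # Insertion sort of the keys, ignoring case (POSIX awk has no built-in sort).
    for (i = 2; i <= n; i++) {
        k = key[i]
        for (j = i - 1; j >= 1 && tolower(key[j]) > tolower(k); j--) key[j + 1] = key[j]
        key[j + 1] = k
    }
    # Header line, then one tab-separated output row per input line.
    for (i = 1; i <= n; i++) printf "%s%s", key[i], (i < n ? "\t" : "\n")
    for (r = 1; r <= NR; r++)
        for (i = 1; i <= n; i++)
            printf "%s%s", ((r, key[i]) in val ? val[r, key[i]] : "-"), (i < n ? "\t" : "\n")
}' sample2.txt)

printf '%s\n' "$out2"
```

Everything is held in memory until the END block, because the full set of keys is only known once the last line has been read.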

ahmad.diab · February 17, 2010, 1:47pm

Just an attempt to use simpler Perl code to achieve the same result.

perl -wnale '
BEGIN{
$,="," ;
@a=qw(edge100 BroadLeaves Conifers clc2006 slope aspect dem30sec) ;
print sort(@a)
}
%h=@F ;
# hash slice in sorted key order; $, joins the values, -l adds the newline
print @h{ sort keys %h } ;
' infile.txt

---------- Post updated at 20:47 ---------- Previous update was at 19:56 ----------

A further improvement, which takes the header from the first line instead of hard-coding the keys:

perl -wnale '
%B=@F , $,=","  , print sort {"\L$a" cmp "\L$b"} keys %B if $.==1 ;
%h=@F ; print @h{ sort { "\L$a" cmp "\L$b" } keys %h } ;
' infile.txt

radoulov · February 17, 2010, 2:45pm

Yes,
or:

perl -lane'
%_=@F;@f=sort{lc$a cmp lc$b}keys%_;print join"\t",@f if$.==1;print join"\t",map$_{$_},@f;
 ' infile