EvaAM
April 19, 2013, 7:41am
Hello all,
I am quite new to this, but I need some help to keep going with my analysis.
I am struggling to write a short script that reads a square matrix and converts it into two columns.
A B C D
A 0.00 0.06 0.51 0.03
B 0.06 0.00 0.72 0.48
C 0.51 0.72 0.00 0.01
D 0.03 0.48 0.01 0.00
This matrix is an example of the genetic distances of the same gene in different species.
I need two columns, so that I can easily look up any row and see the genetic distance between each pair of species:
AA 0.00
AB 0.06
AC 0.51
AD 0.03
BB 0.00
BC 0.72
BD 0.48
CC 0.00
CD 0.01
DD 0.00
Looks easy, but I didn't get it.
Thanks a lot,
EvaAM
Try this:
cat matrix.txt
A B C D
A 0.00 0.06 0.51 0.03
B 0.06 0.00 0.72 0.48
C 0.51 0.72 0.00 0.01
D 0.03 0.48 0.01 0.00
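awk '
NR == 1 { c1 = $1; c2 = $2; c3 = $3; c4 = $4; next }   # header row: column labels
{
    # print a pair only when the column label sorts at or before the row label,
    # so each pair appears once (lower triangle plus diagonal)
    if (c1 <= $1) printf "%s%s %s\n", c1, $1, $2
    if (c2 <= $1) printf "%s%s %s\n", c2, $1, $3
    if (c3 <= $1) printf "%s%s %s\n", c3, $1, $4
    if (c4 <= $1) printf "%s%s %s\n", c4, $1, $5
}' matrix.txt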
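The script above hard-codes four columns (c1 to c4). A minimal sketch of the same label-comparison idea for any number of columns, assuming, as the original does, that the labels sort alphabetically in matrix order; `pairs_by_label` is a hypothetical helper name:

```shell
# pairs_by_label: hypothetical helper. Reads a labelled square matrix on
# stdin and prints each unordered pair once, using the same
# "column label <= row label" test as the four-column script.
# Assumes the labels sort alphabetically in matrix order.
pairs_by_label() {
    awk '
    NR == 1 { split($0, H); next }          # header row: column labels
    {
        for (i = 2; i <= NF; i++)
            if (H[i-1] <= $1)               # lower triangle plus diagonal
                print H[i-1] $1, $i
    }'
}
```

Run as `pairs_by_label < matrix.txt`, it should print the same ten pairs in the same order as the four-column script (AA, AB, BB, AC, ...), which is column by column rather than the row-by-row order shown in the question.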
Yoda
April 19, 2013, 9:27am
awk '
NR == 1 {
split ( $0, H )
}
NR > 1 {
for ( i = 2; i <= NF; i++ )
print $1 H[i-1] OFS $i
}
' matrix.txt
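This version prints all sixteen ordered pairs, so both AB and BA appear. One way to keep each pair only once, without relying on label order, is to remember which unordered pairs were already printed. A sketch, assuming the matrix is symmetric (only the first value seen for each pair is kept); `pairs_once` is a hypothetical helper name:

```shell
# pairs_once: hypothetical helper. Reads a labelled, symmetric matrix on
# stdin and prints every unordered pair exactly once, in row order.
pairs_once() {
    awk '
    NR == 1 { split($0, H); next }          # header row: column labels
    {
        for (i = 2; i <= NF; i++) {
            a = $1; b = H[i-1]
            # order-independent key, so "BA" maps to the same key as "AB"
            key = (a < b) ? a SUBSEP b : b SUBSEP a
            if (!seen[key]++)               # first time this pair is met
                print a b, $i
        }
    }'
}
```

On the sample `matrix.txt`, `pairs_once < matrix.txt` should produce the ten lines from the question, in the same order.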
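#!/usr/bin/perl
use strict;
use warnings;

my @arr = qw/A B C D/;    # labels; the input file itself has no header
my @fields;               # one array reference per matrix row

open my $fh, '<', 'file' or die "Cannot open file: $!\n";
while (<$fh>) {
    chomp;
    push @fields, [split];    # split $_ on whitespace
}
close $fh;

# visit the upper triangle only (j >= i), diagonal included
for my $i (0 .. $#arr) {
    for my $j ($i .. $#arr) {
        print "$arr[$i]$arr[$j] $fields[$i][$j]\n";
    }
}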
[user@host ~]# cat file
0.00 0.06 0.51 0.03
0.06 0.00 0.72 0.48
0.51 0.72 0.00 0.01
0.03 0.48 0.01 0.00
[user@host ~]#
[user@host ~]# ./test.pl
AA 0.00
AB 0.06
AC 0.51
AD 0.03
BB 0.00
BC 0.72
BD 0.48
CC 0.00
CD 0.01
DD 0.00
[user@host ~]#
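The Perl script needs the labels typed in because `file` has no header. The same upper-triangle walk can be done in awk with the labels passed on the command line. A sketch, assuming the input has no blank lines (NR is used as the row index) and that the labels are given in matrix order; `pairs_headerless` is a hypothetical helper name:

```shell
# pairs_headerless: hypothetical awk counterpart of the Perl script.
# The input has no label row or column, so the labels are passed in
# with -v (assumption: one label per matrix row, in the same order).
pairs_headerless() {
    awk -v labels="$1" '
    BEGIN { split(labels, L) }              # L[1] = "A", L[2] = "B", ...
    {
        # row NR holds distances from L[NR]; start at field NR (the diagonal)
        for (j = NR; j <= NF; j++)
            print L[NR] L[j], $j
    }'
}
```

`pairs_headerless "A B C D" < file` should print the same ten lines as `./test.pl`.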
Yoda
April 19, 2013, 4:00pm
I misread the requirement! A slight modification produces the output the OP wants:
awk '
BEGIN {
c = 2
}
NR == 1 {
split ( $0, H )
}
NR > 1 {
for ( i = c; i <= NF; i++ )
{
print $1 H[i-1] OFS $i
}
++c
}
' matrix.txt
Try:
awk 'NR==1{split($0,C); next} {for(i=NR; i<=NF; i++) print $1 C[i-1], $i}' file
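This works because data row k sits on line NR = k + 1, and its diagonal entry is field k + 1, so starting the loop at i = NR skips everything left of the diagonal. That only loses information if the matrix is not symmetric, which a distance matrix should be. A sketch of a check one could run first, assuming the same header layout as `matrix.txt`; `check_symmetric` is a hypothetical helper name:

```shell
# check_symmetric: hypothetical helper. Reports every pair whose two
# entries differ (a discrepancy that reading only the upper triangle
# would hide) and exits 1 if any are found.
check_symmetric() {
    awk '
    NR == 1 { split($0, C); next }          # header row: column labels
    {
        r = NR - 1                          # data row index, 1-based
        R[r] = $1                           # row label
        for (i = 2; i <= NF; i++)
            M[r, i - 1] = $i                # M[row, column]
        n = r
    }
    END {
        bad = 0
        for (r = 1; r <= n; r++)
            for (c = r + 1; c <= n; c++)
                if (M[r, c] + 0 != M[c, r] + 0) {   # compare as numbers
                    print "mismatch: " R[r] C[c], M[r, c], "vs", M[c, r]
                    bad = 1
                }
        exit bad
    }'
}
```

`check_symmetric < matrix.txt` should print nothing and exit 0 for the matrix in this thread; any `mismatch:` line points at a pair whose two entries differ.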
Yoda
April 19, 2013, 4:20pm
Nice! Variable NR never crossed my mind!!
EvaAM
April 22, 2013, 4:18am
Thanks! This worked perfectly:
awk 'NR==1{split($0,C); next} {for(i=NR; i<=NF; i++) print $1 C[i-1], $i}' file
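Once the one-liner's output is redirected to a file (the name `pairs.txt` here is an assumption), a single distance can be looked up by pair name. Because each pair is stored only once (AB but not BA), a sketch of a lookup also tries the reversed order; `distance_of` is a hypothetical helper and assumes single-character labels, as in the example:

```shell
# distance_of: hypothetical lookup helper. Reads the two-column pair list
# on stdin and prints the distance for one pair, trying both label orders.
# Assumes single-character labels, so "A" "BC" cannot clash with "AB" "C".
distance_of() {
    awk -v a="$1" -v b="$2" '
    $1 == (a b) || $1 == (b a) { print $2; found = 1; exit }
    END { exit (found ? 0 : 1) }            # status 1 when the pair is missing
    '
}
```

`distance_of B C < pairs.txt` and `distance_of C B < pairs.txt` should both print 0.72; the exit status is 1 when the pair is not found.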