hello,
I need a fast awk script to convert a network from 'sif' format to 'adjacency' format.
sif input ('pp' just means protein-protein interaction):
A pp B
A pp C
B pp D
D pp E
should become an n x n adjacency matrix:
A B C D E
A 0 1 1 0 0
B 1 0 0 1 0
C 1 0 0 0 0
D 0 1 0 0 1
E 0 0 0 1 0
my idea:
go through all rows and build two kinds of indexed arrays, assuming array names taken from the input file (e.g. $1) are allowed (I think this is called name substitution):
names[$1]=dummy
names[$3]=dummy
$1[$3] = 1
$3[$1] = 1
then loop over all array names, for (i in names), to write the column headers.
then loop nested twice over all array names, for (j in names) and for (k in names), and write "1" if j[k] is 1, else "0" (I hope the indices always come out in the same order).
do you think this could work? perhaps you could also provide a code draft (I am rather inexperienced with awk).
if substitution for array names doesn't work, perhaps 'two-dimensional' arrays would work?
names[$1] = dummy
names[$3] = dummy
pp[$1,$3] = 1
pp[$3,$1] = 1
the rest as above: loop twice over the name indices and check if pp[j,k] is 1.
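to make the two-dimensional idea concrete, here is a rough sketch, not a finished solution. it assumes whitespace-separated fields with the node names in $1 and $3, undirected interactions, and no self-interactions or isolated nodes. the wrapper name sif2adj and the arrays seen and order are made up for illustration. instead of relying on the order of for (i in names), which awk leaves unspecified, it stores the names in a numerically indexed array in order of first appearance.

```shell
# sif2adj: hypothetical helper name, not an existing tool.
# Reads sif lines ("X pp Y") from files or stdin and prints an adjacency matrix.
sif2adj() {
  awk '
    NF >= 3 {
      # remember each node name once, in order of first appearance
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
      if (!($3 in seen)) { seen[$3] = 1; order[++n] = $3 }
      # undirected edge (assumption): mark both directions
      pp[$1, $3] = 1
      pp[$3, $1] = 1
    }
    END {
      if (n == 0) exit
      # header row: all node names separated by single spaces
      line = order[1]
      for (i = 2; i <= n; i++) line = line " " order[i]
      print line
      # one row per node: its name, then 1 or 0 for every column
      for (i = 1; i <= n; i++) {
        line = order[i]
        for (j = 1; j <= n; j++)
          line = line " " (((order[i], order[j]) in pp) ? 1 : 0)
        print line
      }
    }
  ' "$@"
}
```

it could be called as sif2adj network.sif > network.adj, where both file names are placeholders. the membership test (a, b) in pp is used instead of pp[a,b] == 1 because merely reading pp[a,b] would create that element and make the array grow to n x n entries.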
thank you very much...
dietmar