Dear All,
I have to reduce the redundancy of a file that is like this:
a b 0
a c 0
a f 1
b a 1
b a 0
b c 1
d f 0
g h 1
f d 1
Basically, this file describes a network with its nodes and edges.
The nodes are the letters, and each line is an edge; the number gives the direction (in particular, 0 means the edge goes from left to right, 1 means it goes from right to left).
As you may notice, some interactions are duplicates. For example the interactions:
a b 0
b a 1
a-->b
b<--a
are exactly the same: the first line's interaction goes from a to b (0 means the interaction goes from left to right), and in the second line the interaction still goes from a to b (1 means the interaction goes from right to left).
What I would like is to filter the file above and output a file like this:
a b 0
a c 0
a f 1
b a 0
b c 1
d f 0
g h 1
So, all the duplicated interactions are removed.
Note that the interactions
a b 0
b a 0
are not the same! Both go from left to right, but the starting node is different:
a-->b
b-->a
---------- Post updated at 15:08 ---------- Previous update was at 15:00 ----------
Anyhow, try:
awk '
($2,$1) in B && B[$2,$1] != $3 {   # reverse pair already stored with the opposite direction: same edge, skip
    next
}
!(($1,$2) in B) {                  # first occurrence of this ordered pair: remember its direction
    B[$1,$2] = $3
}
END {
    for (b in B) {
        split(b, C, SUBSEP)
        print C[1], C[2], B[b]
    }
}
' file
a b 0
a c 0
a f 1
b a 0
b c 1
d f 0
g h 1
The order of the output lines cannot be guaranteed.
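If the input order matters, a variant of the same idea preserves it: build a canonical left-to-right key for each line on the fly and print only the first line carrying each key. This is a sketch; the file name net.txt and the inline sample data are my own additions for illustration.

```shell
# Recreate the thread's sample input (hypothetical file name net.txt).
printf '%s\n' 'a b 0' 'a c 0' 'a f 1' 'b a 1' 'b a 0' 'b c 1' \
              'd f 0' 'g h 1' 'f d 1' > net.txt

# Canonical key: the pair as it would read left-to-right
# (swap the nodes when the direction flag is 1); print each
# line only the first time its canonical pair is seen.
awk '{ key = ($3 == 1) ? $2 SUBSEP $1 : $1 SUBSEP $2 }
     !(key in seen) { seen[key] = 1; print }' net.txt
```

Because lines are printed in input order as they arrive, the output here matches the desired output from the first post exactly.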
Wouldn't it be much easier (from the viewpoint of understandability) to first transform the file into a format where the third column is always zero? I.e., if you have a line
X Y 1
you would replace it by
Y X 0
After this, you could simply use
sort -u
to remove duplicates.
By the way, if you ensure that the third column is always 0, it becomes redundant and you could remove it completely, making the file format even simpler.
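The two steps above can be combined into one pipeline. This is a sketch of the idea; the file name net.txt and the inline sample data are assumptions for illustration.

```shell
# Recreate the thread's sample input (hypothetical file name net.txt).
printf '%s\n' 'a b 0' 'a c 0' 'a f 1' 'b a 1' 'b a 0' 'b c 1' \
              'd f 0' 'g h 1' 'f d 1' > net.txt

# Step 1: normalize every edge so it reads left-to-right
#         (direction flag 1 -> swap the nodes, flag becomes 0).
# Step 2: sort -u drops the now-identical duplicate lines.
awk '$3 == 1 { print $2, $1, 0; next } { print $1, $2, 0 }' net.txt | sort -u
```

Note that, unlike the earlier awk solution, this normalizes every edge (e.g. `a f 1` becomes `f a 0`), and the output comes back lexically sorted rather than in input order.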