I want to compare 4 edge lists to find the edges that are present in all 4 networks. The problem is that an edge A-B in one file may appear as B-A in another file.
Example:
Input 1: net1.txt
A B 0.1
C D 0.65
D E 0.9
E A 0.7
Input 2: net2.txt
A Z 0.1
C D 0.65
E D 0.9
E A 0.7
Input 3: net3.txt
Y Z 0.1
C D 0.65
D E 0.9
W R 0.7
Input 4: net4.txt
F Z 0.1
D C 0.65
D E 0.9
W Q 0.7
Intersection of net1.txt, net2.txt, net3.txt and net4.txt:
Which sequence do you want printed: C D or D C? How do you determine the preferred one: by the count of occurrences? What if each has a count of 2?
Assuming it's the count, try:
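awk '
FNR==1 {FCNT++}                 # count the input files
{T[$1,$2,$3]++                  # count the edge in both orientations ...
T[$2,$1,$3]++
C[$1,$2]++                      # ... and the orientation as actually written
}
END {for (t in T) {split (t, X, SUBSEP)
# print an edge seen in every file, in an orientation used by at least half of them
if (T[t]==FCNT && C[X[1],X[2]] >= FCNT/2) print t}}
' SUBSEP=" " net[1-4].txt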
With the four sample files above, this prints (the order of the lines may vary):
C D 0.65
D E 0.9
This has a small drawback; finding it I leave as a challenge to you.
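If the printed orientation does not matter, a different approach (not the count-based one above) is to put the two node names of every edge into a fixed order before counting, so that A-B and B-A end up under the same key. The following is only a sketch under some assumptions: the files have three whitespace-separated columns as in the sample, the weight has to match as well, and the names `nfiles`, `seen`, `count` and the temporary directory are made up for illustration.

```shell
# Recreate the four sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'A B 0.1' 'C D 0.65' 'D E 0.9' 'E A 0.7' > "$tmp/net1.txt"
printf '%s\n' 'A Z 0.1' 'C D 0.65' 'E D 0.9' 'E A 0.7' > "$tmp/net2.txt"
printf '%s\n' 'Y Z 0.1' 'C D 0.65' 'D E 0.9' 'W R 0.7' > "$tmp/net3.txt"
printf '%s\n' 'F Z 0.1' 'D C 0.65' 'D E 0.9' 'W Q 0.7' > "$tmp/net4.txt"

result=$(awk '
FNR == 1 { nfiles++ }                # a new input file starts
{
    # Order the two node names as strings, so A-B and B-A share one key.
    if (($1 "") <= ($2 "")) key = $1 " " $2 " " $3
    else                    key = $2 " " $1 " " $3
    # Count each edge at most once per file.
    if (!((FILENAME, key) in seen)) { seen[FILENAME, key] = 1; count[key]++ }
}
# Keep only the edges that were counted in every file.
END { for (k in count) if (count[k] == nfiles) print k }
' "$tmp"/net[1-4].txt | sort)

echo "$result"
rm -r "$tmp"
```

Note that this always prints the alphabetically ordered form of an edge, which is not necessarily the form that occurs most often in the files, so it sidesteps the question of the preferred sequence rather than answering it.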