Pairwise comparisons

Diya123 · June 10, 2013, 2:09pm

Hi,

I have 25 groups and I need to perform all possible pairwise comparisons between them. The number of comparisons is n(n-1)/2, so in my case it is 25(25-1)/2 = 300 comparisons.

My 25 groups are:

FG1	FG2	FG3	FG4	FG5
NT5E	CD44	CD44	CD44	AXL
ADAM19	CCDC80	L1CAM	L1CAM	CD44
AXL	COL1A1	ADM	RND3	FOSL1
CD44	COL3A1	COL1A1	BMP1	SP100
CD68	COL6A1	COL3A1	COL3A1	ACSL1
FXYD5	COL6A2	COL18A1	COL6A1	ADM
GLIPR1	COL6A3	CTGF	COL6A2	A2M
GM2A	COL18A1	EPAS1	COL6A3	COL1A1
L1CAM	CTGF	FOXC1	COL18A1	COL3A1
ADM	FBN1	GAP43	CTGF	COL6A2
A2M	FOXC1	HMOX1	ITGA4	DUSP4
ALPK2	LOX	HBEGF	ICAM1	FHL2
ANGPTL2	MGP	ITGA4	IL32	GAL
BMP1	MMP2	ICAM1	LOXL2	HMOX1
CALCRL	POSTN	IL1B	MGP	IL1B
CTSL1	PDGFRA	IL6R	POSTN	IL6R
CXCL14	SPARC	IL8	PCDH12	LOX
C18orf1	SPOCK1	LOX	SORBS3	MGP
CCDC80	THSD4	MMP2	SPOCK1	PGF
COL1A1	TFPI2	PGF	TGFB1I1	PDGFRA
COL3A1	TGFB2	PLAU	TGFBI	PTGS2
COL6A1	TGFBI	PTGS2	TNFRSF12A	RCAN1
COL6A2	VCAN	SPOCK1	VCAN	TGFB2
COL6A3		TGFB2		
COL18A1		TNFRSF12A		
CSF1		UNC5B		
CTGF		UNC5C		
CYBRD1		ETS1		
DKK1		VCAN		
DKK3				
EMP1				
FBN1

So for the first comparison, the 1st group against the 2nd group (a group is a column), the result is 10/23. Here 10 is the number of genes common to the two functional groups, and 23 is the size of the smaller of the two lists being compared. In my example the 1st column has 32 entries and the 2nd column has 23 entries, so the minimum of the two is 23 and the calculation is 10/23. I need to do this for all 300 comparisons.

I tried doing this in R, but I get duplicates (both FG1 vs FG2 and FG2 vs FG1). Is there a simple way to do this in awk?

Thanks,

pravin27 · June 11, 2013, 2:59am

Could this help you?

 
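#!/bin/ksh
# FGFilename is the tab-separated input with one header row (FG1, FG2, ...).
TotFGNumber=25

# Split each non-empty column into its own file; the header stays as line 1.
for (( i=1; i<=TotFGNumber; i++ ))
do
        awk -v FGNo="$i" -F"\t" '$FGNo !~ /^$/ {print $FGNo}' FGFilename > "FGFile_$i"
done

# Compare each pair once: j always starts above i, so FG2 vs FG1 never repeats FG1 vs FG2.
for (( i=1; i<=TotFGNumber; i++ ))
do
        file1="FGFile_$i"
        k=$(( i + 1 ))
        for (( j=k; j<=TotFGNumber; j++ ))
        do
                file2="FGFile_$j"
                LineCnt1=$(wc -l < "$file1")
                LineCnt2=$(wc -l < "$file2")
                # Size of the smaller list, minus 1 for the header line.
                [ $LineCnt1 -lt $LineCnt2 ] && LineCount=$(( LineCnt1 - 1 )) || LineCount=$(( LineCnt2 - 1 ))
                # Count the entries of file2 that also appear in file1.
                CommonFieldNo=$(awk 'NR==FNR{a[$1]++;next} a[$1]' "$file1" "$file2" | wc -l)
                Comparision=$(echo "scale=5;$CommonFieldNo / $LineCount" | bc)
                echo "FG${i} vs FG${j} -  $Comparision"
        done
done
rm -f FGFile_*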
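
Since the question asked for awk, the same calculation could probably also be done in a single awk pass without temporary files. The following is only a rough sketch under some assumptions: the input is tab-separated, its first row holds the group names, and repeated gene names within one column should count once. The function name pairwise_overlap is made up for illustration.

```shell
#!/bin/sh
# Sketch only: pairwise_overlap is a hypothetical helper name.
# Assumes tab-separated input whose first row is the header (FG1, FG2, ...).
pairwise_overlap() {
    awk -F'\t' '
    NR == 1 {                               # header row: keep the column names
        ncol = NF
        for (c = 1; c <= NF; c++) name[c] = $c
        next
    }
    {
        for (c = 1; c <= NF; c++) {
            # skip blank cells and genes already recorded for this column
            if ($c == "" || ((c, $c) in seen)) continue
            seen[c, $c] = 1
            gene[c, ++size[c]] = $c
        }
    }
    END {
        for (i = 1; i < ncol; i++)
            for (j = i + 1; j <= ncol; j++) {           # j > i: each pair once
                common = 0
                for (k = 1; k <= size[i]; k++)
                    if ((j, gene[i, k]) in seen) common++
                min = (size[i] < size[j]) ? size[i] : size[j]
                if (min > 0)
                    printf "%s vs %s - %d/%d = %.5f\n", name[i], name[j], common, min, common / min
            }
    }' "$@"
}

# Usage: pairwise_overlap FGFilename
```

With the five columns posted in the question, the FG1 vs FG2 line should come out as 10/23 = 0.43478, which matches the hand calculation above. The number of groups is taken from the header row, so nothing needs to be hard-coded for 25 columns.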