I have a file (file-1) that looks like this,
DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10168N|refseq:NP_418766|uniprotkb:P15005
DIP-10199N|refseq:NP_415632|uniprotkb:P30958
DIP-10358N|refseq:NP_418659|uniprotkb:P28903
DIP-10440N|refseq:NP_289596|uniprotkb:P20082
DIP-10441N|refseq:NP_417502|uniprotkb:P20083
DIP-10441N|refseq:NP_417502|uniprotkb:P20083
DIP-10467N|refseq:NP_415423|uniprotkb:P09373
DIP-10469N|refseq:NP_418386|uniprotkb:P32674
DIP-10562N|refseq:NP_418370|uniprotkb:P17888
DIP-10582N|refseq:NP_414864|uniprotkb:P77743
DIP-10592N|refseq:NP_415819|uniprotkb:P37344
and another (file-2) which looks like this,
DIP-10331N|refseq:NP_311078|uniprotkb:P12638 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10331N|refseq:NP_311078|uniprotkb:P12638 DIP-10840N|refseq:NP_414640|uniprotkb:P10408
DIP-1025N|refseq:NP_414574|uniprotkb:P00968 DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10467N|refseq:NP_415423|uniprotkb:P09373 DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10757N|refseq:NP_288150|uniprotkb:P02421
In output I want to print the contents of file-1 plus the value in either column of file-2 that has the same value as that of file-1 in the other column. Like this,
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-1025N|refseq:NP_414574|uniprotkb:P00968
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-10467N|refseq:NP_415423|uniprotkb:P09373
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10757N|refseq:NP_288150|uniprotkb:P02421
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10331N|refseq:NP_311078|uniprotkb:P12638
DIP-10467N|refseq:NP_415423|uniprotkb:P09373 DIP-10097N|refseq:NP_416170|uniprotkb:P30015
Any help would be highly appreciated.
Hello Syeda,
Not clear to me like do you want to compare both the fields of the files, could you please try following and let me know if this helps you.
awk 'FNR==NR{A[$1];next} ($1 in A){print $0} ($2 in A){print $0}' file1 file2
Output will be as follows.
DIP-10331N|refseq:NP_311078|uniprotkb:P12638 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-1025N|refseq:NP_414574|uniprotkb:P00968 DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10467N|refseq:NP_415423|uniprotkb:P09373 DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10467N|refseq:NP_415423|uniprotkb:P09373 DIP-10097N|refseq:NP_416170|uniprotkb:P30015
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10757N|refseq:NP_288150|uniprotkb:P02421
Here I am comparing $1
of file1 to both the fields of file2.
Let us know how It goes for you.
Thanks,
R. Singh
No, not really, this isn't exactly what I wanted.
There was a little mistake in the output I had shown, sorry! I've corrected it now.
I do want to compare $1 of file-1 with both columns of file-2, but in the output I want to print $1 first and then the value (same or different) that is next to it in file-2.
RudiC
November 2, 2015, 1:12pm
4
Try
awk '
FNR==NR {C[$1]++
R[$1,C[$1]]=$2
C[$2]++
R[$2,C[$2]]=$1
next
}
$1 in C {for (i=1; i<=C[$1]; i++) print $1, R[$1,i]
}
' file2 file1
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-1025N|refseq:NP_414574|uniprotkb:P00968
DIP-10097N|refseq:NP_416170|uniprotkb:P30015 DIP-10467N|refseq:NP_415423|uniprotkb:P09373
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10331N|refseq:NP_311078|uniprotkb:P12638
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10117N|refseq:NP_414973|uniprotkb:P08177
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10750N|refseq:NP_289799|uniprotkb:P02410
DIP-10117N|refseq:NP_414973|uniprotkb:P08177 DIP-10757N|refseq:NP_288150|uniprotkb:P02421
DIP-10467N|refseq:NP_415423|uniprotkb:P09373 DIP-10097N|refseq:NP_416170|uniprotkb:P30015
1 Like
Or Ravinder's solution slightly changed
awk 'FNR==NR{A[$1];next} ($1 in A){print $1,$2} ($2 in A){print $2,$1}' file1 file2
The order of the output is the one of file2.
1 Like