Hi All,
I want to look up the name for the id in col2 of input 1 in another file (input 2) and append that name to each line.
Input 1
comp100001_c0_seq1 At1g31340 30.40 569 384 11 3 1673 313 834 7e-62 237
comp100003_c0_seq1 At1g35370_2 35.00 80 50 2 597 364 678 753 1e-09 42.7
Input 2
[R] KOG0017 FOG: Transposon-encoded proteins with TYA, reverse transcriptase, integrase domains in various combinations
ath: At1g10260
ath: At1g11265
ath: At1g35050
ath: At1g35370_2
ath: At1g35647
[OR] KOG0001 Ubiquitin and ubiquitin-like proteins
ath: At1g31340
ath: At1g53930
ath: At1g53950
ath: At1g53980
ath: At1g64470
Expected output
comp100001_c0_seq1 At1g31340 30.40 569 384 11 3 1673 313 834 7e-62 237 [OR] KOG0001 Ubiquitin and ubiquitin-like proteins
comp100003_c0_seq1 At1g35370_2 35.00 80 50 2 597 364 678 753 1e-09 42.7 [R] KOG0017 FOG: Transposon-encoded proteins with TYA, reverse transcriptase, integrase domains in various combinations
Hello,
The following may help.
awk 'NR==FNR{a[$2];next} ($2 in a) {print $0}' file2 file1
Output will be as follows.
comp100001_c0_seq1 At1g31340 30.40 569 384 11 3 1673 313 834 7e-62 237
comp100003_c0_seq1 At1g35370_2 35.00 80 50 2 597 364 678 753 1e-09 42.7
EDIT: What is the logic for getting the last column in your expected output? Sorry, I only just noticed the last column.
Thanks,
R. Singh
The last column is the name that corresponds to col2 of input 1. In input 2, it is the nearest header line above the id: a line starting with a bracketed code in capital letters (for example [OR]) followed by a description.
So for At1g31340 it is [OR] KOG0001 Ubiquitin and ubiquitin-like proteins, and for At1g35370_2 it is [R] KOG0017 FOG: Transposon-encoded proteins with TYA, reverse transcriptase, integrase domains in various combinations.
Please note that names for some ids may not be found. Those lines should be left as they are, that is, no name is added as the last column.
Yoda
January 28, 2014, 2:38pm
Here is an awk program based on some assumptions:
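awk '
    # First file (input2): build a lookup table from id to description.
    NR == FNR {
        if ( $0 ~ /\[[A-Z]*\]/ )
            D = $0          # header line: remember the current description
        else
            A[$NF] = D      # id line: map the id (last field) to that description
        next
    }
    # Second file (input1): append the description when column 2 is a known id.
    $2 in A {
        $0 = $0 FS A[$2]
    }
    1                       # print every record, changed or not
' input2 input1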
Thanks a lot Yoda, you are too good boss. Could you please explain the code?
Thanks,
R. Singh
Thank you, this looks perfect
Yoda
January 28, 2014, 2:49pm
The code reads input2 first and tests each line against the pattern /\[[A-Z]*\]/, which matches the header / description lines described by the OP. A matching line is stored in the variable D.
For non-header records, the current value of D is assigned to the associative array A, indexed by the last column (the id).
The code then reads input1. If column 2 is a key in A, the description is appended to the record; otherwise the record is left as it is. In both cases the record is printed.
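To see those steps on a small case, here is a rough sketch you can paste into a shell. The sample data is shortened from the thread, and comp3 / At9g99999 is a made-up id that is deliberately missing from input2, to show that such lines pass through unchanged. The temporary directory and the result variable are only there for the demonstration.

```shell
#!/bin/sh
# Scratch directory for the demo files (hypothetical location, removed at the end).
tmp=$(mktemp -d)

# Shortened copy of input2: header lines followed by their ids.
cat > "$tmp/input2" <<'EOF'
[R] KOG0017 FOG: Transposon-encoded proteins
ath: At1g35370_2
[OR] KOG0001 Ubiquitin and ubiquitin-like proteins
ath: At1g31340
EOF

# Shortened copy of input1; At9g99999 is invented and has no entry in input2.
cat > "$tmp/input1" <<'EOF'
comp1 At1g31340 30.40 569
comp2 At1g35370_2 35.00 80
comp3 At9g99999 50.00 10
EOF

result=$(awk '
    NR == FNR {                      # still reading the first file (input2)
        if ($0 ~ /\[[A-Z]*\]/)
            D = $0                   # header: remember the description
        else
            A[$NF] = D               # id line: last field -> description
        next
    }
    $2 in A { $0 = $0 FS A[$2] }     # known id: append its description
    1                                # print every record
' "$tmp/input2" "$tmp/input1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data the first two lines get their description appended and the comp3 line is printed exactly as it came in:

comp1 At1g31340 30.40 569 [OR] KOG0001 Ubiquitin and ubiquitin-like proteins
comp2 At1g35370_2 35.00 80 [R] KOG0017 FOG: Transposon-encoded proteins
comp3 At9g99999 50.00 10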