That means I want to subset the 2nd file to the unique IDs listed in file 1, keep all the columns of the 2nd file, and add $2 (count) from file 1.
To do this, I first did:
awk 'NR == FNR {
    k[$1]
    next
}
($1) in k
' test_GI_count1.txt Protein_gi_GeneID_symbol_1.txt > merged_file.txt
Then I added $2 of the 1st file separately afterwards.
This works perfectly, but I have a problem with a new file, where the IDs in the first file ($1) are usually protein_gi but sometimes GeneID.
e.g. in the 1st file
1000094 2
whereas in the 2nd file
77747945 1000094 treA
That means I need to search for them in $1 OR $2 of the 2nd file.
Can anybody please suggest how I can do that?
Thanks a lot,
Mitra
---------- Post updated at 09:09 AM ---------- Previous update was at 09:05 AM ----------
I also want to add $2 (count) from file 1 in the same script, so that the column lengths do not mismatch when some IDs are absent from the 2nd file.
Any suggestions would be really great.
Thanks a lot,
Mitra
There we go. It makes it much easier to help when relevant examples are given.
So I made an example from your given input and arranged some matches in file1 and file2.
Dear zaxxon,
Thank you very much. But it still produces output similar to what I already got.
But my file is a bit more complicated. I am editing your example. Thanks for creating it.
Is that second column match always there? join only takes one key field per file, so you would need to preprocess the two IDs into one field, or postprocess out the second-field mismatches. For that much trouble, you might want to store one file in an associative array and then match it against the other, which can be done in bash or awk.
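A minimal sketch of that associative-array approach in awk, assuming whitespace-separated columns laid out as in the lines quoted above (file 1: ID and count; file 2: protein_gi, GeneID, symbol). The file names counts_sample.txt and ids_sample.txt and every sample row except the two quoted in the thread are made up for illustration. The count is stored while reading file 1 and appended when either $1 or $2 of file 2 matches, so the count never gets out of step with the matched lines:

```shell
# Hypothetical sample data: only "1000094 2" and "77747945 1000094 treA"
# come from the thread; the other rows are invented for the example.
printf '%s\n' '1000094 2' '77747999 5' '12345 7' > counts_sample.txt
printf '%s\n' '77747945 1000094 treA' '77747999 1000100 abcD' '555 666 xyzQ' > ids_sample.txt

awk '
NR == FNR     { count[$1] = $2; next }        # file 1: remember ID -> count
($1 in count) { print $0, count[$1]; next }   # file 2: ID matched protein_gi ($1)
($2 in count) { print $0, count[$2] }         # file 2: ID matched GeneID ($2)
' counts_sample.txt ids_sample.txt > merged_sample.txt

cat merged_sample.txt
```

With the sample data this prints the treA line with count 2 (matched on GeneID) and the abcD line with count 5 (matched on protein_gi). IDs from file 1 that are absent from file 2 (12345 here) simply produce no output line. If both $1 and $2 of the same line happen to be in file 1, this sketch uses the count for $1; adjust the order of the two rules if GeneID should take priority.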