Concatenating 2 lines from 2 files having matching strings

Hello All Unix Users,

I am still new to Unix, but I am eager to learn it.
I have 2 files in which some lines share matching substrings. I would like to concatenate those lines into one line, leaving the others untouched. Here is an example:

File 1 (fasta file):

>292183
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGACAGGCTTAACACATGCAAGTCGAGGGGCAGCGGGGAGGAAGCTTGCTTTCTCTGCCGGCGACCGGCGCACGGGTGAGT
>551166
GTCGAGCGGCGAACGGGTGAGTAACGCGTGGATTATCTGCCCCGAGGTGGGGGATAACCCGGGGAAACTCGGGCTAATACCGCATATGACCGTGAGGTCAAAGGGGGGTCGCA

File 2:

292183	k__Bacteria
551166	k__Bacteria; p__Acidobacteria

The desired output:

>292183 k__Bacteria
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGACAGGCTTAACACATGCAAGTCGAGGGGCAGCGGGGAGGAAGCTTGCTTTCTCTGCCGGCGACCGGCGCACGGGTGAGT
>551166 k__Bacteria; p__Acidobacteria
GTCGAGCGGCGAACGGGTGAGTAACGCGTGGATTATCTGCCCCGAGGTGGGGGATAACCCGGGGAAACTCGGGCTAATACCGCATATGACCGTGAGGTCAAAGGGGGGTCGCA

I tried to use awk and perl for this, but I never managed to get the lines into one file.

I appreciate any help,
Best Regards,
Mohamed

awk 'NR==FNR {a[">"$1] = ">"$0; next} a[$1] {$0 = a[$1]} 1' file2 file1

awk 'FILENAME=="file2" {arr[$1]=$0}    # remember each file2 line by its ID
     FILENAME=="file1" { id = substr($0,2)
                         # header line with a known ID: print the file2 line
                         if (index($0, ">")==1 && (id in arr))
                             {print ">" arr[id]; next}
                         print $0 }' file2 file1 > newfile

Note: file2 and file1 must be given in that order. This code is just a less compact form of balajesuri's post.


Thanks, but I would like the separator to be a space, not a tab:

>292183<\s>k__Bacteria
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGACAGGCTTAACACATGCAAGTCGAGGGGCAGCGGGGAGGAAGCTTGCTTTCTCTGCCGGCGACCGGCGCACGGGTGAGT



Thanks, it worked very well for me, except that I would like to have spaces, not tabs:

>292183<\s>k__Bacteria
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGACAGGCTTAACACATGCAAGTCGAGGGGCAGCGGGGAGGAAGCTTGCTTTCTCTGCCGGCGACCGGCGCACGGGTGAGT

Try:

awk 'NR==FNR{S=""; for(i=1;i<=NF;i++){S=S?S" "$i:">"$i}; A[">"$1]=S; next}{print (A[$0]?A[$0]:$0)}' file_2 file_1
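
For anyone who wants to test this on the sample data, here is a minimal, self-contained sketch. It is a variant, not the exact code from any post above: it assumes file 2 is tab-delimited with the ID in the first column, splits it with -F'\t', and joins the header and the taxonomy with a single space. The temporary directory and the shortened sequences (AGAG, GTCG) are made up for the demo.

```shell
# Build small copies of the two sample files in a scratch directory.
tmp=$(mktemp -d)
printf '>292183\nAGAG\n>551166\nGTCG\n' > "$tmp/file1"
printf '292183\tk__Bacteria\n551166\tk__Bacteria; p__Acidobacteria\n' > "$tmp/file2"

# file2 is read first (NR==FNR) to build an ID -> taxonomy table;
# file1 is read second and matching header lines get the taxonomy appended.
merged=$(awk -F'\t' '
  NR==FNR     { tax[">" $1] = $2; next }      # first file: store taxonomy by ">ID"
  ($0 in tax) { print $0 " " tax[$0]; next }  # header with a match: join with a space
              { print }                       # everything else is left untouched
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Headers whose ID does not appear in file 2 are printed unchanged, and the tab never reaches the output because only $2 of file 2 is stored.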

You probably have tabs in your original files; they get "carried over" to the new file by default.
Add this line at the top of the awk code block:

BEGIN{OFS=" "}

I don't think OFS is relevant since none of the solutions rebuild $0 or use a print statement with multiple args.
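
A quick way to see this for yourself is the sketch below. It assumes a tab-delimited line like the ones in file 2; the variable names are only for illustration. Setting OFS alone leaves the tab in place, while assigning to a field (which rebuilds $0) or printing several comma-separated arguments makes awk use OFS.

```shell
# One tab-delimited sample line, as in file 2.
line=$(printf '551166\tk__Bacteria; p__Acidobacteria')

# OFS is set but $0 is printed untouched: the tab survives.
as_is=$(printf '%s\n' "$line" | awk 'BEGIN{OFS=" "} {print}')

# Assigning to a field rebuilds $0 with OFS between the fields.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN{OFS=" "} {$1=$1; print}')

# print with several arguments also puts OFS between them.
two_args=$(printf '%s\n' "$line" | awk -F'\t' 'BEGIN{OFS=" "} {print $1, $2}')

printf '%s\n' "$as_is" "$rebuilt" "$two_args"
```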

Regards,
Alister