Hello all Unix users,
I am still new to Unix, but I am eager to learn.
I have two files in which some lines share a matching ID. I would like to join each pair of matching lines into one line and leave the other lines untouched. Here is an example:
File 1 (fasta file):
>292183
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGACAGGCTTAACACATGCAAGTCGAGGGGCAGCGGGGAGGAAGCTTGCTTTCTCTGCCGGCGACCGGCGCACGGGTGAGT
>551166
GTCGAGCGGCGAACGGGTGAGTAACGCGTGGATTATCTGCCCCGAGGTGGGGGATAACCCGGGGAAACTCGGGCTAATACCGCATATGACCGTGAGGTCAAAGGGGGGTCGCA
File 2:
292183 k__Bacteria
551166 k__Bacteria; p__Acidobacteria
The desired output:
>292183 k__Bacteria
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGACAGGCTTAACACATGCAAGTCGAGGGGCAGCGGGGAGGAAGCTTGCTTTCTCTGCCGGCGACCGGCGCACGGGTGAGT
>551166 k__Bacteria; p__Acidobacteria
GTCGAGCGGCGAACGGGTGAGTAACGCGTGGATTATCTGCCCCGAGGTGGGGGATAACCCGGGGAAACTCGGGCTAATACCGCATATGACCGTGAGGTCAAAGGGGGGTCGCA
I tried to do this with awk and Perl, but I never managed to get the lines combined into one file.
I would appreciate any help.
Best Regards,
Mohamed
awk 'NR==FNR {a[">"$1] = ">"$0; next} a[$1] {$0 = a[$1]} 1' file2 file1
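The one-liner relies on the common awk two-file idiom: NR counts records across all input files, while FNR restarts at 1 for each file, so NR==FNR is true only while the first file (here file2) is being read. The small sketch below, with a hypothetical helper name show_pass, just labels which branch handles each input line:

```shell
# show_pass LOOKUP MAIN: hypothetical helper that tags every input line with
# the branch of the NR==FNR idiom that handled it.
show_pass() {
  awk '
    NR == FNR { print "lookup: " $0; next }  # only true for the first file
              { print "main:   " $0 }        # every line of the second file
  ' "$1" "$2"
}
```

For example, show_pass file2 file1 tags the two file2 lines with "lookup:" and the four file1 lines with "main:". One caveat: if the first file is empty, NR==FNR also holds for the second file, so its lines would be treated as lookup entries.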
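awk 'FILENAME=="file2" {arr[$1]=$0}           # file2: store each line under its ID
FILENAME=="file1" { if (index($0, ">")==1)    # file1 header line (starts with ">")
{print ">" arr[substr($0,2)]; next}           # ">" plus the stored file2 line
{print $0} }' file2 file1 > newfile           # sequence lines are printed unchanged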
Note: the files must be given in the order file2 file1. This code is just a less compact form of balajesuri's post.
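One thing to watch with this version: if a header's ID is missing from file2, arr[substr($0,2)] is empty and the header is printed as a bare ">". A minimal sketch that keeps unmatched headers unchanged and joins the ID and taxonomy with a single space, assuming file2 separates its fields with spaces or tabs (the function name join_headers is hypothetical), could look like this:

```shell
# join_headers TAXFILE FASTA (hypothetical helper): prints FASTA with each
# ">ID" header extended to ">ID taxonomy", using one space as the separator.
join_headers() {
  awk '
    NR == FNR {                          # first file: ID, then taxonomy words
      t = ""
      for (i = 2; i <= NF; i++) t = (t == "" ? $i : t " " $i)
      tax[">" $1] = t                    # key matches the FASTA header ">ID"
      next
    }
    /^>/ && ($1 in tax) && tax[$1] != "" { print $1 " " tax[$1]; next }
    { print }                            # sequences and unmatched headers as-is
  ' "$1" "$2"
}
```

Usage: join_headers file2 file1 > newfile. Because the taxonomy is rebuilt word by word, any run of tabs or spaces inside it also becomes a single space.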
Thanks, it worked very well for me, except that I would like the separator to be a space, not a tab (<\s> below marks the space I want):
>292183<\s>k__Bacteria
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGACAGGCTTAACACATGCAAGTCGAGGGGCAGCGGGGAGGAAGCTTGCTTTCTCTGCCGGCGACCGGCGCACGGGTGAGT
pamu
June 7, 2013, 8:06am
Try this (S is reset for each file_2 line so that IDs do not accumulate):
awk 'NR==FNR{S=""; for(i=1;i<=NF;i++){S=S?S" "$i:">"$i}; A[">"$1]=S; next}{print A[$0]?A[$0]:$0}' file_2 file_1
You probably have tabs in your original file 2; because the whole line is stored and printed, they are carried over to the new file unchanged.
Add this line at the top of the awk program:
BEGIN{OFS=" "}
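Since the tab sits between the ID and the taxonomy in file 2, another option, assuming file 2 contains no other tabs that must be preserved, is to translate tabs to spaces before awk reads the file and pass the result on standard input ("-"). The wrapper name join_via_tr is hypothetical; the awk part uses the same NR==FNR lookup as balajesuri's one-liner:

```shell
# join_via_tr TAXFILE FASTA (hypothetical helper): converts every tab in
# TAXFILE to a space, then reads the result as the first awk input ("-").
join_via_tr() {
  tr '\t' ' ' < "$1" |
    awk 'NR == FNR { a[">" $1] = ">" $0; next }   # lookup: ">ID" -> ">ID taxonomy"
         ($1 in a) { $0 = a[$1] }                  # matching header: swap it in
         1                                         # print every line of FASTA
        ' - "$2"
}
```

Note that tr converts each tab individually, so two consecutive tabs would become two spaces.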
I don't think OFS is relevant, since none of the solutions rebuilds $0 or uses a print statement with multiple arguments.
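This is easy to check: OFS already defaults to a single space, and awk only inserts it between comma-separated print arguments or when it rebuilds $0 after a field is assigned. In the sketch below (the helper names are hypothetical), the same OFS setting leaves the tab alone in the first function and replaces it in the second, where $1 = $1 forces the rebuild:

```shell
# Both helpers read stdin and set OFS to a single space (its default value).
print_as_is()  { awk 'BEGIN { OFS = " " } { print }'; }           # $0 untouched
rebuild_line() { awk 'BEGIN { OFS = " " } { $1 = $1; print }'; }  # $0 rebuilt with OFS

printf '292183\tk__Bacteria\n' | print_as_is    # tab is still there
printf '292183\tk__Bacteria\n' | rebuild_line   # prints "292183 k__Bacteria"
```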
Regards,
Alister