Hi,
I have two files: a.doc and b.txt
I want to search for the strings from b.txt in a.doc and highlight them in a.doc with different colours, using Perl or bash/awk/sed.
Please guide me.
Thanks!!!!!
Try:
perl -lpe 'BEGIN{open b, "b.txt";chomp(@b=<b>)}{for $i (@b) {s/$i/\033[31m$i\033[0m/g}}' a.doc
Thanks
I would highly appreciate if you can explain this script.
I am getting the following error while running the above script (1.pl):
I need some help here before I can help out.
With this script I get only a single line; the correct result is 5 lines. Why?
awk 'NR==FNR {a[$0]=$0;next} {for (i in a) {if ($0~a[i]) print}}' b.txt a.txt
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
The correct answer is:
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49
Running a test like this correctly shows every combination of line and search string:
awk 'NR==FNR {a[$0]=$0;next} {for (i in a) {print $0,a[i]}}' b.txt a.txt
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 ACG
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 TAATG
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 AAAAAG
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 GACAAGT
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 CAAGC
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 GCTTG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 ACG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 TAATG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 AAAAAG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 GACAAGT
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 CAAGC
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 GCTTG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 ACG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 TAATG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 AAAAAG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 GACAAGT
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 CAAGC
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 GCTTG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 ACG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 TAATG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 AAAAAG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 GACAAGT
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 CAAGC
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 GCTTG
bioinfo, don't put my code in a file. Simply run it in a terminal as I posted it, replacing a.doc and b.txt for the filenames you have (in case they differ from those two).
@bartus11
Here is what I got when I ran your code:
perl -lpe 'BEGIN{open b, "b.txt";chomp(@b=<b>)}{for $i (@b) {s/$i/\033[31m$i\033[0m/g}}' a.txt
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49
As you can see, it shows the lines with hits, but highlights only one hit; look at my post #4.
The OP's request is to colour all of the data shown in bold.
For some strange reason this has the same problem: it highlights only one hit.
awk 'NR==FNR {a[$0]=$0;next} {for (i in a) {gsub(a[i],"\033[1;31m&\033[0m",$0)}}1' b.txt a.txt
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49
Jotne, post output of:
cat -ev a.txt
cat -ev b.txt
s#%#%&E#&%!!!
There was a trailing space after some of the strings in b.txt.
Since I just copied them from post #1, I did not check.
cat -ev b.txt
GACAAGT $
AAAAAG $
TAATG$
CAAGC$
ACG $
GCTTG$
Now both awk and perl work fine.
Thanks.
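For anyone who hits the same problem: a minimal sketch, assuming the only thing wrong with the search strings is trailing blanks (or a DOS carriage return), is to strip them while b.txt is being loaded. The sample data and the temporary directory below are made up for the demonstration, and the array name pat is just an illustrative choice.

```shell
# Demo data (hypothetical): two of the three search strings carry trailing blanks.
dir=$(mktemp -d)
printf 'GACAAGT \nTAATG\nACG \t\n' > "$dir/b.txt"
printf 'Seq3 CCCTAATG--CGG\nSeq4 GTGACAAGTTCACGT\n' > "$dir/a.txt"

# First file: strip trailing spaces, tabs and carriage returns from each
# search string, skip lines that end up empty, and remember the rest.
# Second file: wrap every match in bold red, then print the line.
out=$(awk '
  NR==FNR { sub(/[ \t\r]+$/, ""); if ($0 != "") pat[$0]; next }
  { for (p in pat) gsub(p, "\033[1;31m&\033[0m") }
  1
' "$dir/b.txt" "$dir/a.txt")

printf '%s\n' "$out"
rm -r "$dir"
```

Without the sub() call, "GACAAGT " (with the space) never matches a.txt, which is the symptom described above.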
awk 'NR==FNR {a[$0]=$0;next} {for (i in a) {gsub(a[i],"\033[1;31m&\033[0m",$0)}}1' b.txt a.txt
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49
Thanks bartus and Jotne.
Can you please explain the code?
Thanks.
awk ' NR==FNR {a[$0]=$0;next}
{for (i in a)
{gsub(a[i],"\033[1;31m&\033[0m",$0)}
}1
' b.txt a.txt
NR==FNR {a[$0]=$0;next}
NR==FNR is a technique used to do something only on the first file when several files are listed, in this case b.txt. The next at the end skips the remaining rules for those records.
a[$0]=$0 stores every record of b.txt in an array named a, e.g. a[GACAAGT]=GACAAGT
for (i in a) loops over every element in array a (the strings from b.txt), so that each one is tested against a.txt.
gsub(a[i],"\033[1;31m&\033[0m",$0) tests the current element of array a against the line $0 from a.txt; wherever it is found, the matched text is replaced with itself (&) wrapped in ANSI colour codes.
E.g. if AGC is found, it is replaced with \033[1;31mAGC\033[0m, which the terminal displays as AGC in bold red.
Then the 1 at the end prints every line from a.txt, with colours added for every match.
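To see the NR==FNR test at work on its own, here is a small sketch that only prints the two counters for each record. The file names first.txt and second.txt and their contents are made up for the demonstration.

```shell
# Two throwaway input files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/first.txt"
printf 'p\nq\nr\n' > "$dir/second.txt"

# NR counts records across all files; FNR restarts at 1 for each file.
# They are equal only while the first file is being read.
out=$(awk '
  NR==FNR { print "first file:  NR=" NR " FNR=" FNR " " $0; next }
          { print "second file: NR=" NR " FNR=" FNR " " $0 }
' "$dir/first.txt" "$dir/second.txt")

printf '%s\n' "$out"
rm -r "$dir"
```

One caveat: if the first file is empty, NR==FNR is true while the second file is read, so the first rule would run on the wrong file.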
Edit: the $0 as the last argument of gsub is not needed, since the current line is the default target.
I have also changed the array name to b, to reflect that it stores the content of b.txt and make the code clearer.
awk 'NR==FNR {b[$0]=$0;next} {for (i in b) {gsub(b[i],"\033[1;31m&\033[0m")}}1' b.txt a.txt
Edit2: there is no need to store the lines of b.txt as both index and value of array b; the index alone is enough:
awk 'NR==FNR {b[$0]++;next} {for (i in b) {gsub(i,"\033[1;31m&\033[0m")}}1' b.txt a.txt
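The original question asked for different colours, while the commands above paint every hit red. A possible variation, sketched under a few assumptions: each string gets a colour by its position in b.txt, cycling through the six ANSI foreground codes 31 to 36; the strings are still treated as regular expressions; and no string matches inside another string or inside the escape codes themselves (true for plain DNA letters). The sample data, the array name pat and the variable colour are illustrative only.

```shell
# Demo data (hypothetical): three search strings and two sequence lines.
dir=$(mktemp -d)
printf 'TAATG\nACG\nGACAAGT\n' > "$dir/b.txt"
printf 'Seq3 CCCTAATG--CGG\nSeq4 GTGACAAGTTCACGT\n' > "$dir/a.txt"

# First file: keep the strings in input order (pat[1], pat[2], ...),
# dropping trailing blanks and empty lines.
# Second file: string k is wrapped in colour code 31 + (k-1) mod 6.
out=$(awk '
  NR==FNR { sub(/[ \t\r]+$/, ""); if ($0 != "") pat[++n] = $0; next }
  {
    for (k = 1; k <= n; k++) {
      colour = 31 + (k - 1) % 6
      gsub(pat[k], "\033[1;" colour "m&\033[0m")
    }
  }
  1
' "$dir/b.txt" "$dir/a.txt")

printf '%s\n' "$out"
rm -r "$dir"
```

Here TAATG comes out red (31), ACG green (32) and GACAAGT yellow (33). With more than six strings the colours repeat.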
Thanks a lot.
I will try them.