Look for substrings with special characters

Hello gurus,

I have a lookup table

cat  tmp1
[//rtwttwtr*fgg]\\\erw``~ 1
^774574574565665f[[[//]\] 2
()42543^[[D^[[D^[[D^[[D^[[D353535345****@3242- 3

and I'm trying to compare a bunch of strings against it. A string matches a lookup row when either column 1 of the lookup table or the string being looked up is a substring of the other; on a match, I want to return the second lookup column.

cat  tmp2
[//rtwtt
[//rtwttwtr*fgg]\\\erw``~
[//rtwttwtr*fgg]\\\erw``~4353535^^^7
()42543^[[D^[[D^[[D^[[D^[[D353535345****@3242--
rwerq5555525525

My desired output is

[//rtwtt 1
[//rtwttwtr*fgg]\\\erw``~ 1
[//rtwttwtr*fgg]\\\erw``~4353535^^^7 1
()42543^[[D^[[D^[[D^[[D^[[D353535345****@3242-- 3
rwerq5555525525

Here is what I tried

awk 'NR==FNR{a[$1]=$2;next} { for(as in a) { if(($1~as) || (as~$1)) print $1,a[as]; continue}}' tmp1 tmp2

Also

awk 'NR==FNR{a[$1]=$2;next} { for(as in a) { if(($1~/as/) || (as~/$1/)) print $1,a[as]; continue}}' tmp1 tmp2

How can I tell the code to ignore the special characters and just compare the strings literally?
Note: one of the two strings must fully contain the other to satisfy the lookup.

What are the special chars? And why is rwerq5555525525 in your desired output?


The special characters come from pedigree codes used in plant breeding, which embed all sorts of characters, such as [, ], *, -, @, /, \, (, ) and spaces, in alphanumeric strings. Do you need a superset of all the special characters?

rwerq5555525525 is included in the output without a lookup value because it matches nothing in the lookup table: it neither contains a lookup key nor is contained in one.
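
One possible approach, offered as a sketch rather than a definitive answer: both attempts above use the ~ operator, which treats its right-hand side as a regular expression, so characters such as [, * and \ act as regex operators, and /as/ is a regex that matches the literal text "as", not the contents of the variable. awk's index(s, t) function searches for t inside s as a plain string, so no list of special characters is needed. The function name lookup below is made up for illustration. The sketch assumes whitespace-separated columns with no spaces inside column 1 (true of the samples shown, though the pedigree codes may contain spaces), and if several lookup keys match, it prints whichever one awk's for-in loop visits first.

```shell
# lookup TABLE QUERIES
# "lookup" is a hypothetical helper name, not something from the thread.
lookup() {
  awk '
    # First file: remember column 2 under the key in column 1.
    NR == FNR { val[$1] = $2; next }

    # Second file: print the string, plus a lookup value if some key matches.
    {
      out = $1
      for (k in val) {
        # index() is a literal substring search, so [ ] * \ ^ have no
        # special meaning here. Either string may contain the other.
        if (index($1, k) || index(k, $1)) {
          out = $1 " " val[k]
          break
        }
      }
      print out
    }
  ' "$1" "$2"
}
```

Called as lookup tmp1 tmp2, this should print each string from tmp2 followed by the matching value from tmp1, and print strings with no match on their own, as in the desired output. If column 1 can really contain spaces, the field splitting would need a different separator (a tab, for example), which is not shown here.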