Selecting nearest pattern match

I'm looking to match an error code against a list of possible codes and get the nearest match. The code is a 6-character hexadecimal string.

I have a file of error codes. In every code the first 3 characters are specific, but each of the last 3 characters may be either specific or generic, as below.

2120xx
2180xy
2182xy
2193xy
2194xy
21A3xy
21A6xx
21A7xx
21Bxyz
21Cxyz
21D0xy
3073xy
3075xy
30A100
3A0xyy
43Bxxx
43Cyxx
453yxx
463yxx
47Dxxx
47E5xx
47E700
47EC00
BF1x1x
BF2x1x

y and x can be any hex character.

I need to be able to match a code to the correct entry, and the only way I have come up with is rather convoluted. It would involve splitting the original error code into 4 parts: the first 3 chars, then char4, char5 and char6. I would then run a while loop that reads each error code from the file, splits it into the same 4 parts and runs the comparisons as follows:

Do the first 3 chars match? If so, does char4 match or is it x or y? If so, does char5 match or is it x or y? If so, does char6 match or is it x or y? If all match, then that's the line I need.

I haven't worked out all the detail of that solution but I know I can do it that way. I also know there has to be a better way of doing it.

Can anyone help?

Have you tried using character classes in a regular expression?
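
As a minimal sketch of that idea, assuming each lowercase x or y in the file stands for exactly one hex digit (and that repeated letters need not be the same digit), every pattern line can be turned into a regular expression and tested against the code. The function name match_code is made up for illustration; it prints the first matching line in file order and nothing if no line matches.

```shell
# match_code CODE FILE: print the first pattern in FILE that matches CODE.
# Assumption: lowercase x and y in FILE are wildcards for one hex digit each.
match_code() {
  awk -v code="$1" '
    {
      re = $0
      gsub(/[xy]/, "[0-9A-Fa-f]", re)   # each wildcard becomes a hex-digit class
      if (code ~ ("^" re "$")) { print $0; exit }
    }
  ' "$2"
}
```

With the list from the question, `match_code 2180AB file-of-error-codes` should print 2180xy. Unlike a pure prefix comparison, this also checks literal characters that follow a wildcard, as in BF1x1x.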

This one returns the longest matching prefix.
It first builds an associative array s[] holding every code from the file together with all of its shorter prefixes.
Then it looks up the full search string, followed by its shorter prefixes, and prints the first one found.

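awk '{
  # Store every prefix (length 6 down to 1) of each code as a key of s[].
  for (i=6; i>0; i--) s[substr($0,1,i)]
}
END {
  # Try the full search string first, then ever shorter prefixes.
  for (i=6; i>0; i--) if ((x=substr(search,1,i)) in s) {print x; exit}
}' search=BF2XXX file-of-error-codes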
BF2
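
If the full pattern line is wanted rather than just the matching prefix, a variant along these lines might do. It is only a sketch: the function name nearest_code and the array name line are hypothetical, and when several lines share the longest matching prefix it simply reports the first one in the file.

```shell
# nearest_code CODE FILE: print the pattern line sharing the longest prefix with CODE.
# Hypothetical helper; ties go to the first such line in FILE.
nearest_code() {
  awk -v search="$1" '
    {
      # Remember the first line seen for every prefix of this pattern.
      for (i = length($0); i > 0; i--) {
        p = substr($0, 1, i)
        if (!(p in line)) line[p] = $0
      }
    }
    END {
      # The longest prefix of the search string that is known wins.
      for (i = length(search); i > 0; i--) {
        p = substr(search, 1, i)
        if (p in line) { print line[p]; exit }
      }
    }
  ' "$2"
}
```

For example, `nearest_code BF2A1C file-of-error-codes` should print BF2x1x with the list from the question. Like the one-liner above, it compares prefixes only, so literal characters after the first wildcard are not checked.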

Thank you very much, MadeInGermany. That does exactly what I needed, and I sort of understand how it works as well. :)