Hello,
I would like to find an efficient way to compare a pair of strings that differ at one position, and return the difference and position.
For example:
String1 123456789
String2 123454789
returning something like: position 6, 6/4
Thanks in advance,
Mike
joeyg
July 15, 2008, 10:54am
2
I think it has an option to process byte by byte; seems to be what you are looking for.
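Or awk:
echo 12345678 12345478 | \
awk ' BEGIN {pos=0}
{
    # compare up to the length of the longer string
    max=(length($1) >= length($2))? length($1): length($2)
    # stop at the first position where the characters differ
    for(i=1; pos == 0 && i <= max; i++)
    {
        v1=substr($1, i, 1)
        v2=substr($2, i, 1)
        if(v1 != v2){ pos=i }
    }
}
# %s rather than %d, so that non-numeric characters print correctly
END { if(pos) {printf("%d %s/%s\n", pos, v1, v2) }}'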
An awk solution is great! Thanks Jim.
I've also just found cmp in the GNU Diffutils package, but yours is pretty much what I was looking for.
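For comparison, here is a rough sketch of how cmp could be used on two strings. The wrapper name str_cmp is made up for illustration, and it assumes mktemp is available. The -l option is the POSIX one that lists the byte number in decimal and the two differing bytes in octal.

```shell
# str_cmp is a hypothetical helper, not part of Diffutils.
# It writes each string to a temporary file and lets cmp -l report
# every differing byte: position (decimal), then both bytes (octal).
str_cmp() {
    a=$(mktemp) && b=$(mktemp) || return 2
    printf '%s' "$1" > "$a"
    printf '%s' "$2" > "$b"
    cmp -l "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}

# cmp exits with status 1 when the inputs differ, hence the "|| true".
str_cmp 123456789 123454789 || true
```

For the example strings this should report byte 6 with the octal values 66 and 64, which are the characters 6 and 4. The octal output is less readable than the awk result, which is one reason to prefer the awk approach here.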
Oh bother!
It turns out that I didn't fully explain what I was trying to do. Jim's solution works for a single pair of strings; however, I actually have a file with a pair of strings on each line, and I would like to carry out the comparison on each line in turn. Jim's awk script just checks the first line.
Sorry if I am being dumb about this.
Mike
era
July 21, 2008, 8:45am
6
Here's a minor adaptation of Jim's script. For each line it prints the line number, the offset of the first difference, and the two differing characters, or nothing if both tokens are identical.
awk '{ pos=0
    max=(length($1) >= length($2))? length($1): length($2)
    # pos is reset for every line; the loop stops at the first difference
    for(i=1; pos == 0 && i <= max; i++)
    {
        v1=substr($1, i, 1)
        v2=substr($2, i, 1)
        if(v1 != v2) { pos=i; printf "%d: %d %s/%s\n", NR, pos, v1, v2 }
    }
}' filename
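To try the per-line version without a real data file, something like the following sketch may help. The function name first_diff and the sample lines are invented for illustration. The awk body sets pos when it finds a difference, so only the first difference on each line is reported.

```shell
# first_diff is a hypothetical wrapper name; it reads pairs of strings
# from the named files, or from standard input if no file is given.
first_diff() {
    awk '{
        pos = 0
        max = (length($1) >= length($2)) ? length($1) : length($2)
        for (i = 1; pos == 0 && i <= max; i++) {
            v1 = substr($1, i, 1)
            v2 = substr($2, i, 1)
            if (v1 != v2) {
                pos = i
                printf "%d: %d %s/%s\n", NR, pos, v1, v2
            }
        }
    }' "$@"
}

# Made-up sample input: one pair of strings per line.
first_diff <<'EOF'
123456789 123454789
abc abc
abcd abxd
EOF
```

With this sample input the output is "1: 6 6/4" and "3: 3 c/x"; the second line produces nothing because both strings are the same.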