Hi.
This solution relies on two components, docdiff and a short perl script, driven by the following bash script:
#!/usr/bin/env bash
# @(#) s2 Demonstrate differences at character level.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
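# db prints debugging messages to stderr; the second definition
# replaces the first, so debugging output is turned off.
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
# Display the environment and program versions with a local utility, if present.
C=$HOME/bin/context && [ -f $C ] && $C perl docdiff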
f1=data1
f2=data2
FILES="$f1 $f2"
pl " Input files $FILES"
head $FILES
pl " perl extraction helper script:"
cat p1
pl " Results, wdiff format, $f1, $f2:"
docdiff --wdiff --char $f1 $f2
pl " Results, wdiff format, $f1, $f2, extracted diff with labels:"
docdiff --wdiff --char $f1 $f2 |
./p1 $f1 $f2
pl " Results, wdiff format, $f2, $f1, extracted diff with labels:"
docdiff --wdiff --char $f2 $f1 |
./p1 $f2 $f1
exit 0
producing:
% ./s2
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
perl 5.10.0
docdiff 0.3.4
-----
Input files data1 data2
==> data1 <==
orange
123456789xa
X-klystron
==> data2 <==
orange
123456780xb
Y-klystron
-----
perl extraction helper script:
#!/usr/bin/env perl
# @(#) p1 Demonstrate wdiff difference format extraction with labels.
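use strict;
use warnings;
my $f1 = shift || die " Missing first label.\n";
my $f2 = shift || die " Missing second label.\n";
while (<>) {
  # Deleted text is marked [-...-], inserted text {+...+}.
  # Test the arrays directly: defined(@array) is deprecated and fatal in newer perls.
  my @a = m/\[-(.*?)-\]/xmsg;
  print "$f1: ", join( "", @a ), "\n" if @a;
  my @b = m/\{\+(.*?)\+\}/xmsg;
  print "$f2: ", join( "", @b ), "\n" if @b;
}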
exit(0);
-----
Results, wdiff format, data1, data2:
orange
12345678[-9-]{+0+}x[-a-]{+b+}
[-X-]{+Y+}-klystron
-----
Results, wdiff format, data1, data2, extracted diff with labels:
data1: 9a
data2: 0b
data1: X
data2: Y
-----
Results, wdiff format, data2, data1, extracted diff with labels:
data2: 0b
data1: 9a
data2: Y
data1: X
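The idea is that docdiff can report differences down to the level of single characters (option --char). Its wdiff-style output (option --wdiff) marks text present only in the first file as [-...-] and text present only in the second as {+...+}. That output is then processed by the perl script. The data files were augmented to make sure that several lines are processed, including lines that are identical.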
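If perl is not at hand, the extraction step done by p1 can be approximated in pure bash. The sketch below is not part of the original solution, and the function names extract_labels and collect_runs are hypothetical. It assumes the following:
- markers are not nested;
- a marked run never contains its own closing sequence ("-]" or "+}");
- no marker spans a line break;
- input lines end with a newline.

```shell
#!/usr/bin/env bash
# Hypothetical pure-bash counterpart of the p1 perl script (a sketch, not p1 itself).
# Assumptions: markers are not nested, a run never contains its closer,
# no marker spans a line break, and every input line ends with a newline.

# Print the concatenation of all text found between the literal
# markers $2 (opener) and $3 (closer) in the string $1.
collect_runs() {
  local rest=$1 open=$2 close=$3 out=""
  while [[ $rest == *"$open"*"$close"* ]]; do
    rest=${rest#*"$open"}      # drop everything up to and including the first opener
    out+=${rest%%"$close"*}    # keep the text before the next closer
    rest=${rest#*"$close"}     # continue scanning after that closer
  done
  printf '%s' "$out"
}

# Read wdiff-style text on stdin; for each line with changes, print the
# deleted runs labelled with $1 and the inserted runs labelled with $2.
extract_labels() {
  local label1=${1:?missing first label} label2=${2:?missing second label}
  local line del ins
  while IFS= read -r line; do
    del=$(collect_runs "$line" '[-' '-]')
    ins=$(collect_runs "$line" '{+' '+}')
    if [[ -n $del ]]; then printf '%s: %s\n' "$label1" "$del"; fi
    if [[ -n $ins ]]; then printf '%s: %s\n' "$label2" "$ins"; fi
  done
}
```

For example, the command docdiff --wdiff --char data1 data2 | extract_labels data1 data2 should print the same four labelled lines as p1 under the assumptions above. The quoted marker variables make the brackets and braces match literally in the bash patterns.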
The docdiff utility is written in ruby and is available in the repositories of Debian-based GNU/Linux distributions. It can also be found at SourceForge.net (project page "DocDiff: Compare text word by word").
See man pages for details.
Best wishes ... cheers, drl (125)
---------- Post updated at 08:52 ---------- Previous update was at 08:10 ----------
Hi.
An all-perl solution:
#!/usr/bin/env perl
# @(#) p2 Demonstrate character differences in same-length lines.
use warnings;
use strict;
my (
  $f1, $f2, $file1, $file2, $i, @a, @b,
  $s1, $s2, $t1, $t2, $changed, $debug
);
$f1 = shift || die " Missing first file.\n";
$f2 = shift || die " Missing second file.\n";
# Set to 1 to print each pair of lines being compared.
$debug = 0;
open( $file1, "<", $f1 ) || die " Cannot open file $f1\n";
open( $file2, "<", $f2 ) || die " Cannot open file $f2\n";
while ( $t1 = <$file1> ) {
  chomp($t1);
  @a = split "", $t1;
  # Treat a missing line in file2 as empty rather than undefined.
  $t2 = <$file2> // "";
  chomp($t2);
  @b = split "", $t2;
  print "file1,2 = ", join( "", @a ), " ", join( "", @b ), "\n" if $debug;
  $changed = 0;
  $s1 = $s2 = "";
  # Compare position by position, collecting the characters that differ.
  for ( $i = 0; $i <= $#a; $i++ ) {
    my $c2 = $b[$i] // "";
    if ( $a[$i] ne $c2 ) {
      $s1 = "$f1: " if not $changed;
      $s2 = "$f2: " if not $changed;
      $s1 .= $a[$i];
      $s2 .= $c2;
      $changed++;
    }
  }
  print "$s1\n" if $changed;
  print "$s2\n" if $changed;
}
exit(0);
producing, using the data files noted above:
% ./p2 data1 data2
data1: 9a
data2: 0b
data1: X
data2: Y
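The same position-by-position comparison that p2 performs can also be sketched in bash alone. This is a hypothetical port with an illustrative function name, chardiff, and not the author's script. It assumes newline-terminated files and corresponding lines of the same length. It also differs from p2 in one respect: it stops at the end of the shorter file, whereas p2 reads to the end of the first file.

```shell
#!/usr/bin/env bash
# Hypothetical bash port of the p2 idea (a sketch, not the original script).
# Assumptions: corresponding lines have the same length (only positions of
# the first file's line are examined, as in p2), files end with a newline,
# and comparison stops at the end of the shorter file.

chardiff() {
  local f1=${1:?missing first file} f2=${2:?missing second file}
  local t1 t2 s1 s2 i
  # Read both files in lockstep on descriptors 3 and 4.
  while IFS= read -r t1 <&3 && IFS= read -r t2 <&4; do
    s1="" s2=""
    for (( i = 0; i < ${#t1}; i++ )); do
      # Quote the right-hand side so it is compared literally, not as a pattern.
      if [[ ${t1:i:1} != "${t2:i:1}" ]]; then
        s1+=${t1:i:1}
        s2+=${t2:i:1}
      fi
    done
    if [[ -n $s1 ]]; then
      printf '%s: %s\n%s: %s\n' "$f1" "$s1" "$f2" "$s2"
    fi
  done 3<"$f1" 4<"$f2"
}
```

With the data files above, chardiff data1 data2 is expected to print the same four lines as ./p2 data1 data2. For large files the perl version will be considerably faster, because bash loops over characters slowly.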
Best wishes ... cheers, drl