Format diff output

I need to compare two directories of tab-separated files, and I'm using diff to do this. The diff output doesn't identify which column values are different; it only tells me which lines differ. Is there any way to format the diff output? Thanks.

f1.txt
210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11
310	998877	usg	850	ex	11
410	998877	web	1003		
210	998878	phone	9981128210	add	112 nw st.
310	998878	usg	750	ex	11
410	998878	web	930		
	

f2.txt
210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11.00
310	998877	usg	750	ex	11
410	998877	web	1203		
210	998878	phone	9981128210	add	112 nw st.
310	998878	usg	750	ex	11
410	998878	web	850		

diff output -
 
 diff -b f1.txt f2.txt

2,4c2,4
< 310   998877  usg     650     ex      11
< 310   998877  usg     850     ex      11
< 410   998877  web     1003
---
> 310   998877  usg     650     ex      11.00
> 310   998877  usg     750     ex      11
> 410   998877  web     1203
7c7
< 410   998878  web     930
---
> 410   998878  web     850

I want to reformat it like this -

310 998877 column6 11 11.00
310 998877 column4 850 750
410 998877 column4 1003 1203
410 998878 column4 930 850


I did this quickly at the command line and pasted it here once the results came back correct. You would be better off putting the awk commands in a file and running them with the -f option, i.e.

diff <diff options and files> | awk -f <awkfile> 

I assumed that columns 1-3 always match. If that is incorrect, you can modify it to suit your needs.

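diff -b f1.txt f2.txt | awk '
# Collect the changed lines: ">" lines come from f2.txt, "<" lines from f1.txt.
/^>/ { a[i++] = $0; maxi = i }
/^</ { b[j++] = $0; maxj = j }

END {
  for (i = 0; i < maxi; i++) {
    # Field 1 is the "<" or ">" marker, so file column N is field N+1.
    split(a[i], c); split(b[i], d)

    # Appending "" forces a string comparison, so 11 and 11.00 count as different.
    if ((c[5] "") != (d[5] ""))
      printf "%s %s column4 %s %s\n", c[2], c[3], d[5], c[5]
    else if ((c[6] "") != (d[6] ""))
      printf "%s %s column5 %s %s\n", c[2], c[3], d[6], c[6]
    else if ((c[7] "") != (d[7] ""))
      printf "%s %s column6 %s %s\n", c[2], c[3], d[7], c[7]
  }
}'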
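
If the two files always have the same records in the same order, as in the sample, another option is to skip diff and let awk compare the files column by column. The following is only a sketch under that assumption: an inserted or deleted line would throw the pairing off. The function names coldiff and coldiff_dirs are made up for illustration, and the output uses the first two columns as the record key, as in the requested format.

```shell
#!/bin/sh
# coldiff OLD NEW: report every tab-separated column that differs,
# assuming line N of OLD corresponds to line N of NEW.
# Output format: col1 col2 columnN old_value new_value
coldiff() {
  awk -F'\t' '
    # First file: remember each line by its line number.
    NR == FNR { old[FNR] = $0; next }
    # Second file: split the remembered line and compare field by field.
    {
      n = split(old[FNR], o, FS)
      if (NF > n) n = NF
      for (k = 1; k <= n; k++)
        # Appending "" forces a string comparison, so 11 and 11.00 differ.
        if ((o[k] "") != ($k ""))
          printf "%s %s column%d %s %s\n", $1, $2, k, o[k], $k
    }
  ' "$1" "$2"
}

# coldiff_dirs DIR1 DIR2: run coldiff on every file name present in both.
coldiff_dirs() {
  for f in "$1"/*; do
    name=${f##*/}
    if [ -f "$f" ] && [ -f "$2/$name" ]; then
      echo "== $name"
      coldiff "$f" "$2/$name"
    fi
  done
}
```

Calling coldiff f1.txt f2.txt reports all differing columns on a line, not just the first one, and coldiff_dirs takes the two directory names to cover the "two directories" part of the question.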

Hi.

There are several schemes to draw attention to the areas in lines which differ. One such is colordiff. Colorized text is not easy to paste in here, but the program can also do an interesting job by marking up the lines with subtractions and additions. For example:

#!/usr/bin/env bash

# @(#) s1	Demonstrate coloration of diff output.
# colordiff - a tool to colorize diff output

# Infrastructure details, environment, commands for forum posts. 
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo ; echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p wdiff colordiff
set -o nounset
echo

echo " Samples of data files:"
specimen data1 data2 \
|| { head -5 $FILE ; echo " --" ; tail -5 $FILE; }

echo
echo " Results:"
wdiff -n data1 data2 |
colordiff

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
GNU wdiff 0.5
colordiff diff (GNU diffutils) 2.8.1

 Samples of data files:
Whole: 5:0:5 of 7 lines in file "data1"
210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11
310	998877	usg	850	ex	11
410	998877	web	1003		
210	998878	phone	9981128210	add	112 nw st.
310	998878	usg	750	ex	11
410	998878	web	930		

Whole: 5:0:5 of 7 lines in file "data2"
210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11.00
310	998877	usg	750	ex	11
410	998877	web	1203		
210	998878	phone	9981128210	add	112 nw st.
310	998878	usg	750	ex	11
410	998878	web	850	

 Results:
210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	[-11-]	{+11.00+}
310	998877	usg	[-850-]	{+750+}	ex	11
410	998877	web	[-1003-]	{+1203+}		
210	998878	phone	9981128210	add	112 nw st.
310	998878	usg	750	ex	11
410	998878	web	[-930-]	{+850+}	

Although not shown here, the surrounded strings are also colored on the display. The command can also run diff internally, but I did not find that display as useful as the one involving word-diff -- wdiff.
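
The marked-up wdiff output can also be turned into the column report asked for above. This is a rough sketch that assumes the output looks exactly like the sample shown: tab-separated fields, with each changed value appearing as a [-old-] field immediately followed by a {+new+} field. The function name wdiff_columns is made up for illustration, and changes inside a multi-word field (such as the address column) or in the first two columns are not handled.

```shell
#!/bin/sh
# wdiff_columns: read "wdiff -n" output on stdin and print
#   col1 col2 columnN old new
# for each [-old-] field that is directly followed by a {+new+} field.
wdiff_columns() {
  awk -F'\t' '
    {
      extra = 0   # extra fields introduced by earlier pairs on this line
      for (k = 1; k < NF; k++) {
        if ($k ~ /^[[]-.*-[]]$/ && $(k+1) ~ /^[{][+].*[+][}]$/) {
          # Strip the two-character markers from both ends.
          old = substr($k, 3, length($k) - 4)
          new = substr($(k+1), 3, length($(k+1)) - 4)
          printf "%s %s column%d %s %s\n", $1, $2, k - extra, old, new
          extra++
          k++     # skip the {+new+} field just consumed
        }
      }
    }'
}
```

It would be used as: wdiff -n data1 data2 | wdiff_columns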

colordiff was in the Debian repository I use, but you can also find it at the ColorDiff site for several flavors of *nix.

I think that Jeffrey Friedl wrote some Perl code that highlighted differences by inverting the display scheme for the parts of the strings that differed. However, I could not find that code with a quick Google search. It may be in his book on regular expressions, Mastering Regular Expressions, Third Edition (O'Reilly Media).

I'm sure that there are other solutions, likely found by searching with keywords such as highlighting, differences, etc.

Best wishes ... cheers, drl