Compare two strings and get the "exact" difference only

gvolpini · February 9, 2012, 11:59am

Hi all;

I'm pretty green at Perl programming. I've been searching high and low for a Perl (preferably) or Unix script that will compare two CSV strings in the same file, separated by the "|" character (so basically they're side by side), and report ONLY the exact change. Note that 19 is not a line number; it's just a numeric field. If it helps, we can assume that both CSV strings have the same eight fields, which we can call:
rule, client, location, script, destination, enabled, search, compress

Also, the rule field can never be altered or changed; only fields 2 to 8 can be modified.
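A minimal bash sketch of how one input line could be split into the BEFORE and AFTER records and then into the eight named fields. The helper names split_line and field are hypothetical, and it assumes each line contains exactly one "|" and that no field contains a comma.

```shell
#!/usr/bin/env bash
# Sketch only: split_line and field are hypothetical helper names.
# Assumes exactly one "|" per line and no "," inside any field.

# Split one "BEFORE|AFTER" line into the global variables BEFORE and AFTER.
split_line() {
  IFS='|' read -r BEFORE AFTER <<< "$1"
}

# Print field number $2 (1 = rule, 2 = client, ..., 7 = search, 8 = compress)
# of the comma-separated record $1. Spaces inside a field are preserved.
field() {
  local -a f
  IFS=',' read -r -a f <<< "$1"
  printf '%s\n' "${f[$2-1]}"
}
```

For example, after split_line "$line", the command field "$BEFORE" 7 prints the search field, and since the rule never changes, field "$BEFORE" 1 identifies the record.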

So, with all that, here's an example. If I have a file (call it file1) with this content:

19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder> <wonderwoman>is<cute>,YES|19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder>,YES
300,gjv,mary,hadalittlelamb,its,flease,<was>whaite<assnow> <and>everywhere<that> <mary went>thelambwas<sure to go>,YES|300,gjv,mary,hadalittlelamb,its,flease,<was>white<assnow> <and>everywhere<that> <marywent>thelambwas<suretogo> <thefarmer>inthe<dell> <hewas>neverto<comeout>,YES
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <untilone>day<hewenttoofar>,YES|3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <hewas>caught<andsentaway>,YES

Output I am looking for (to be written to file2):

-------------------------------------
AUDIT REPORT
 
CHANGED:
field1:19 
BEFORE:
19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder> <wonderwoman>is<cute>,YES
AFTER:
19,gmp,charlie,brown,is,funny,<super>man<isalive> <super>boy<isolder>,YES
SPECIFICS:
<wonderwoman>is<cute> ----> removed
 
CHANGED:
field1:300
BEFORE:
300,gjv,mary,hadalittlelamb,its,flease,<was>whaite<assnow> <and>everywhere<that> <mary went>thelambwas<sure to go>,YES
AFTER:
300,gjv,mary,hadalittlelamb,its,flease,<was>white<assnow> <and>everywhere<that> <marywent>thelambwas<suretogo> <thefarmer>inthe<dell> <hewas>neverto<comeout>,YES
SPECIFICS:
<thefarmer>inthe<dell> ----> added
<hewas>neverto<comeout> ----> added
 
CHANGED:
field1:3012
BEFORE:
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <untilone>day<hewenttoofar>,YES
AFTER:
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causing> <trouble>around<town> <hewas>caught<andsentaway>,YES
SPECIFICS:
<untilone>day<hewenttoofar> ----> removed
<hewas>caught<andsentaway> ----> added
 
SUMMARY
Rule was changed for 2 clients: gmp,gjv
Total number of rules changes: 3
Rules changed:
gmp: 19
gjv: 300,3012
--------------------------------------------------------

Note: in the SUMMARY, even though two records changed for client gjv, gjv appears only once in the line "Rule was changed for 2 clients:", but both changes are counted in the line "Total number of rules changes:".
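One way to get that counting behaviour, sketched under the assumption that the changed records have already been reduced to "client rule" pairs (one per line): awk's "in" test remembers each client the first time it appears, while every line still increments the total. The function name summarize is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: summarize is a hypothetical helper name.
# Input on stdin: one "client rule" pair per changed record, e.g. "gjv 300".
# Output: the SUMMARY block, listing each client once (in first-seen order)
# while still counting every changed rule in the total.
summarize() {
  awk '
    {
      total++
      # Collect rule numbers per client; remember the order clients appear in.
      if ($1 in rules) rules[$1] = rules[$1] "," $2
      else { order[++n] = $1; rules[$1] = $2 }
    }
    END {
      clients = ""
      for (i = 1; i <= n; i++) clients = clients (i > 1 ? "," : "") order[i]
      print "SUMMARY"
      print "Rule was changed for " (n + 0) " clients: " clients
      print "Total number of rules changes: " (total + 0)
      print "Rules changed:"
      for (i = 1; i <= n; i++) print order[i] ": " rules[order[i]]
    }'
}
```

Feeding it the three example records, as in printf '%s\n' 'gmp 19' 'gjv 300' 'gjv 3012' | summarize, should reproduce the SUMMARY section shown above.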

Thanks
G

Shell_Life · February 9, 2012, 4:08pm

Here is a skeleton for one possible solution:

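#!/bin/ksh
# Each input line is "BEFORE|AFTER"; split it on the "|" character.
while IFS='|' read -r mBefore mAfter; do
  if [[ "${mBefore}" != "${mAfter}" ]]; then
    # Split each record into its eight comma-separated fields.
    IFS=',' read -r mB1 mB2 mB3 mB4 mB5 mB6 mB7 mB8 <<< "${mBefore}"
    IFS=',' read -r mA1 mA2 mA3 mA4 mA5 mA6 mA7 mA8 <<< "${mAfter}"
    # Compare one field; repeat this test for the other fields as needed.
    if [[ "${mB1}" != "${mA1}" ]]; then
      echo "Field 1 was changed."
      echo "Before: ${mB1}"
      echo "After: ${mA1}"
    fi
  fi
done < Input_File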

Change it according to your requirements.
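The skeleton only reports that a whole field differs. To get the "removed"/"added" lines from the question, one possible extension (a sketch, not Shell_Life's code) treats the search field as a list of <a>b<c> groups and compares the two lists. The function name diff_search is hypothetical. Note that a strict comparison like this would also report the <was>whaite<assnow> and <mary went>... groups for rule 300, which the sample output in the question does not list.

```shell
#!/usr/bin/env bash
# Sketch only: diff_search is a hypothetical helper name.
# Compares two "search" fields (field 7) as lists of <a>b<c> groups and
# prints the groups removed from BEFORE, then the groups added in AFTER.
diff_search() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    # Store every <a>b<c> group of s in arr[1..n] and return n.
    function groups(s, arr,    n) {
      n = 0
      while (match(s, /<[^>]*>[^<]*<[^>]*>/)) {
        arr[++n] = substr(s, RSTART, RLENGTH)
        s = substr(s, RSTART + RLENGTH)
      }
      return n
    }
    BEGIN { split("", before); split("", after) }
    NR == 1 { nb = groups($0, before) }
    NR == 2 {
      na = groups($0, after)
      for (i = 1; i <= nb; i++) inbefore[before[i]] = 1
      for (i = 1; i <= na; i++) inafter[after[i]] = 1
      for (i = 1; i <= nb; i++) if (!(before[i] in inafter)) print before[i] " ----> removed"
      for (i = 1; i <= na; i++) if (!(after[i] in inbefore)) print after[i] " ----> added"
    }'
}
```

Inside the loop above, calling diff_search "${mB7}" "${mA7}" when the two search fields differ would print the SPECIFICS lines for that record. Matching on the angle-bracket pattern rather than splitting on spaces keeps groups such as <mary went>thelambwas<sure to go> intact.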

gvolpini · February 10, 2012, 2:53pm

Thanks for your time...I will try it out.
Sincere regards
Giuliano

---------- Post updated at 02:53 PM ---------- Previous update was at 08:24 AM ----------

Not really working as I intended.
Thanks

drl · February 11, 2012, 9:52pm

Hi.

I tend to use whatever is "standard" or available on a system before I start coding a custom solution, so here is an example using the utility dwdiff. The output is not in the format you wanted, but it was quick to put together. The key parts are the dwdiff command and the slight reformatting with sed:

#!/usr/bin/env bash

# @(#) s1	Demonstrate difference by "word", dwdiff.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C dwdiff

FILE=${1-data1}

pl " Sample of data file $FILE:"
cut -c1-50 $FILE

pl " Results:"
dwdiff -s -3 -d"," <( cut -d"|" -f1 $FILE ) <( cut -d"|" -f2 $FILE ) |
sed -e 's/^ *//' -e 's/\] *{/\]\n{/'

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
dwdiff 1.8.2

-----
 Sample of data file data1:
19,gmp,charlie,brown,is,funny,<super>man<isalive> 
300,gjv,mary,hadalittlelamb,its,flease,<was>whaite
3012,gjv,timmy,wasabad,boy,andhe,<was>always<causi

-----
 Results:
old: 54 words  47 87% common  1 1% deleted  6 11% changed
new: 52 words  47 90% common  0 0% inserted  5 9% changed
======================================================================
[-<wonderwoman>is<cute>-]
======================================================================
[-<was>whaite<assnow>-]
{+<was>white<assnow>+}
======================================================================
[-<mary went>thelambwas<sure to go>-]
{+<marywent>thelambwas<suretogo> <thefarmer>inthe<dell> <hewas>neverto<comeout>+}
======================================================================
[-<untilone>day<hewenttoofar>-]
{+<hewas>caught<andsentaway>+}
======================================================================

The square brackets mark deletions and the curly braces mark insertions; both markers can be changed (dwdiff's -w/-x and -y/-z options).
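To get closer to the requested SPECIFICS format, the marker lines could be post-processed. This is a sketch that assumes the default [-...-] and {+...+} markers, each on its own line as produced by the sed step above, and changed text made of <a>b<c> groups. The function name to_specifics is hypothetical; it does not call dwdiff itself.

```shell
#!/usr/bin/env bash
# Sketch only: to_specifics is a hypothetical helper name.
# Reads dwdiff-style lines ([-deleted-] / {+inserted+}) on stdin and prints
# each <a>b<c> group they contain as "... ----> removed" or "... ----> added".
# Separator lines (=====) and anything else are ignored.
to_specifics() {
  awk '
    # Print every <a>b<c> group in s, followed by the given label.
    function emit(s, label) {
      while (match(s, /<[^>]*>[^<]*<[^>]*>/)) {
        print substr(s, RSTART, RLENGTH) " ----> " label
        s = substr(s, RSTART + RLENGTH)
      }
    }
    { head = substr($0, 1, 2); tail = substr($0, length($0) - 1) }
    head == "[-" && tail == "-]" { emit(substr($0, 3, length($0) - 4), "removed") }
    head == "{+" && tail == "+}" { emit(substr($0, 3, length($0) - 4), "added") }
  '
}
```

Appending | to_specifics to the end of the dwdiff | sed pipeline would turn the Results above into one line per changed group. Like dwdiff's own output, it still reports the whaite/white and <mary went>/<marywent> changes for rule 300 as a removal plus an addition.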

A web page for dwdiff is at "dwdiff - ghalkes:~#".

Best wishes ... cheers, drl