Compare files to pull changed records only

Saanvi1 · August 1, 2016, 12:45pm

Hi,
I am using Sun Solaris - SunOS. I have two fixed-width files, shown below. I am trying to find the records in NewFile.txt that have changed, for records where the key column matches. The first column is the key column (example: A123).
If records have been added to or deleted from the new file, I do NOT want them in the output.
All I am trying to achieve is the output of the changed records where the key column matches.

File1:OldFile.txt

A123   Sim   Firstname1          Lastname1  123 JESSE DR.     Atlanta   GA32839   Sampleemail@YAHOO.COM
B234   TWD   Firstname2          Lastname2  123 FORTHILL1 DR. Atlanta   GA32839   Sampleemail2@YAHOO.COM
C567   TWD   Firstname3          Lastname3  123 FORTHILL2 DR. Atlanta   GA32839   Sampleemail3@YAHOO.COM
D89012 TWD   Firstname3          Lastname3  123 FORTHILL2 DR. Atlanta   GA32839   Sampleemail3@YAHOO.COM

File2:NewFile.txt

A123   Sim  UpdatedNewFirstName1 UpdatedNewLastname1  123 JESSE DR.      Atlanta   GA32839  sampleemail@YAHOO.COM
B234   TWD  Firstname2           Lastname2            123 FORTHILL1 DR.  Atlanta   GA32839  Sampleemail2@YAHOO.COM
C5676  TWD  Firstname3           Lastname3            123 FORTHILL2 DR.  Atlanta   GA32839  Sampleemail3@YAHOO.COM
Z12345 TWD  Firstname3           Lastname4            123 FORTHILL2 DR.  Atlanta   GA32839  Sampleemail3@YAHOO.COM

So in the above example, the output would be as shown below, because UpdatedNewFirstName1 and UpdatedNewLastname1 for key A123 are changed.
A123 Sim UpdatedNewFirstName1 UpdatedNewLastname1 123 JESSE DR. Atlanta GA32839 sampleemail@YAHOO.COM

and ignore the two records below:
D89012 TWD Firstname3 Lastname3 123 FORTHILL2 DR. Atlanta GA32839 Sampleemail3@YAHOO.COM -- This record was dropped from the new file. I do not want this in my output.

Z12345 TWD Firstname3 Lastname4 123 FORTHILL2 DR. Atlanta GA32839 Sampleemail3@YAHOO.COM -- This record was added in the new file. I do not want this either.

PS: Please ignore the format of the files as I created a sample file above, which might be slightly off.

All I need is the CHANGED records where the first-field keys match.

Thanks

rdrtx1 · August 1, 2016, 1:07pm

awk '
NR==FNR {$1=$1; a[$1]=$0;  next}
a[$1] {l=$0; $1=$1; if (a[$1] != $0) print l}
' File1 File2
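
For readers unfamiliar with the `$1=$1` idiom used in the script above: assigning to any field makes awk rebuild `$0` from its fields joined by a single space (the default OFS), so two records that differ only in spacing compare equal afterwards. A minimal sketch of just that step follows; the function name `normalize` is only illustrative and is not part of the original script.

```shell
# normalize: read records on stdin and print each one with runs of blanks
# between fields collapsed to a single space.
# The name "normalize" is hypothetical, chosen for this illustration.
normalize() {
    awk '{ $1 = $1; print }'
}

# Example: one sample record with uneven spacing
printf 'A123   Sim   Firstname1\n' | normalize
```

With the sample input above, this should print `A123 Sim Firstname1`. That is why the script saves the original line in `l` before normalizing: the comparison uses the rebuilt record, while the output keeps the record as it appeared in the file.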

Don_Cragun · August 1, 2016, 2:01pm

Since you're using a Solaris/SunOS system, you'll need to use /usr/xpg4/bin/awk or nawk instead of awk. I don't see the need for the three arrays that rdrtx1 used... I think the following should run a tiny bit faster and use less memory while it is running:

/usr/xpg4/bin/awk '
NR == FNR {
	for(i = 2; i <= NF; i++)
		f1[$1] = f1[$1] SUBSEP $i
	next
}
$1 in f1 {
	f2 = ""
	for(i = 2; i <= NF; i++)
		f2 = f2 SUBSEP $i
	if(f1[$1] != f2)
		print
}' OldFile.txt NewFile.txt
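
If the same script has to run on both Solaris and other systems, the choice of awk can be made at run time. The following is a rough sketch under a few assumptions: the variable `AWK` and the function name `changed_records` are invented for this example, and it compares records by rebuilding `$0` rather than by concatenating fields with SUBSEP as above, so treat it as a variation and not as the script from this post.

```shell
#!/bin/sh
# Prefer the POSIX awk that Solaris ships in /usr/xpg4/bin; fall back to
# whatever "awk" is on PATH elsewhere. AWK is a name made up for this sketch.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# changed_records OLD NEW (hypothetical helper):
# print the NEW records whose key (field 1) also exists in OLD but whose
# remaining fields differ, ignoring differences in spacing.
changed_records() {
    "$AWK" '
    {
        line = $0   # keep the record exactly as it appears in the file
        key = $1
        $1 = ""     # assigning a field rebuilds $0 with single spaces
        norm = $0   # the record without its key, spacing normalized
    }
    NR == FNR { old[key] = norm; next }       # first file: remember old data
    (key in old) && old[key] != norm { print line }
    ' "$1" "$2"
}

# Usage:
#   changed_records OldFile.txt NewFile.txt
```

Run against the sample files from the first post, this should print only the A123 record from NewFile.txt: B234 is identical once spacing is normalized, and C5676 and Z12345 have no matching key in the old file.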

RudiC · August 1, 2016, 2:50pm

Slightly different approach:

awk '
                {gsub (/  */, " ")
                 IX = $1
                 sub ("^" $1, _)
                }

NR == FNR       {TMP[IX] = $0
                 next
                }

match ($0, TMP[IX])     {next
                        }

                {n = split (TMP[IX], CMP)
                 for (i=1; i<=n; i++) if ($i != CMP[i]) printf "%s ", $i
                 print "for key", IX, "is changed."
                 print IX, $0
                }


' file1 file2
UpdatedNewFirstName1 UpdatedNewLastname1 sampleemail@YAHOO.COM for key A123 is changed.
A123  Sim UpdatedNewFirstName1 UpdatedNewLastname1 123 JESSE DR. Atlanta GA32839 sampleemail@YAHOO.COM

Saanvi1 · August 9, 2016, 12:31pm

Thanks a bunch. This was very helpful.

....Saanvi