Comparing two files and generating the report

Hi All,

What I am trying to do is generate a report by comparing two files.

File A
-----------
111 22222 3333
222 55555 7777

File B
-----------
11A 22222 3333
333 55555 7778

Now the report should be as follows

Added:
333 55555 7778

Removed:
222 55555 7777

Modified:
111 22222 3333

I have tried using the diff and comm commands. The issue I am facing is getting the modified record: with comm -23 and comm -13 I can get the individual records that were added or removed (grep -Fvf file1 file2 and grep -Fvf file2 file1 give the same result).
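
To make the problem concrete, here is a minimal sketch that runs comm on the two sample files above. The temporary directory and the fileA/fileB names are only placeholders for illustration. Note that comm expects sorted input, so both files are sorted first.

```shell
# Sample data copied from the example above; the file names are placeholders.
workdir=$(mktemp -d)
printf '%s\n' '111 22222 3333' '222 55555 7777' > "$workdir/fileA"
printf '%s\n' '11A 22222 3333' '333 55555 7778' > "$workdir/fileB"

# comm requires sorted input, so sort both files first.
sort "$workdir/fileA" > "$workdir/fileA.sorted"
sort "$workdir/fileB" > "$workdir/fileB.sorted"

# -23 suppresses columns 2 and 3: lines found only in fileA.
only_in_a=$(comm -23 "$workdir/fileA.sorted" "$workdir/fileB.sorted")
# -13 suppresses columns 1 and 3: lines found only in fileB.
only_in_b=$(comm -13 "$workdir/fileA.sorted" "$workdir/fileB.sorted")

printf 'Only in File A:\n%s\n\nOnly in File B:\n%s\n' "$only_in_a" "$only_in_b"
```

Both versions of the changed record (111... and 11A...) show up, one in each list, because comm compares whole lines and has no notion of a record that was edited in place.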

So kindly advise how I can get the modified record.

How do you know a record is modified and not simply added?

Sorry, I did not get your question.

For records which are added, I am using
comm -13 file1 file2, which gives me only the lines that are in file2.

As I look at it,
all items in FileB are added while
all items in FileA are removed.

I do not understand how/why you say a record was changed.
Why is the 111_ modified to be 11A_ but the 222_ is not modified to 333_?

What is the rule to determine a modification?

Ok...
Actually, the files are such that only a specific column will change (say column 5). This has to be reported as a modification.

If there are any additions/removals, the entire record will be added/removed.

I think the example below will explain a bit more:
File A
--------------------------------
AAA BBB CCC DDD EEEE FFF
111 222 333 444 555 666
XXX YY CDE GTY YSE TYU
File B
------------------------------
AAA BBB CCC ZZZ EEEE FFF
QQQ ZZZ GHJ SDF JJJJ KLK

If you look at the first records in File A and File B, this is a modification (column 4 changed). This has to be reported as Modified.

The second row in File A is not present in File B, hence it is to be reported as Removed.

Similarly, the second row in File B is to be reported as Added.

In your first example, a single character at position 3 is different; and thus you call this "Modification".
In your 2nd example, three characters at positions 13-14-15 are different; and this is also a "Modification".

What if 20 characters are different? Is that simply another Modification? When is a Modification not a Change, but a delete and add?

Please take only the second example. Also, it is not a character position that matters, as a whole word changes. In fact, the data come from two .xls files, so you can treat them as TAB-separated or comma-separated (if File A and File B are .csv).

So it is a change in one column that makes it a modification.
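
One way to read that rule is: a record in File B counts as Modified when it differs from some record in File A in exactly one column. The sketch below tries that interpretation on the second example. The function name field_diffs, the variable names and the temporary file names are made up for illustration, and the first File A record that matches wins, so this is a rough starting point rather than a finished tool.

```shell
workdir=$(mktemp -d)
cat > "$workdir/fileA" <<'EOF'
AAA BBB CCC DDD EEEE FFF
111 222 333 444 555 666
XXX YY CDE GTY YSE TYU
EOF
cat > "$workdir/fileB" <<'EOF'
AAA BBB CCC ZZZ EEEE FFF
QQQ ZZZ GHJ SDF JJJJ KLK
EOF

report=$(awk '
# Count the fields that differ between two records (-1 if field counts differ).
function field_diffs(x, y,    fx, fy, nx, ny, i, d) {
    nx = split(x, fx); ny = split(y, fy)
    if (nx != ny) return -1
    d = 0
    for (i = 1; i <= nx; i++) if (fx[i] != fy[i]) d++
    return d
}
NR == FNR { a[++na] = $0; next }     # first file: remember every record
{
    verdict = "Added"                # assume new until a File A record matches
    for (i = 1; i <= na; i++) {
        d = field_diffs(a[i], $0)
        if (d == 0) { matched[i] = 1; verdict = "Unchanged"; break }
        if (d == 1) { matched[i] = 1; verdict = "Modified"; break }
    }
    if (verdict != "Unchanged") print verdict ": " $0
}
END {
    # File A records that no File B record matched were removed.
    for (i = 1; i <= na; i++) if (!(i in matched)) print "Removed: " a[i]
}' "$workdir/fileA" "$workdir/fileB")

printf '%s\n' "$report"
```

For real TAB-separated or comma-separated data, the field separator would have to be set accordingly (for example with -F), and the one-column threshold may need tuning.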

---------- Post updated 07-16-13 at 02:47 AM ---------- Previous update was 07-15-13 at 08:14 AM ----------

All,

As a step forward, what I am trying to do is:
a) Take the output of comm -12 fileA fileB and append the output of comm -23 fileA fileB to it.
b) Use a column (the 3rd in my case) to check whether its value is present in fileA and fileB.
c) If it is present in both files, it is a modification.
d) If it is present in only one of the files, it is either Added or Removed.

I suppose awk with an if/else construct can be used for steps b, c and d. If someone can show how to achieve this, it would be a great help.
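
Steps b to d can be sketched in a single awk pass over the two original files, without the intermediate comm output. This assumes the key column (the 3rd here, passed as the hypothetical variable key) is unique within each file. The sample data is the first example from this thread, and the file names are placeholders.

```shell
workdir=$(mktemp -d)
printf '%s\n' '111 22222 3333' '222 55555 7777' > "$workdir/fileA"
printf '%s\n' '11A 22222 3333' '333 55555 7778' > "$workdir/fileB"

report=$(awk -v key=3 '
NR == FNR { old[$key] = $0; next }      # first file: index each record by its key
{
    if (!($key in old))        print "Added: " $0          # key only in the second file
    else if (old[$key] != $0)  print "Modified: " old[$key] # key in both, record differs
    seen[$key] = 1
}
END {
    # Keys that never appeared in the second file were removed.
    for (k in old) if (!(k in seen)) print "Removed: " old[k]
}' "$workdir/fileA" "$workdir/fileB")

printf '%s\n' "$report"
```

With key=3 this reproduces the report asked for in the first post: 111 22222 3333 as Modified, 333 55555 7778 as Added and 222 55555 7777 as Removed. Records that are identical in both files are not printed.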

---------- Post updated at 03:41 AM ---------- Previous update was at 02:47 AM ----------

Hi All......
I am trying to traverse the difference file and segregate each record as Added, Modified or Removed, using the code below:

awk 'BEGIN {print "Start Generating the report";}
{if (grep -q $2 FileA && grep -q $2 FileB) 
 echo modified
else if( grep -q $2 FileA && !grep -q $2 FileB)
 echo removed
else if ( !grep -q $2 FileA && grep -q $2 FileB)
 echo added
END {print "Report generated";}' difference.txt

However, I am getting a syntax error. Could you please advise?

$2 is the unique field which can be used as the key in the files.
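
As far as I can tell, two things go wrong in the script above. The action block opened with "{if" is never closed before END, which is the likely source of the syntax error. Also, awk does not run grep or echo as shell commands: inside an awk program those words are just variable names. A minimal sketch that keeps the same grep-based idea is shown below, using awk's system() function to run grep and test its exit status. The sample FileA, FileB and difference.txt contents are invented for illustration (with $2 as the unique key), in_file and reported are hypothetical names, and the -w and -F grep options (whole-word, fixed-string match) are an addition that is not in the original attempt.

```shell
workdir=$(mktemp -d)
# Hypothetical sample data with the unique key in column 2.
printf '%s\n' 'row K1 aaa' 'row K2 bbb' > "$workdir/FileA"
printf '%s\n' 'row K1 zzz' 'row K3 ccc' > "$workdir/FileB"

# difference.txt: every line that appears in only one of the two files.
sort "$workdir/FileA" > "$workdir/A.sorted"
sort "$workdir/FileB" > "$workdir/B.sorted"
{ comm -23 "$workdir/A.sorted" "$workdir/B.sorted"
  comm -13 "$workdir/A.sorted" "$workdir/B.sorted"; } > "$workdir/difference.txt"

report=$(cd "$workdir" && awk 'BEGIN { print "Start Generating the report" }
# system() runs a shell command and returns its exit status; grep -q exits 0 on a match.
function in_file(k, file) { return system("grep -qwF -- \"" k "\" " file) == 0 }
!($2 in reported) {                  # handle each key only once
    reported[$2] = 1
    in_a = in_file($2, "FileA"); in_b = in_file($2, "FileB")
    if (in_a && in_b)  print "modified " $2
    else if (in_a)     print "removed " $2
    else if (in_b)     print "added " $2
}
END { print "Report generated" }' difference.txt)

printf '%s\n' "$report"
```

Starting grep twice per record is slow on large files, and grep matches the key anywhere on a line, not only in column 2. Loading both files into awk arrays avoids both issues.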