Compare two files and export only different values

_dagio · November 5, 2010, 4:18pm

Hello,

I need to compare two files which have the following structure:

File1:

No            : 1
Name       : George/Brown
Value2      : type2
Value3     :  type3
Date        :  Wed Oct 20 11:12:58 2010
Value       : yes
 
No             : 2
Name        : John/Cash
Value2       : type2
Value3       : type3
Date        :  Wed Oct 20 11:12:58 2010
Value       :  17

No            : 3
Name       : Maria/Blond
Value2      : type2
Value3     :  type3
Date        :  Wed Oct 20 11:12:58 2010
Value       :  yes

File2:

No            : 1
Name       : George/Brown
Value2      : type2
Value3     :  type3
Date        :  Wed Jan 20 12:12:34 2010
Value       : yes

No             : 2
Name        : John/Cash
Value2       : type2
Value3       : type3
Date        :  Wed Oct 20 13:15:45 2010
Value       :  14

No            : 3
Name       : Maria/Blond
Value2      : type2
Value3     :  type3
Date        :  Wed Oct 20 12:12:54 2010
Value       :  no

I need the output to be like

Name               : John/Cash
Value(file1)        :  17
Value(file2)        : 14


Name               : Maria/Blond
Value(file1)       :  yes
Value(file2)       :  no

Any idea how I can do this?

Thanks in advance

DGPickett · November 5, 2010, 4:36pm

Well, take it in pieces. Most tools are line oriented. As the lines come in, they can be rewritten in a form more friendly to comparison, with the record number, name, field and value all on one line, like 2|John/Cash|Value|17.

Then, you can sort them and run the two files through comm -3 to get lines like this (\t is tab):

\t2|John/Cash|Value|14
2|John/Cash|Value|17
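
A minimal sketch of that flatten, sort and comm -3 pipeline, for illustration only. The flatten function, the temporary file names and the cut-down sample records are assumptions made here, not something posted in the thread. LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
#!/bin/sh
# Sketch only: flatten() and the sample files are illustrative assumptions.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
No     : 1
Name   : George/Brown
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : yes

No     : 2
Name   : John/Cash
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : 17
EOF

# File 2 is file 1 with a different Date and Value in record 2.
sed -e '/^No *: 2/,$ s/11:12:58/13:15:45/' -e 's/: 17$/: 14/' \
  "$tmp/file1" > "$tmp/file2"

# Rewrite every field as No|Name|Field|Value and sort the result.
flatten() {
  awk '
    /:/ {
      # Split at the first colon only, so the colons inside Date survive.
      i = index($0, ":")
      key = substr($0, 1, i - 1)
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "No")   { no = val; next }
      if (key == "Name") { name = val; next }
      print no "|" name "|" key "|" val
    }' "$1" | LC_ALL=C sort
}

flatten "$tmp/file1" > "$tmp/flat1"
flatten "$tmp/file2" > "$tmp/flat2"

# comm -3 drops the common lines; lines found only in file 2 get a leading tab.
# The grep keeps the Value field only (the Date lines differ as well).
out=$(LC_ALL=C comm -3 "$tmp/flat1" "$tmp/flat2" | grep '|Value|')
printf '%s\n' "$out"

rm -rf "$tmp"
```

With these sample records, the two surviving lines are the Value lines of record 2, with the file 2 line first.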

Then, a sed script can marry the lines back together and create your format (if you are fussy). One challenge is that the file 2 line may sort low and come out first, as in my example. Also, the output will not be in the original order, but in ascending binary sort order.
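
One possible way to marry the lines back together is sketched here. It uses awk rather than sed, purely as an assumption that keyed pairing is easier to write that way. The marry function name and the sample input are hypothetical. Because the lines are paired by key, it does not matter whether the file 1 or the file 2 line comes out of comm first.

```shell
#!/bin/sh
# Sketch only: marry() is a hypothetical helper that reads comm -3 output
# made of No|Name|Field|Value lines, where a leading tab marks file 2.
marry() {
  awk -F'|' '
    {
      side = 1
      if (sub(/^\t/, "")) side = 2        # a leading tab means "only in file 2"
      key = $1 "|" $2 "|" $3              # No|Name|Field identifies the pair
      if (!(key in seen)) { seen[key] = 1; order[++n] = key }
      val[key, side] = $4
    }
    END {
      for (i = 1; i <= n; i++) {
        split(order[i], part, "|")
        if (i > 1) print ""               # blank line between records
        printf "Name         : %s\n", part[2]
        printf "%s(file1) : %s\n", part[3], val[order[i], 1]
        printf "%s(file2) : %s\n", part[3], val[order[i], 2]
      }
    }'
}

# Sample comm -3 output, in the order comm would emit it.
out=$(printf '\t2|John/Cash|Value|14\n2|John/Cash|Value|17\n\t3|Maria/Blond|Value|no\n3|Maria/Blond|Value|yes\n' | marry)
printf '%s\n' "$out"
```

A field that exists in only one file would show up with an empty value on the other side.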

Writing a merge routine is a bit much for scripts, but it can be done. Many programmers muff the logic even in more powerful languages, so using off the shelf tools is a big win.

Maybe the awk guys have a way to deal with it. How invariant is the format? Can lines come and go or move around? Your example has two different headers on the record number.

_dagio · November 5, 2010, 4:53pm

Thanks for your response.
You are right regarding the headers. I changed them.
Let's say that the lines do not come and go or move around.

DGPickett · November 5, 2010, 4:59pm

I suppose you could parse one file and for data lines, get the same line # from the other file to compare. You could mark the name lines so they never compare, and then use diff in one of its modes to show you where the changes are.
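
A rough sketch of the line-number idea, assuming both files have exactly the same layout so that line N of one file corresponds to line N of the other. The compare_by_line function and the cut-down sample files are illustrative assumptions. Only the Value field is compared, and the most recent Name line is remembered for the report.

```shell
#!/bin/sh
# Sketch only: assumes both files have identical line layout.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
No     : 1
Name   : George/Brown
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : yes

No     : 2
Name   : John/Cash
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : 17
EOF

# File 2 is file 1 with a different Date and Value in record 2.
sed -e '/^No *: 2/,$ s/11:12:58/13:15:45/' -e 's/: 17$/: 14/' \
  "$tmp/file1" > "$tmp/file2"

compare_by_line() {
  awk -F'[ \t]*:[ \t]*' '
    { sub(/[ \t]+$/, "") }                  # drop trailing blanks; fields are re-split
    NR == FNR { v1[FNR] = $2; next }        # first file: remember value by line number
    $1 == "Name" { name = $2; next }        # second file: track the current name
    $1 == "Value" && ($2 "") != (v1[FNR] "") {   # compare as strings, not numbers
      printf "Name         : %s\n", name
      printf "Value(file1) : %s\n", v1[FNR]
      printf "Value(file2) : %s\n\n", $2
    }' "$1" "$2"
}

out=$(compare_by_line "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"

rm -rf "$tmp"
```

If lines can be added or removed, this pairing by line number no longer holds, and a keyed comparison is the safer choice.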

diff has a mode I like a lot, '-C 999999', where all lines are present and each is marked as added, removed, changed or unchanged, so you could parse the diff output in a 'while read l; do ...; done' loop, capturing the unchanged name lines and reporting the changed lines in one stream. Try that. There are many ways to skin a cat in UNIX!
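
A sketch of that loop follows. Note one deliberate swap: it uses the unified format (-U 999999) instead of -C, on the assumption that its one-character prefixes (space for unchanged, - for file 1, + for file 2) are simpler to parse in a single stream. The value_of and report_changes names and the sample files are hypothetical, and the Name lines are assumed to be identical in both files.

```shell
#!/bin/sh
# Sketch only: parses "diff -U" output; helper names are illustrative.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
No     : 1
Name   : George/Brown
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : yes

No     : 2
Name   : John/Cash
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : 17
EOF

# File 2 is file 1 with a different Date and Value in record 2.
sed -e '/^No *: 2/,$ s/11:12:58/13:15:45/' -e 's/: 17$/: 14/' \
  "$tmp/file1" > "$tmp/file2"

# Print the text after the first colon, without surrounding blanks.
value_of() {
  v=${1#*:}
  v=${v#"${v%%[![:space:]]*}"}
  v=${v%"${v##*[![:space:]]}"}
  printf '%s' "$v"
}

report_changes() {
  # So much context is requested that every line of both files is shown.
  diff -U 999999 "$1" "$2" | while IFS= read -r l; do
    case $l in
      ' Name'*)
        name=$(value_of "$l") ;;             # unchanged Name line
      '-Value '*|'-Value:'*)
        old=$(value_of "$l") ;;              # Value as it was in file 1
      '+Value '*|'+Value:'*)
        printf 'Name         : %s\nValue(file1) : %s\nValue(file2) : %s\n\n' \
          "$name" "$old" "$(value_of "$l")" ;;
    esac
  done
}

out=$(report_changes "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"

rm -rf "$tmp"
```

The patterns match "Value" followed by a blank or a colon, so the Value2 and Value3 lines and the diff header lines are ignored.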

ctsgnb · November 5, 2010, 7:19pm

OK, it does not exactly fit the wanted output, but the information is there (it seems you are not interested in the differences in the Date field, so I just skipped it).
in1 is File1
in2 is File2

[ctsgnb@shell ~]$ grep -v Date in1 | sed 's/ *: */|/;s/ /_/g' | awk -F"|" '{print$2}' | xargs -n5 echo >in1.nd
[ctsgnb@shell ~]$ grep -v Date in2 | sed 's/ *: */|/;s/ /_/g' | awk -F"|" '{print$2}' | xargs -n5 echo >in2.nd
[ctsgnb@shell ~]$ comm -3 in1.nd in2.nd
        2 John/Cash type2 type3 14
2 John/Cash type2 type3 17
        3 Maria/Blond type2 type3 no
3 Maria/Blond type2 type3 yes
[ctsgnb@shell ~]$

---------- Post updated at 12:19 AM ---------- Previous update was at 12:03 AM ----------

Another idea would be to present the records this way (@ is just an example; any other separator could be chosen):

George/Brown@No            : 1
George/Brown@Name       : George/Brown
George/Brown@Value2      : type2
George/Brown@Value3     :  type3
George/Brown@Date        :  Wed Jan 20 12:12:34 2010
George/Brown@Value       : yes
George/Brown@
John/Cash@No             : 2
John/Cash@Name        : John/Cash
John/Cash@Value2       : type2
John/Cash@Value3       : type3
John/Cash@Date        :  Wed Oct 20 13:15:45 2010
John/Cash@Value       :  14
John/Cash@
....

This way the files could simply be grepped on the name, and/or run through comm, and/or ... | sort | uniq. Or just choose a <key>@ prefix, where <key> could be the value of No instead of the value of Name.
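
A small sketch of how such a key-prefixed view might be produced. The prefix_key function and its optional second argument (the field to use as key, Name by default) are assumptions for illustration. Each record is buffered until its blank separator line, because the key field is not the first line of the record.

```shell
#!/bin/sh
# Sketch only: prefix_key() is a hypothetical helper.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
No     : 1
Name   : George/Brown
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : yes

No     : 2
Name   : John/Cash
Value2 : type2
Date   : Wed Oct 20 11:12:58 2010
Value  : 17
EOF

# Usage: prefix_key FILE [KEYFIELD]
prefix_key() {
  awk -v keyfield="${2:-Name}" '
    function flush(   i) {
      for (i = 1; i <= n; i++) print key "@" rec[i]
      n = 0
      key = ""
    }
    /^[ \t]*$/ {                       # a blank line closes the current record
      if (n) { rec[++n] = ""; flush() }
      next
    }
    {
      rec[++n] = $0                    # buffer the line until the key is known
      split($0, kv, /[ \t]*:[ \t]*/)
      if (kv[1] == keyfield) { key = kv[2]; sub(/[ \t]+$/, "", key) }
    }
    END { flush() }' "$1"
}

# Pull one field of one person straight out with grep.
out=$(prefix_key "$tmp/file1" | grep '^John/Cash@Value ')
printf '%s\n' "$out"

# The same view keyed on the record number instead of the name.
by_no=$(prefix_key "$tmp/file1" No)
sep=$(prefix_key "$tmp/file1" | grep -c '^George/Brown@$')

rm -rf "$tmp"
```

Once both files are in this form, the prefixed lines can be sorted and fed to comm or uniq as suggested above.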