Column-wise file comparison in UNIX

prabhat.diwaker · June 24, 2014, 8:59am

Hi all,

I want to compare two files that have the same number of rows and columns, with the records in the same order.
I just want to highlight the differences in the column values, if any.

file1:

1,kolkata,19,ab
2,delhi,89,cd
3,bangalore,56,ef

file2:

1,kolkata,21,ab
2,mumbai,89,gh
3,bangalore,11,kl

Considering column 1 as the primary key, there are differences in the other columns.
I want to highlight those differences.

The output format could be something like this (not sure):

record_number , columns_with_diff
1  3
2  2,4
3  3,4

Can diff or comm solve my problem? If yes, what would the exact command be? I am trying but not getting it to work.
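diff and comm compare whole lines, so on this data they can report that a record changed but not which column changed. A minimal sketch to see this, assuming the sample data is written to file1 and file2 in a scratch directory (the directory and the variable names are only for illustration):

```shell
#!/bin/sh
# Work in a scratch directory so no existing file1/file2 is overwritten.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,kolkata,19,ab' '2,delhi,89,cd' '3,bangalore,56,ef' > file1
printf '%s\n' '1,kolkata,21,ab' '2,mumbai,89,gh' '3,bangalore,11,kl' > file2

# diff prints each changed line in full; it has no notion of fields.
# It exits with status 1 when the files differ, so do not treat that as an error.
diff_out=$(diff file1 file2) || true
printf '%s\n' "$diff_out"

# comm needs sorted input; -3 hides lines common to both files.
# Every record differs somewhere, so all six lines are printed.
comm_out=$(LC_ALL=C comm -3 file1 file2)
printf '%s\n' "$comm_out"
```

Both commands flag all three records as changed, which is why a field-aware tool such as awk fits the "record number, column numbers" output better.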

pamu · June 24, 2014, 9:07am

Try

 $ awk -F, 'BEGIN{print "record_number , columns_with_diff"}
         NR==FNR{for(i=2;i<=NF;i++){A[$1,i]=$i}next}{for(i=2;i<=NF;i++){if($i!=A[$1,i]){p=p?p","i:i}}print $1,p;p=""}' file1 file2

record_number , columns_with_diff
1 3
2 2,4
3 3,4
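pamu's one-liner relies on the NR==FNR idiom: NR==FNR is only true while awk reads the first file, so file1 is stored in A[key, column] and each line of file2 is then checked field by field. The same logic spread out with comments, as a sketch; the coldiff function name is hypothetical and the header line is left out:

```shell
#!/bin/sh
# Scratch directory with the sample data from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,kolkata,19,ab' '2,delhi,89,cd' '3,bangalore,56,ef' > file1
printf '%s\n' '1,kolkata,21,ab' '2,mumbai,89,gh' '3,bangalore,11,kl' > file2

# coldiff: hypothetical wrapper around pamu's two-pass awk program.
coldiff() {
    awk -F, '
        # First file (NR == FNR): remember every field, keyed by column 1 and field number.
        NR == FNR { for (i = 2; i <= NF; i++) A[$1, i] = $i; next }
        # Second file: collect the numbers of the fields that differ for the same key.
        {
            p = ""
            for (i = 2; i <= NF; i++)
                if ($i != A[$1, i]) p = (p == "" ? i : p "," i)
            print $1, p
        }
    ' "$1" "$2"
}

result=$(coldiff file1 file2)
printf '%s\n' "$result"
```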

prabhat.diwaker · June 24, 2014, 9:11am

That is awesome!

Thanks, pamu!

Scrutinizer · June 24, 2014, 9:15am

Variation without storing the file in an array:

awk -F, '{s=p=x; getline p<f; split(p,F); for(i=1; i<=NF; i++) if($i!=F[i]) s=s (s?FS:x) i; print NR, s}' f=file2 file1
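This version keeps no copy of either file: for each line of file1, getline p < f reads the line with the same number from file2 (f is set to file2 on the command line), and split breaks it into F using FS. A commented sketch, assuming the same sample files in a scratch directory:

```shell
#!/bin/sh
# Scratch directory with the sample data from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,kolkata,19,ab' '2,delhi,89,cd' '3,bangalore,56,ef' > file1
printf '%s\n' '1,kolkata,21,ab' '2,mumbai,89,gh' '3,bangalore,11,kl' > file2

result=$(awk -F, '
    {
        s = p = x                 # x is never set, so this resets s and p to ""
        getline p < f             # read the line with the same number from file2
        split(p, F)               # split it on FS (",") into F[1], F[2], ...
        for (i = 1; i <= NF; i++)
            if ($i != F[i])       # compare field i of file1 with field i of file2
                s = s (s ? FS : x) i
        print NR, s
    }
' f=file2 file1)
printf '%s\n' "$result"
```

Because records are paired by line number, this only works when both files list the records in the same order.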

prabhat.diwaker · June 24, 2014, 9:32am

Thanks, moderator!

Scrutinizer · June 24, 2014, 9:37am

You are welcome. I just noticed that you want to use $1 as the primary key instead of the line number. In that case this adaptation would be required:

awk -F, '{s=p=x; getline p<f; n=split(p,F)} n<NF{n=NF} {for (i=2; i<=n; i++) if($i!=F[i]) s=s (s?FS:x) i; print $1, s}' f=file2 file1
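The n<NF{n=NF} rule makes the loop run up to the longer of the two records, so an extra trailing field on either side also counts as a difference. A quick check, assuming hypothetical two-line files where the first record of file2 has one extra column:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical data: file2 has an extra field on record 1 and a changed value on record 2.
printf '%s\n' '1,kolkata,19,ab' '2,delhi,89,cd' > file1
printf '%s\n' '1,kolkata,19,ab,zz' '2,delhi,90,cd' > file2

result=$(awk -F, '
    { s = p = x; getline p < f; n = split(p, F) }   # n = field count of the file2 line
    n < NF { n = NF }                                # compare up to the longer record
    {
        for (i = 2; i <= n; i++)
            if ($i != F[i]) s = s (s ? FS : x) i
        print $1, s
    }
' f=file2 file1)
printf '%s\n' "$result"
```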

prabhat.diwaker · June 24, 2014, 10:00am

One small modification:

What if the records are not in the same order and we need to compare them by key? Say:

File A:

1,kolkata,19,ab
2,delhi,89,cd
3,bangalore,56,ef

File B:

2,mumbai,89,gh
1,kolkata,21,ab
3,bangalore,11,kl

I would need primary_key, columns_mismatching (the same requirement as before).

Thanks for your help!

Scrutinizer · June 24, 2014, 10:24am

You could try something like:

awk -F, 'NR==FNR{A[$1]=$0; next} {s=x; split(A[$1],F); for (i=2; i<=NF; i++) if($i!=F[i]) s=s (s?FS:x) i; print $1, s}' file2 file1
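Here the whole of the first file named (file2) is indexed by key with A[$1]=$0, and each line of the second one looks up its partner by column 1, so line order no longer matters. A sketch using the out-of-order data above; the file names fileA and fileB are assumptions:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,kolkata,19,ab' '2,delhi,89,cd' '3,bangalore,56,ef' > fileA
printf '%s\n' '2,mumbai,89,gh' '1,kolkata,21,ab' '3,bangalore,11,kl' > fileB

result=$(awk -F, '
    NR == FNR { A[$1] = $0; next }     # first file (fileB): index whole lines by key
    {
        s = x                          # x is never set, so s starts as ""
        split(A[$1], F)                # fields of the fileB record with the same key
        for (i = 2; i <= NF; i++)
            if ($i != F[i]) s = s (s ? FS : x) i
        print $1, s
    }
' fileB fileA)
printf '%s\n' "$result"
```

A key that is missing from fileB yields an empty A[$1], so every non-empty column of that record is reported as different.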

pamu · June 24, 2014, 10:26am

Then try my solution; it looks records up by column 1 ($1), so the order of the lines does not matter.
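pamu's program matches records on column 1 through A[$1,i], so it can be run unchanged on the reordered files. A sketch, assuming File B from the follow-up is saved as file2 (header line omitted); the output follows the line order of file2:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1,kolkata,19,ab' '2,delhi,89,cd' '3,bangalore,56,ef' > file1
# file2 holds the out-of-order File B from the follow-up question.
printf '%s\n' '2,mumbai,89,gh' '1,kolkata,21,ab' '3,bangalore,11,kl' > file2

# pamu's program as posted, minus the BEGIN header; A[key, column] ignores line order.
result=$(awk -F, '
    NR==FNR{for(i=2;i<=NF;i++){A[$1,i]=$i}next}
    {for(i=2;i<=NF;i++){if($i!=A[$1,i]){p=p?p","i:i}}print $1,p;p=""}
' file1 file2)
printf '%s\n' "$result"
```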

prabhat.diwaker · June 24, 2014, 10:34am

Thanks to you both!