Compare two files and show the mismatch columns

sabzR · January 24, 2016, 2:24pm

I need to compare two files and find the mismatched columns, for both CSV and
fixed-width files.
E.g.:
file1

c1,c2,c3,c4<----columnname
1,a,4,d
2,b,5,e
3,c,6,f

file2

c1,c2,c3,c4<----columnname
3,x,7,f
2,y,8,e
1,z,9,d

output

c2,c3<---- mismatch columname
a,4    x,7
b,5 or y,8
c,6    z,9

Ok with any values but i need mismatched columnnames.

1).In real time column length is high so how to sort column wise
2).How do I find the mismatched columns?

Any answers??
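
One possible reading of the example above is that rows are paired by the value in c1 rather than by line position, since only c2 and c3 are reported as mismatched. The sketch below assumes exactly that: the first column is a unique key and both files share the same header. The function name `mismatched_columns` and the `key_index` parameter are made up for illustration, and the whole of each file is held in memory, which may not suit files of 1.5 GB.

```python
import csv
import io


def mismatched_columns(text1, text2, key_index=0):
    """Return header names of columns that differ between rows sharing a key.

    Assumption: the column at key_index uniquely identifies a row in both files.
    """
    rows1 = list(csv.reader(io.StringIO(text1)))
    rows2 = list(csv.reader(io.StringIO(text2)))
    header = rows1[0]

    # Index the data rows of the second file by their key value.
    by_key = {row[key_index]: row for row in rows2[1:]}

    bad = set()
    for row in rows1[1:]:
        other = by_key.get(row[key_index])
        if other is None:
            continue  # key missing from the second file; ignored in this sketch
        for i, (a, b) in enumerate(zip(row, other)):
            if a != b:
                bad.add(i)
    return [header[i] for i in sorted(bad)]


file1 = "c1,c2,c3,c4\n1,a,4,d\n2,b,5,e\n3,c,6,f\n"
file2 = "c1,c2,c3,c4\n3,x,7,f\n2,y,8,e\n1,z,9,d\n"
print(mismatched_columns(file1, file2))  # ['c2', 'c3']
```

With the sample data this reports c2 and c3, which matches the expected output only under the keyed-on-c1 assumption.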

Don_Cragun · January 24, 2016, 3:46pm

Is this a homework assignment?

What have you tried to solve this problem?

What operating system and shell are you using?

Are the header lines the same in both files? Or can some column names appear in a different order, or not appear at all, in one of the files?

I don't understand your question about sorting??? When you are comparing values line by line (as shown in your example) what would you sort?

Are you saying that if any line (other than the headers) has a mismatch in a given column, every line will have a different value in that column between the two files?

No, I don't have any answers. I can't figure out what you're trying to do.

What do you mean by the comment: "Ok with any values but i need mismatched columnnames." If you don't care about the values, why print them?

sabzR · January 24, 2016, 4:33pm

No, I faced this scenario while validating two big files, around 1.5 GB each.

I tried sorting on one column at a time, cutting out the first few rows, and looking for the mismatched column. It is taking too much time.

I am using AIX and the shell is ksh.

Header lines are same in both files but sometime i should face files without header.(objective is to find mismatch column no.)

Values differ in few lines.

The scenario I am facing: I have to compare two files, find in which columns the records mismatch, and justify the reason, so I need to find the column names.

---------- Post updated at 03:03 AM ---------- Previous update was at 02:56 AM ----------

If i found the columname i will sort it out easily by job design in ETl tool.

RudiC · January 25, 2016, 4:01am

Not clear. A few more questions:

How are the rows identified? If by row No., all the rows in your sample should show up in the result.
What does "If i found the columname i will sort it out easily by job design in ETl tool" mean?
What does "In real time column length is high so how to sort column wise" mean?
Does "Header lines are same in both files but sometime i should face files without header.(objective is to find mismatch column no.) " mean: The columns' order is always identical? So we don't need the headers and could just use the column numbers?
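
To make the row-identification question concrete, here is a minimal sketch that pairs rows strictly by row number and reports 1-based column numbers, so no header is needed. It takes a splitter function, so the same loop covers comma-separated and fixed-width lines. The names `split_fixed` and `mismatched_column_numbers` and the widths used are hypothetical, and the plain `split(",")` does not handle quoted CSV fields.

```python
from itertools import zip_longest


def split_fixed(line, widths):
    """Cut a fixed-width line into stripped fields using the given widths."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields


def mismatched_column_numbers(lines1, lines2, split):
    """Pair row N with row N; return sorted 1-based numbers of differing columns."""
    bad = set()
    for l1, l2 in zip(lines1, lines2):
        f1 = split(l1.rstrip("\n"))
        f2 = split(l2.rstrip("\n"))
        # zip_longest flags a column that exists in only one of the two rows.
        for i, (a, b) in enumerate(zip_longest(f1, f2), start=1):
            if a != b:
                bad.add(i)
    return sorted(bad)


# Data rows from the sample in the first post, headers dropped.
csv1 = ["1,a,4,d", "2,b,5,e", "3,c,6,f"]
csv2 = ["3,x,7,f", "2,y,8,e", "1,z,9,d"]
print(mismatched_column_numbers(csv1, csv2, lambda s: s.split(",")))
# [1, 2, 3, 4]

# Fixed-width variant, assuming four columns of width 2 (made-up layout).
fw1 = ["1 a 4 d "]
fw2 = ["1 z 9 d "]
print(mismatched_column_numbers(fw1, fw2, lambda s: split_fixed(s, [2, 2, 2, 2])))
# [2, 3]
```

Paired by row number, the sample rows differ in all four columns, which is why the way rows are identified matters before any column can be called a mismatch.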