Compare two files and show the mismatched columns

I need to compare two files and find the columns whose values do not match. The
files can be CSV or fixed-width.
E.g.:
file1

c1,c2,c3,c4<----columnname
1,a,4,d
2,b,5,e
3,c,6,f

file2

c1,c2,c3,c4<----columnname
3,x,7,f
2,y,8,e
1,z,9,d

output

c2,c3 <---- mismatched column names
a,4    x,7
b,5 or y,8
c,6    z,9

Ok with any values but i need mismatched columnnames.

1).In real time column length is high so how to sort column wise
2).How to find the mismatched columns.

Any answers??

Is this a homework assignment?

What have you tried to solve this problem?

What operating system and shell are you using?

Are the header lines the same in both files? Or can some column names appear in a different order, or not appear at all, in one of the files?

I don't understand your question about sorting??? When you are comparing values line by line (as shown in your example) what would you sort?

Are you saying that if any line (other than the headers) has a mismatch in a given column, then every line will have a different value in that column between the two files?

No, I don't have any answers. I can't figure out what you're trying to do.

What do you mean by the comment: "Ok with any values but i need mismatched columnnames." If you don't care about the values, why print them?

No, I faced this scenario while validating two big files, around 1.5 GB each.

I tried sorting one column at a time, cutting out the first few rows, and looking for the mismatched column that way. It takes too much time.

I am using AIX and the shell is ksh.

Header lines are same in both files but sometime i should face files without header.(objective is to find mismatch column no.)

Values differ in a few lines.

The scenario I am facing is that I have to compare two files, find the columns in which the records mismatch, and justify the reason, so I need to find the column names.

---------- Post updated at 03:03 AM ---------- Previous update was at 02:56 AM ----------

If i found the columname i will sort it out easily by job design in ETl tool.

Not clear. A few more questions:

  • How are the rows identified? If by row No., all the rows in your sample should show up in the result.
  • What does "If i found the columname i will sort it out easily by job design in ETl tool" mean?
  • What does "In real time column length is high so how to sort column wise" mean?
  • Does "Header lines are same in both files but sometime i should face files without header.(objective is to find mismatch column no.)" mean that the column order is always identical, so we don't need the headers and could just use the column numbers?
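
Pending answers to the questions above, here is a minimal sketch of one possible reading of the sample. It assumes that rows are identified by the first column (c1), which is the only way the sample output comes out as c2,c3 rather than all four columns. It also assumes that both files have the same column order, that keys are unique, and that Python is available on the machine. The function name mismatched_columns and its parameters key_index and has_header are made up for illustration. It handles CSV only; a fixed-width file would first need each line cut into fields at known offsets.

```python
import csv
import io


def mismatched_columns(file1, file2, key_index=0, has_header=True):
    """Return the columns whose values differ between rows sharing a key.

    Assumption: rows are matched on the column at key_index, not on
    their position in the file.
    """
    rows1 = csv.reader(file1)
    rows2 = csv.reader(file2)

    names = None
    if has_header:
        names = next(rows1)
        next(rows2)  # assumed to be identical to the first header

    # Index the first file by key. This keeps the whole file in memory.
    by_key = {row[key_index]: row for row in rows1 if row}

    mismatched = set()
    for row in rows2:
        if not row:
            continue
        other = by_key.get(row[key_index])
        if other is None:
            continue  # key exists only in file2; not a column mismatch
        for i, (a, b) in enumerate(zip(other, row)):
            if a != b:
                mismatched.add(i)

    # Column names when a header exists, otherwise 1-based column numbers.
    if names:
        return [names[i] for i in sorted(mismatched)]
    return [i + 1 for i in sorted(mismatched)]


# The sample data from the first post, without the "<----" annotations.
file1 = io.StringIO("c1,c2,c3,c4\n1,a,4,d\n2,b,5,e\n3,c,6,f\n")
file2 = io.StringIO("c1,c2,c3,c4\n3,x,7,f\n2,y,8,e\n1,z,9,d\n")
print(mismatched_columns(file1, file2))  # ['c2', 'c3']
```

With real files, open them with newline="" as the csv module recommends, and pass has_header=False for files without a header to get column numbers instead of names. Holding one 1.5 GB file in a dictionary may be too heavy; in that case sorting both files on the key first and then reading them side by side, one line at a time, would probably be leaner.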