Compare a few columns from two files

My Friends,
I need your help finding the differences in a few columns between two comma-delimited files. For example, File1 and File2 have 22 columns, and I want to find the differences in the first 12 columns.

I have a list of file names in MyListOfFiles2Compare.txt. The files are mostly .csv, with some .txt. Most files are comma-delimited; some .txt files are tab-delimited.

For each file name listed in MyListOfFiles2Compare.txt, I want to take the file from dir1/files and from dir2/files, compare a specified number of leading columns, and write any mismatched data to a separate file called mismatch.csv. Each mismatch record should contain the name of the file that differs, the column number, and the data from the first and the second file. Since I have to compare thousands of files, I need to be able to go back later and see which files have mismatches and in which columns.
Thank you for your input on this.
~~Manish
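A minimal Python sketch of the whole workflow, assuming every listed file is comma-delimited, exists under both dir1/files and dir2/files, and has its lines already in matching order. The names `compare_files` and `read_rows` are hypothetical. It uses Python's `csv` module so that quoted fields such as "America, LLC" stay in one column. Columns missing from a short row are compared as empty strings, and `zip` silently ignores extra lines in the longer file.

```python
import csv
from pathlib import Path

# Hypothetical defaults: adjust to your directory layout.
LIST_FILE = "MyListOfFiles2Compare.txt"
DIR1 = "dir1/files"
DIR2 = "dir2/files"
NCOLS = 12  # number of leading columns to compare


def read_rows(path, ncols, delimiter=","):
    """Parse a delimited file and keep only the first ncols fields of each row."""
    with open(path, newline="") as f:
        return [row[:ncols] for row in csv.reader(f, delimiter=delimiter)]


def compare_files(list_file=LIST_FILE, dir1=DIR1, dir2=DIR2, ncols=NCOLS,
                  out_path="mismatch.csv"):
    """Write one mismatch.csv record per differing column of every listed file."""
    with open(list_file) as names, open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["FileName", "column number",
                         "data in first file", "data in 2nd file"])
        for name in (line.strip() for line in names):
            if not name:
                continue  # skip blank lines in the list file
            rows1 = read_rows(Path(dir1) / name, ncols)
            rows2 = read_rows(Path(dir2) / name, ncols)
            # Compare line x of file 1 with line x of file 2.
            for r1, r2 in zip(rows1, rows2):
                for col in range(ncols):
                    v1 = r1[col] if col < len(r1) else ""
                    v2 = r2[col] if col < len(r2) else ""
                    if v1 != v2:
                        writer.writerow([name, col + 1, v1, v2])
```

Call `compare_files()` from the directory that holds MyListOfFiles2Compare.txt, or pass `ncols=10` to match the sample below. Tab-delimited files would need `delimiter="\t"` passed through to `read_rows`.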

To avoid confusion, please provide sample input files and the required output.

Sample input File1 is MyFile1:
"America, LLC","265826","222111","04/01/2009","ddd, Nick","333","eRes-Plus - 333","ddk,Rubino ","R8","15","0.28","","0.00","0.28","","132. Proivdence Road Suite , TX 19063 US"
"America, LLC","265826","93659211","04/01/2009","Rose, Nick","3942489","eRes-Plus - 4102414180","Nick,Rubino ","R8","8","0.15","","0.00","0.15","","1400 N. test Road Suite 5025 x, PA 44333 US"

Sample input File2 is also MyFile1, from the second directory:
"America, LLC",123456,44444,04/01/2009,"Russell,ddd",14444,eRes-Plus - 7043589536,"ddd,Russell",R8,43,1.05,,0,1.05,017653,201 S main St 1470 Charlotte court Charlotte 13322
"North, LLC",4444,1111114,04/01/2009,"Russell,ddd",1136671,eRes-Plus - 2159977710,"ddd,Russell",R8,42,1.03,,0,1.03,017653,201 S main St 1470 Charlotte court Charlotte 12345

Expected output in mismatch.csv after comparing the first 10 columns of File1 and File2 above:
FileName,column number, data in first file, data in 2nd file
MyFile1,2,265826,123456
MyFile1,3,222111,44444
Thank you,
Manish
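To check the sample against the required format, this sketch parses the first line of each sample file with Python's `csv` module and compares the first 10 columns; the helper `mismatches` is hypothetical. For these two rows it finds differences in columns 2, 3, 5, 6, 7, 8 and 10, so the expected output above appears to list only the first two of them.

```python
import csv
import io
import sys

# First line of each sample file from the question.
line1 = '"America, LLC","265826","222111","04/01/2009","ddd, Nick","333","eRes-Plus - 333","ddk,Rubino ","R8","15","0.28","","0.00","0.28","","132. Proivdence Road Suite , TX 19063 US"'
line2 = '"America, LLC",123456,44444,04/01/2009,"Russell,ddd",14444,eRes-Plus - 7043589536,"ddd,Russell",R8,43,1.05,,0,1.05,017653,201 S main St 1470 Charlotte court Charlotte 13322'


def mismatches(file_name, row1, row2, ncols):
    """Return (file, 1-based column, value in file 1, value in file 2) per differing column."""
    return [(file_name, i + 1, a, b)
            for i, (a, b) in enumerate(zip(row1[:ncols], row2[:ncols]))
            if a != b]


row1 = next(csv.reader(io.StringIO(line1)))
row2 = next(csv.reader(io.StringIO(line2)))

# Write the records in mismatch.csv format; csv.writer re-quotes values that contain commas.
writer = csv.writer(sys.stdout, lineterminator="\n")
for record in mismatches("MyFile1", row1, row2, 10):
    writer.writerow(record)
```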

Questions:

  1. Are you comparing line x in file 1 with line x in file 2?
  2. Are there quotes around each field (column) data all the time?
  3. How do we know if the file is csv or tab type? file extension?
  4. Should commas be expected in the data?

Hi,
Here are the answers:
Q) Are you comparing line x in file 1 with line x in file 2?
Ans) Yes, line x in file 1 with line x in file 2 (line 1 in file 1 with line 1 in file 2, line 2 in file 1 with line 2 in file 2, etc.). I will sort the files before comparing them.
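A sketch of that line pairing, assuming each file has already been parsed into a list of rows; `pair_lines` is a hypothetical name. It uses `itertools.zip_longest` rather than `zip`, so when one file has more lines than the other the extra lines are reported instead of silently dropped. Sorting parsed rows orders them field by field, which can differ from sorting the raw text lines with `sort`, so pass `presorted=True` if the files were already sorted externally.

```python
from itertools import zip_longest


def pair_lines(rows1, rows2, presorted=False):
    """Yield (line_number, row1, row2), pairing line x of file 1 with line x of file 2."""
    if not presorted:
        rows1, rows2 = sorted(rows1), sorted(rows2)
    for n, (r1, r2) in enumerate(zip_longest(rows1, rows2), start=1):
        yield n, r1, r2  # r1 or r2 is None when one file has extra lines


# Small made-up example: file 2 has one extra line and one changed value.
a = [["b", "2"], ["a", "1"]]
b = [["a", "1"], ["b", "9"], ["c", "3"]]
for n, r1, r2 in pair_lines(a, b):
    if r1 is None or r2 is None:
        print(f"line {n}: present in only one file: {r1 or r2}")
    elif r1 != r2:
        print(f"line {n}: {r1} != {r2}")
```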

Q) Are there quotes around each field (column) data all the time?
Ans) Some files have quotes and some do not.
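Because some files quote their fields and some do not, splitting on commas by hand is fragile: `'"America, LLC"'.split(",")` yields two pieces that still carry the quote characters. A sketch using Python's `csv` module, which removes optional quotes, so a quoted and an unquoted version of the same row parse to the same fields (`parse_line` is a hypothetical helper):

```python
import csv
import io


def parse_line(text, delimiter=","):
    """Split one delimited line into fields, removing surrounding quotes if present."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


quoted = '"America, LLC","265826","04/01/2009"'
unquoted = '"America, LLC",265826,04/01/2009'
print(parse_line(quoted) == parse_line(unquoted))  # True
```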

Q) How do we know if the file is csv or tab type? file extension?
Ans) Mostly by the file extension: most of the files have a .csv extension. Otherwise, we may need to read the first 4 columns and check whether they are comma-delimited; if they are, we treat the file as comma-delimited.
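One way to implement that rule, as a hypothetical heuristic loosely following the answer above: trust a .csv extension, and otherwise parse the first line with each candidate delimiter and keep the one that yields the most fields. Space-delimited files are not handled, because fields such as "eRes-Plus - 333" contain spaces; they would need their own rule.

```python
import csv
import io
from pathlib import Path


def guess_delimiter(file_name, first_line, candidates=(",", "\t")):
    """Hypothetical heuristic: a .csv extension means comma; otherwise pick the
    candidate that splits the first line into the most fields (ties go to the
    earlier candidate, i.e. comma)."""
    if Path(file_name).suffix.lower() == ".csv":
        return ","

    def field_count(delim):
        # Parse with csv so commas inside quoted fields are not counted.
        return len(next(csv.reader(io.StringIO(first_line), delimiter=delim), []))

    return max(candidates, key=field_count)
```

With a file handle `f`, a typical call would be `guess_delimiter(name, f.readline())`, then `f.seek(0)` before parsing the whole file.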

Q) Should commas be expected in the data?
Ans) Most of the files use commas; some are tab- or space-delimited.

Thank you.

Some hints for you.

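To compare the first 12 columns of MyFile1 and MyFile2 (this needs bash, because <(...) is process substitution):

diff <(cut -d, -f1-12 "dir1/$MyFile1") <(cut -d, -f1-12 "dir2/$MyFile2")

Use it in a loop over the names in MyListOfFiles2Compare.txt. Be aware that cut -d, splits on every comma, including commas inside quoted fields such as "America, LLC", so column numbers shift in those files, and diff reports whole differing lines rather than individual columns.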
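Since `cut -d,` also splits inside quoted fields, here is a sketch of a quote-aware stand-in for the `cut` step (the script name cutcsv.py and the function `cut_columns` are hypothetical). It keeps the first N fields of each row and rewrites them with minimal quoting, so two files that differ only in whether their fields are quoted produce identical output for `diff`.

```python
import csv
import sys


def cut_columns(lines, ncols, delimiter=",", out=sys.stdout):
    """Write the first ncols fields of each row as CSV, like a quote-aware `cut -d, -f1-N`."""
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines, delimiter=delimiter):
        writer.writerow(row[:ncols])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 cutcsv.py NCOLS FILE
    with open(sys.argv[2], newline="") as f:
        cut_columns(f, int(sys.argv[1]))
```

In the loop this might become diff <(python3 cutcsv.py 12 "dir1/$MyFile1") <(python3 cutcsv.py 12 "dir2/$MyFile2"). diff still reports whole differing lines, not individual columns.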