I need help on getting difference between 2 .csv files.
I have 2 large . csv files which has equal number of columns. I nned to compare them and get output in new file which will have difference olny.
It's just an example. What I am trying to say is I should get only the values of columns where we have the difference. Not the whole line.
Assuming File will be in sorted order.
There can be m number of columns but for sure in both the files, we will get same columns. If values are different then those values should be given in output.
It can also work if we can get difference in comma separated file like
wherver values matches between 2 files we get blank
,16-12-2011,,
Hope I am able to explain the issue.
In the example given, there is a difference at 2nd column only. But there can be difference in some other columns value as well. This command is giving result for difference at 2nd place only.
Can you give me the command so that I can get result in comma separated format only. By this I will get to know wherever values are not matching in our files.
It's not neccessary to get values from both the file. Let say there is difference at 3rd column and 7th column so my result should be like
,,17-12-2011,,,,10,,,,,,,,,
paste file1 file2 # Here we join files line by line
awk -F "[,\t]" # here we have comma separated file and after paste it has tab which separates two files.
for(i=1;i<=(NF/2);i++){if($i != $(NF/2+i)) # we have to compare column 1 of first file to column 1 of second file and so on. So i have divided number fields to compare it. 1 to (NF/2)+1 and so on..
if(s){s=s";"$i}else{s=$i}}else{if(s){s=s";,"}else{s=","}}}}{ print s;s=""} # Here just we assign values to the variable s as per your requirement and at the end we print that s.
Though the command mentioned above is working fine but is taking more than 2 hrs to completely execute and is time consuming.
Is there any command that can specify only the column values and column number where we have the difference like
If there is difference at column number 3 and column number 7, resultant file will have value as
3,7 (Column Number)
Rahul,23 (Values present in file 1)
Rather than displaying "," for all the columns that are matching between two files. It only display columns where we have the mismatch but with a column number to keep the track.