Compare 2 csv files in ksh and o/p the difference in a new csv file

Naresh101 · March 14, 2013, 12:51am

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below:
file1.csv
ID,version,cost
1000,1,30
2000,2,40
3000,3,50
4000,4,60

file2.csv
ID,version,cost
1000,1,30
2000,2,45
3000,4,55
6000,5,70
The expected o/p is a new csv file say - file3.csv should contain the details of IDs which are present in both the files but with some of the data related to it being different (here version and cost) - as shown below.

Expected o/p file - file3.csv

ID,field,old,new
2000,cost,40,45
3000,version,3,4
3000,cost,50,55
I need a unix ksh script for doing this. Please help me out.

Yoda · March 14, 2013, 12:55am

Can you kindly explain what difference this question has with the one in posted this thread ?

anbu23 · March 14, 2013, 12:55am

Check the solution posted in your previous thread
http://www.unix.com/shell-programming-scripting/218209-comparing-2-csv-files-sending-difference-new-csv-file.html

Yoda · March 14, 2013, 1:01am

Also why are you so particular about a solution using bash or ksh!

Is this a homework question?

Naresh101 · March 14, 2013, 1:31am

Its a small part in my intern project.... I m quiet new to unix shell scripting... so need the help...

Yoda · March 14, 2013, 10:22am

Here is a bash script:

#!/bin/bash

while IFS="," read f1_ID f1_VER f1_COST
do
        [[ "$f1_ID" =~ ID ]] && continue;

        while IFS="," read f2_ID f2_VER f2_COST
        do
                [[ "$f2_ID" =~ ID ]] && continue;

                if [ $f1_ID -eq $f2_ID ]
                then
                        if [ $f1_ID -eq $f2_ID ] && [ $f1_VER -eq $f2_VER ] && [ $f1_COST -eq $f2_COST ]
                        then
                                continue;
                        fi
                        [[ $f1_VER -ne $f2_VER ]] && printf "${f1_ID},version,${f1_VER},${f2_VER}\n"
                        [[ $f1_COST -ne $f2_COST ]] && printf "${f1_ID},cost,${f1_COST},${f2_COST}\n"
                fi
        done < file2.csv
done < file1.csv

Naresh101 · March 15, 2013, 7:09am

Thank you @Yoda....
But I used the segment with awk in your other post in my code as below ... That worked fine too.. Thanks a lot

join -t"," -1 1 -2 1 -a1 file1.csv file2.csv | awk -F, ' BEGIN {
         print "ID,field,old,new"
 } NF > 3 { 
        if ( $3 != $5 )   
              print $1, "cost", $3, $5        
       if ( $2 != $4 )              
            print $1, "version", $2, $4 } ' OFS=, 

echo "the IDs which are not present in first file are ">>file3000.csv 
awk -F, 'NR==FNR{_1[$1]++;next}!_1[$1]' file81.csv file82.csv >> file3000.csv 
cut -d',' -f1 file3000.csv>>file3001.csv 
cat file3001.csv  

echo "the IDs which are not present in second file are ">>file4000.csv 
awk -F, 'NR==FNR{_1[$1]++;next}!_1[$1]' file82.csv 
file81.csv >> file4000.csv cut -d',' -f1 file4000.csv>>file4001.csv 
cat file4001.csv

In addition to list down the differences in fields(cost / version) between 2 files I also needed to create a new file
which contains the missing IDs in a file compared to the other... The o/p which I got is as below...

ID,field,old,new
2000,cost,40,45
3000,version,3,4
3000,cost,50,55
the IDs which are not present in first file are
6000
7000
the IDs which are not present in second file are
4000
8000

---------- Post updated at 04:39 PM ---------- Previous update was at 04:24 PM ----------

And can I attain the same result as mentioned above using perl scripting?..
In that case what changes should I make?... I went through Perl script basics but it seemed to be different altogether from bash/ksh...

Naresh101 · March 18, 2013, 3:04am

@Yoda Thanks for posting the code.... Its works good..
But what if the no. of fields (columns) and their names are unknown!?
How to achieve a generalized script which compares 2 files according to their columns which are not known to the user?