Compare columns in two different files using awk

shell_newbie · September 15, 2011, 9:18am

Hi,

I want to compare the columns of two files, excluding column 2 in both files. I tried this awk command:

 awk -F":" 'NR==FNR{++a[$1,$3,$4];next} !(a[$1,$3,$4])' file1.txt file2.txt

[If any column other than column 2 does not match, the whole row of file2.txt is printed. However, this command assumes that each file has only 4 columns in total.]

Example: file1.txt

123:09-15-2011:abc:123456
123:09-15-2011:abc:234567
123:09-15-2011:abc:345678

file2.txt

123:09152011:abc:123456
123:09152011:abc:234567
123:09152011:abc:124567

In the actual case, I am not sure how many columns each file may contain, so the awk command I constructed is too limited.
How can I compare column by column, excluding column 2, without knowing the total number of columns in each file?

birei · September 15, 2011, 9:44am

Hi,

One way:

$ awk 'BEGIN { FS = OFS = ":" } NR==FNR{ $2 = ""; ++a[$0];next} { second_field = $2; $2 = ""; if ( !(a[$0]) ) { $2 = second_field ; print } }' file1 file2
123:09152011:abc:124567

Regards,
Birei

shell_newbie · September 15, 2011, 9:51am

Thanks Birei,

Can you explain the code? I am still a rookie with awk.

birei · September 15, 2011, 10:48am

Here you go:

$ cat script.awk
## Execute this part once. Set input and output field separators to ':'.
BEGIN {
        FS = OFS = ":"
}

## FNR counts lines within the current file and NR counts lines across all
## input files, so they are equal only while processing the first input file.
NR == FNR {
        ## Empty the second field and save the whole line as an array key, so
        ## lines will be saved like:
        ## 123::abc:123456
        ## 123::abc:234567
        ## 123::abc:345678
        ##
        ## The second field is now empty, so it no longer affects the comparison.
        $2 = ""
        ++a[$0]

        ## Skip the remaining rules and start again with the next input line.
        next
}

## Run this part of the code on every line of the second file.
{
        ## Save the second field and empty it, as was done for the first file.
        second_field = $2
        $2 = ""

        ## Look up the line in the array. If it is not there, some field (other
        ## than the second, which was emptied) differs, so restore the second
        ## field and print the line.
        if ( !a[$0] ) {
                $2 = second_field
                print
        }
}
$ awk -f script.awk file1 file2
123:09152011:abc:124567
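
If the column to ignore is not always the second one, the same idea can probably be generalized by passing the column number in as a variable. The following is only a sketch: the variable name `skip`, the array name `seen` and the temporary file names are my own choices for illustration, and the sample data is the one from the first post.

```shell
# Recreate the sample files from the first post in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' '123:09-15-2011:abc:123456' \
              '123:09-15-2011:abc:234567' \
              '123:09-15-2011:abc:345678' > "$tmpdir/file1"
printf '%s\n' '123:09152011:abc:123456' \
              '123:09152011:abc:234567' \
              '123:09152011:abc:124567' > "$tmpdir/file2"

# skip (hypothetical name) is the number of the column to ignore.
result=$(awk -v skip=2 '
BEGIN { FS = OFS = ":" }

# First file: empty the ignored column and remember the rest of the line.
NR == FNR { $skip = ""; seen[$0]; next }

# Second file: keep a copy of the original line, empty the same column,
# and print the original line if the remainder was not seen in the first file.
{ line = $0; $skip = ""; if (!($0 in seen)) print line }
' "$tmpdir/file1" "$tmpdir/file2")

echo "$result"
rm -rf "$tmpdir"
```

Keeping a copy of the whole line in `line` avoids having to restore the emptied field before printing, and the `in` test checks for the key without creating a new array element.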

Regards,
Birei

shell_newbie · September 15, 2011, 10:55am

Thanks a lot for such a detailed explanation Birei!

sandip.vpcoe · February 15, 2012, 10:19pm

I would like to know all the possible solutions to this, e.g. using cut, diff, awk, sed, a script, etc.
Problem: compare two columns in two different files.
Output: either report that both columns match exactly, or report that they do not match and show the differences.
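
For the awk case, one possible starting point is sketched below. It assumes both files use ":" as the separator, that the column to compare has the same number in both files, and that rows are compared line by line. The file names, the sample data and the variable names `col`, `left`, `n`, `m` and `diffs` are made up for illustration.

```shell
# Hypothetical sample data: the second column differs on line 2.
tmpdir=$(mktemp -d)
printf '%s\n' 'a:1' 'b:2' 'c:3' > "$tmpdir/left.txt"
printf '%s\n' 'a:1' 'b:9' 'c:3' > "$tmpdir/right.txt"

report=$(awk -F: -v col=2 '
# First file: remember the chosen column of every line, indexed by line number.
NR == FNR { left[FNR] = $col; n = FNR; next }

# Second file: compare the same column on the same line number.
# Appending "" forces a string comparison instead of a numeric one.
{
    m = FNR
    if ((left[FNR] "") != ($col "")) {
        printf "line %d: %s != %s\n", FNR, left[FNR], $col
        diffs++
    }
}

END {
    if (n != m) { printf "line counts differ: %d vs %d\n", n, m; diffs++ }
    if (!diffs) print "both columns match exactly"
}
' "$tmpdir/left.txt" "$tmpdir/right.txt")

echo "$report"
rm -rf "$tmpdir"
```

With the sample data above this reports `line 2: 2 != 9`. If the rows are not in the same order in both files, a lookup by key, as in the earlier replies in this thread, is likely a better fit than a line-by-line comparison.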