Compare and matching column entries in 2 files and

kieranfoley · February 14, 2014, 11:25am

I have 2 files. File 1 has more columns (6 columns but the last column has spaces) than file 2 (file 2 has 4 columns). The entries in file 1 do not change but column 4 in file 2 can be different from the the entry in file 1. I want to create a script that reads in file 1 and then uses column 1 2 and 3 from file 1 to find the same entries in column 1 2 and 3 from file 2. When this happens I want to print out column 1 2 and 3 from file 1 and column 4 from file 2 and column 5 and 6 from file 1 all on the same line.

So file one could like like this

abc   xxx    hhh    555 678   line label 99 in prod support
cde   rrr     kkk     321 543   line label 55 in dev

And file 2 could look like this

abc   xxx    hhh    555 444
cde   rrr     kkk     321 789

I want to print out

abc   xxx    hhh    555 444   line label 99 in prod support
cde   rrr     kkk     321 789   line label 55 in dev

Note the last column in file 1 can have many characters "line label 99 in prod support"

So I have been trying to use "while read" to read in file 1 and compare it with file 2 but I can't seem to get the right syntax. I have also tried using various awk commands. Here is the latest while read I tried....

   while read VNX2 DM2 FS2 SZ2 GL_NUM2 GL_NAME2
    do

        if egrep "$VNX2 $DM2 $FS2" f2.out
           then
              SZ=`cat f2.out | egrep  "$VNX2 $DM2 $FS2" | awk '{print $4}'`
              echo "$VNX2 $DM2 $FS2 $SZ $GL_NUM2 $GL_NAME2"
           else
             echo "Error...."
        fi


    done < f1.out

I'd appreciate any tips or help on how to do this. Thanks!

---------- Post updated at 04:25 PM ---------- Previous update was at 04:19 PM ----------

Sorry I got my columns mixed up. Column 4 needs to be swapped with column 5 in the 2 line sample I submitted. Hope this make sense. Thanks!

gina.lizar · February 14, 2014, 11:36am

please fix your request so that the requirements are clear.

Yoda · February 14, 2014, 11:40am

Here is an awk approach:

awk '
        NR == FNR {
                k = $1 FS $2 FS $3 FS $4
                $1 = $2 = $3 = $4 = $5 = ""
                sub ( /^ */, x )
                A[k] = $0
                next
        }
        {
                k = $1 FS $2 FS $3 FS $4
                if ( k in A )
                        print k, $5, A[k]
        }
' file1 file2

Akshay_Hegde · February 14, 2014, 11:42am

awk 'function key(){
                      return $1 FS $2 FS $3 FS $4
                   }
            FNR==NR{
                     A[key()]=$0
                     next
                   }
   ((v=key()) in A){
                     while(++i<=5)$i=x;
                     $0=$0; $1=$1
                     print A[v],$0;i=0
                   }
    ' file2 file1

abc   xxx    hhh    555 444 line label 99 in prod support
cde   rrr     kkk     321 789 line label 55 in dev

kieranfoley · February 14, 2014, 12:55pm

Thank you. This works perfect!

Scrutinizer · February 14, 2014, 3:30pm

@kieranfoley. There seem to be some contradictions between your input and output files and your description. Although the solutions given apparently "work perfectly", could you please clarify what you mean, so this thread becomes more comprehensible?

Is the input file TAB-separated or space-separated?
The second input file seems to contain 5 columns, not 4
The order of the fields seems to be fields 1,2,3,4 from file1, field 5 from file 2 and the rest from file 1 again?

--edit--
If point 2 and 3 are so, then this would also produce that output:

awk '{i=$1 FS $2 FS $3} NR==FNR{A=$5; next} i in A{$5=A}1' file1 file2