Comparison of two files in awk

jerome_Sukumar · July 25, 2006, 3:01am

Hi,
I have two files, file1 and file2, delimited by semicolons,
and I want to compare column 2 and column 3 of file1 to column 3 and column 4 of file2.

file1
--------
abc;cef;155.67;143_34;
def;fgh;146.55;123.3;
frg;hff;134.67;;
yyy;fgh;134.78;35_45;

file2
---------
abc;cef;155.09;;
abc;cef;155.67;143_34;
asd;;;123;
def;fgh;145.6;123.3;
def;fgh;146.55;123.3;
frg;hff;134.67;;

Successfile1
------------
abc;cef;155.67;143_34;
def;fgh;146.55;123.3;

Failfile1
-----------
frg;hff;134.67;;
yyy;fgh;134.78;35_45;

Can anyone help me with a script?

girish.karulkar · July 25, 2006, 4:48am

Hi Jerome

First of all, what I see is that col2 of file1 is text and col3 of file2 is a number,
so how are you going to compare them?

Still, you can do it somewhat like this:

#!/usr/bin/ksh

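# Compare col2 of file1 with col3 of file2.
# Use ">" and a separate temp file per column so earlier output is not mixed in.
cut -d";" -f2 file1 > tmp_f1c2.txt
cut -d";" -f3 file2 > tmp_f2c3.txt
diff tmp_f1c2.txt tmp_f2c3.txt

echo

# Compare col3 of file1 with col4 of file2.
cut -d";" -f3 file1 > tmp_f1c3.txt
cut -d";" -f4 file2 > tmp_f2c4.txt
diff tmp_f1c3.txt tmp_f2c4.txt

rm tmp_f[12]c[0-9].txt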

jerome_Sukumar · July 25, 2006, 5:13am

Sorry Girish,

I gave the column info wrongly.
It's col3 and col4 of file1 compared to col3 and col4 of file2.

grial · July 25, 2006, 6:11am

Perhaps this is what you want, but I'm not sure I've understood you:

#!/bin/bash

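# Collect the col3;col4 pairs of each file into arrays
comp1=($(cut -d\; -f 3,4 text1.txt))
comp2=($(cut -d\; -f 3,4 text2.txt))

for str in "${comp1[@]}"; do
   i=0
   while (( i < ${#comp2[@]} )); do
      # Compare against the i-th pair of text2.txt
      if [[ $str = "${comp2[i]}" ]]; then
         # -F: treat the pair as a fixed string, not a regex
         grep -F -- "$str" text1.txt
      fi
      (( i += 1 ))
   done
done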

Regards.

jerome_Sukumar · July 25, 2006, 6:41am

Hi Grial,
Thanks for your quick response.

The script works for comparing two columns, i.e. col3 and col4 of the two files.

If I try to compare only col3 of the two files,
I get redundant records.

E.g.:
My file1 consists of 100 records and
file2 consists of 238 records. If I compare file1 and file2, I get 116 records as output
on the console. Can you suggest how to rectify this?

grial · July 25, 2006, 7:37am

Again, I don't know if I've understood. Do you mean you could have duplicate records in file2? Or do you want only the first occurrence? If that is the case, try:

#!/bin/bash

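# Collect the col3;col4 pairs of each file into arrays
comp1=($(cut -d\; -f 3,4 text1.txt))
comp2=($(cut -d\; -f 3,4 text2.txt))

for str in "${comp1[@]}"; do
   i=0
   while (( i < ${#comp2[@]} )); do
      # Compare against the i-th pair of text2.txt
      if [[ $str = "${comp2[i]}" ]]; then
         # -F: treat the pair as a fixed string, not a regex
         grep -F -- "$str" text1.txt
         # Stop at the first match in text2.txt
         break
      fi
      (( i += 1 ))
   done
done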

jerome_Sukumar · July 26, 2006, 5:47am

Hi Grial,
Thanks again for your kind response. Let me explain more clearly.
I compared col3 of file1 with col3 of file2.
I got duplicates of file1 records with the latest script you sent.
One more thing: neither file will contain duplicate records.
I just want to check a column (or columns) of file1 against file2.

grial · July 26, 2006, 6:26am

Hmm... It's still not clear to me. Let me see if I understand now.

Do you want to check one-to-one, or
two-to-two?
Please give me another example to make it clear.

jerome_Sukumar · July 26, 2006, 7:04am

Yes.
It's a one-to-one mapping between the files.

file1
-------
a;a;c;
d;f;g;
3;7;8;

file2
------
4;7;8;
3;4;7
a;a;c;
d;f;g;

success file1
-----------
a;a;c;
d;f;g;

fail file1
--------
3;7;8;

I want to get the success and fail records of file1 in different files.

I don't need any information from file2. (You can treat it as a mapping file.)

grial · July 26, 2006, 7:08am

OK.
Is it col3-to-col3 always, or col3-to-colX, or colX-to-colY?
In your example you are comparing whole lines with whole lines...
Hmm:

#!/bin/bash

> success.txt
> fail.txt
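# Read each file into an array, one element per line (lines contain no spaces)
comp1=($(cat text1.txt))
comp2=($(cat text2.txt))

for str in "${comp1[@]}"; do
   i=0
   FOUND=no
   while (( i < ${#comp2[@]} )); do
      # Compare against the i-th line of text2.txt
      if [[ $str = "${comp2[i]}" ]]; then
         # -F: fixed string, -x: match the whole line only
         grep -Fx -- "$str" text1.txt >> success.txt
         FOUND=yes
         break
      fi
      (( i += 1 ))
   done
   if [[ $FOUND = no ]]; then
      grep -Fx -- "$str" text1.txt >> fail.txt
   fi
done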

Which compares whole lines...
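
The same whole-line split can probably be done without the nested loop. A minimal sketch, assuming a grep that supports the standard -F, -x, -v and -f options and the file names text1.txt and text2.txt used above; the scratch directory and the sample data are only there to make the sketch self-contained:

```shell
# Work in a scratch directory so the demo does not touch real files.
work=$(mktemp -d) && cd "$work" || exit 1

# Illustrative sample data, copied from the example in this thread.
printf '%s\n' 'a;a;c;' 'd;f;g;' '3;7;8;' > text1.txt
printf '%s\n' '4;7;8;' '3;4;7' 'a;a;c;' 'd;f;g;' > text2.txt

# -F: fixed strings, -x: whole-line matches, -f: read the patterns from text2.txt
grep -Fx -f text2.txt text1.txt > success.txt
# -v inverts the match: lines of text1.txt that do not occur in text2.txt
grep -Fxv -f text2.txt text1.txt > fail.txt
```

With this sample data, success.txt receives a;a;c; and d;f;g; and fail.txt receives 3;7;8;. This only covers whole-line comparison, not comparison of selected columns.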

jerome_Sukumar · July 26, 2006, 7:44am

Hi,

As I said, it's a one-to-one mapping, but the columns can be dynamic.
It can be colX (file1) to col1 (file2), or colX,colY (file1) to col1,col2 (file2). Is that possible with the current script?

vish_indian · July 26, 2006, 8:15am

while read -r line; do
    first=$(echo "$line" | cut -d ";" -f1)
    third=$(echo "$line" | cut -d ";" -f3)
    result=0
    while read -r var; do
        # Test col1 and col3 with == rather than the regex match ~,
        # so that "." in a value is not treated as a wildcard
        result=$(echo "$var" | awk -F";" -v first="$first" -v third="$third" '{if ($1 == first && $3 == third) print 1; else print 0}')
        if [[ $result -eq 1 ]]; then
            break
        fi
    done < file2

    if [[ $result -eq 1 ]]; then
        echo "$line" >> found
    else
        echo "$line" >> notfound
    fi
done < file1

mukundranjan · July 26, 2006, 8:16am

Hi Sukumar,
This is a simple solution for your problem.

#!/bin/bash

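while read -r var1
do
    colf1=$(echo "$var1" | cut -d";" -f 3,4)
    # Keep the file1 line itself; -F matches the pair as a fixed string anywhere in a file2 line
    if grep -qF -- "$colf1" file2; then
        echo "$var1" >> successfile1
    else
        echo "$var1" >> failfile1
    fi
done < file1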
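
Since the thread title asks about awk, the split can also be sketched in a single awk pass. This is only a sketch under assumptions: the variable names c1 and c2 (comma-separated column lists for file1 and file2) and the helper function key are made up for illustration, and the sample files are copied from the first post into a scratch directory.

```shell
# Work in a scratch directory so the demo does not touch real files.
work=$(mktemp -d) && cd "$work" || exit 1

cat > file1 <<'EOF'
abc;cef;155.67;143_34;
def;fgh;146.55;123.3;
frg;hff;134.67;;
yyy;fgh;134.78;35_45;
EOF
cat > file2 <<'EOF'
abc;cef;155.09;;
abc;cef;155.67;143_34;
asd;;;123;
def;fgh;145.6;123.3;
def;fgh;146.55;123.3;
frg;hff;134.67;;
EOF

# c1 / c2 (hypothetical names): columns to compare in file1 and in file2.
awk -F';' -v c1='3,4' -v c2='3,4' '
# Build a lookup key from the listed columns of the current line.
function key(list,    n, idx, i, k) {
    n = split(list, idx, ",")
    k = ""
    for (i = 1; i <= n; i++) k = k $(idx[i]) SUBSEP
    return k
}
# First argument (file2): remember every key, print nothing.
FILENAME == ARGV[1] { seen[key(c2)] = 1; next }
# Second argument (file1): route each line by whether its key was seen.
{ print > ((key(c1) in seen) ? "Successfile1" : "Failfile1") }
' file2 file1
```

Changing c1 and c2 (for example c1='3' and c2='1') covers the dynamic-column case, as long as both lists have the same number of columns. Note that with this sample data frg;hff;134.67;; lands in Successfile1, because the same pair (134.67 and an empty col4) exists in file2; the expected output in the first post lists it as a failure, so an extra rule for empty columns may be intended there.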