Comparison of Cells in EXCEL using awk

Hi

I have 2 csv files which looks like the following and i have to compare the 2 CSVs and create a third file such that if the value of the 1st cell in A.CSV and the value of the first cell in the B.CSV are same, it should print "SAME" in the third file or else print NOT SAME. Likewise i need to create the third file with the SAME/NOT SAME corresponding to each cell in the 2 CSVs.

A.CSV
"1","2009/1/12","DATA",+10.00
"2","2009/1/12","DATA",+11.00
"3","2009/1/12","DATA",+12.00
"4","2009/1/12","DATA",+13.00
B.CSV
"1","2009/1/12","DATA",+99.00
"2","2009/1/12","DATA",+11.00
"3","2009/1/12","DATA",+12.00
"4","2009/1/12","DATA",+13.00

I used the code as below

awk -F',' 'FNR==NR {_[$4]=$0;next} 
                   { split(_[$4],ar);
                     for(i=1;i<=NF;i++) 
                        if(ar=$i) $i="SAME,"; 
                        else  $i="NOT SAME,"; 
                     print 
                   }' A.csv B.csv>> C.csv

but it always showed the following output. whether the data was same or not same.

OUTPUT
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,

but it is expected to show

SAME,SAME,SAME,NOT SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,

Thanks in advance for the help.
Regards
Meva
whereas the expected

Try this:

awk -F, '{
  getline name < "B.csv"
  split(name, a, ",")
  for(i=1;i<=NF;i++) {
    if(a==$i) {
      printf("SAME,")
    } 
    else {
      printf("NOT SAME,")
    }
  }
  print ""
}' A.csv > C.csv

Hi Franklin.....

This is working...Thanks a lot.....:slight_smile:

But we still have a problem...File A looks like the one given below

A.CSV
"1","2009/1/12","DATA",+10.00
"2","2009/1/12","DATA",+11.00
"3","2009/1/12","DATA",+12.00
"4","2009/1/12","DATA",+13.00

and file B looks like this

B.CSV
1,"2009/1/12","DATA",+99.00
2,"2009/1/12","DATA",+11.00
3,"2009/1/12","DATA",+12.00
4,"2009/1/12","DATA",+13.00

so as a result of excuting the code we get the following output

NOT SAME,SAME,SAME,NOT SAME,
NOT SAME,SAME,SAME,SAME,
NOT SAME,SAME,SAME,SAME,
NOT SAME,SAME,SAME,SAME,

But we expect the output to be

SAME,SAME,SAME,NOT SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,

Please help with your thoughts on how to get the expected result.

Thanks In Advance
Meva.

Try...

awk -F, '{gsub("\"","",$1)}
{
  getline name < "B.csv"
  split(name, a, ",")
  for(i=1;i<=NF;i++) {
    if(a==$i) {
      printf("SAME,")
    } 
    else {
      printf("NOT SAME,")
    }
  }
  print ""
}' A.csv

Still not working....

Sorry, didn't try that before but this one is working now...

awk -F, '{gsub("\"","",$1)}
{
  getline name < "B.csf"
  split(name, a, ",")
  for(i=1;i<=NF;i++) {
    if(a==$i) {
      printf("SAME,")
    } 
    else {
      printf("NOT SAME,")
    }
  }
  print ""
}' A.csv

Thanks a Lot....Its working:)

Meva

Hello Friends,

I need to find out how this code exactly works, please help me confirming or correcting my comments regarding to code above as im not really good at using getline in awk.

  1. Field seperator is "," and gsub strips double quotes from 1st column of A.csv
  2. "Getline" gets each line of B.csv and assigns them to variable "name"
  3. "Split" splits the variable "name" namely each line of B.csv,assign each field to array"a"
  4. Each field of lines of A.csv is compared to corresponding fields of lines of B.csv, then
    prints SAME or NOT SAME as a result.

Thanks in Advance,
Best regards

Completely correct! :slight_smile:

BTW: You can add a line after the getline statement to compare the whole lines first to make it faster if you have huge files.

If they are equal it isn't necessary to do the loop:

awk -F, '{
  gsub("\"","",$1)
  getline name < "B.csv"
  if(name == $0){print "SAME,SAME,SAME,SAME,"; next}
  split(name, a, ",")
  for(i=1;i<=NF;i++) {
    if(a==$i) {
      printf("SAME,")
    } 
    else {
      printf("NOT SAME,")
    }
  }
  print ""
}' A.csv

Thanks for the inputs:)

in perl:-

perl -e '
open(FH1,"< $ARGV[0]") or die ;
open(FH2,"< $ARGV[1]") or die ;
while ( ($f1=<FH1>) && ($f2=<FH2>) ) {
chomp ($f1,$f2) ;
@a=split/,/,$f1;
@b=split/,/,$f2 ;
$end=@a ;
$a[0] =~ s/\"//g;
for ($i=0;$i<$end;$i++) {
        printf "Not_Same," if($a[$i] ne $b[$i]);
        printf "Same," if ($a[$i] eq $b[$i]) ;
              } ;
  print "\n" ;
} ;
' A.csv B.csv

;);):wink:

Hi Guys,

Thanks for the inputs, i have one more requirement for this. Currently we are putting the contents of A.CSV,B.csv and C.csv to be put into a fourth CSV D.csv such that they were one below the other but the expected D.csv should look in the following format

Expected result

<Content of A.csv>   <Content of B.csv>    <Content of C.csv>

I want the data in the A.csv,B.csv and the C.csv to appear side by side in the fourth csv D.csv so that when we have a NOT SAME value we can easily uderstand which value in the A.csv and B.csv is NOT SAME.

Kindly help me

Thanks again
Meva

paste -d "\t" A.csv B.csv C.csv  > D.csv

thanks Eagle

     nawk -F, '
        {_[NR] = $0}
        NR != FNR {
                printf("<%s> <%s> <", $0, _[FNR] )
                gsub("\"","",$1)
                split(_[FNR],a)
                for ( i = 1 ; i <= NF ; i++ )
                        printf("%s%s", a == $i ? "SAME" : "NOT SAME", i == NF ? ">\n" : "," )   ## without trailing , 
        } ' B.CSV A.CSV

If you want the trailing , then use this as the final print statement: -

 printf("%s%s", a == $i ? "SAME," : "NOT SAME,", i == NF ? ">\n" : "" ) ## with trailing ,

Output: -

<"1","2009/1/12","DATA",+10.00> <1,"2009/1/12","DATA",+99.00> <SAME,SAME,SAME,NOT SAME,>
<"2","2009/1/12","DATA",+11.00> <2,"2009/1/12","DATA",+11.00> <SAME,SAME,SAME,SAME,>
<"3","2009/1/12","DATA",+12.00> <3,"2009/1/12","DATA",+12.00> <SAME,SAME,SAME,SAME,>
<"4","2009/1/12","DATA",+13.00> <4,"2009/1/12","DATA",+13.00> <SAME,SAME,SAME,SAME,>

Hi

Thanks for the response.We do not have nawk programming installed in our Server so unable to use the code.

we used the code

paste -d "\t" A.csv B.csv C.csv  > D.csv

But the output has a special character (looking like a box) also being printed such as following first line

<first line of A.csv> <(special char)first line of B.csv> <(special char)first line of C.csv>

the special character appears only for the first column of the B.csv and C.csv

Please help!!!

Thanks again.
Meva

I think your OS does not support "tab" character "\t". Simply try to change Delimeter, you can use only a space if its ok for you:

paste -d " " A.csv B.csv C.csv  > D.csv

In case you want to investigate; what is the output of the following command ?

head -1 D.csv | od -bc

tyler_durden

Did you try removing the n from nawk and seeing if it runs?

I know it's a couple of weeks old and that the original problem has been solved, but the revival of this thread brought it to my attention and i thought i'd toss one in anyway :slight_smile:

$ paste -d, a b | awk -F, '{for(i=1;i<=NF/2;i++) printf("%sSAME,",$i!=$(i+NF/2)?"NOT ":""); print ""}'
SAME,SAME,SAME,NOT SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,
SAME,SAME,SAME,SAME,

Cheers,
Alister