Compare columns of 2 files based on condition defined in a different file

newtoawk · November 6, 2010, 11:33pm

I have a control file that lists the fields I need to compare between two files and how each field should be written to an output file. If key = Y and output = Y, I need to print the actual field value. If output = Y/N, I need to print Y if the field matches in both files, or N if it does not. If output = N, the field should be skipped.
For example, my control file:

key|compare_field|output
Y|Field_1|Y
N|Filed_2|Y/N
Y|Field_3|Y
N|Field_4|Y/N
N|Field_5|N
N|Field_6|Y/N

file1:

field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|jfjd|djla|uopp|678|jyh|jkl

file2:

field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|juio|djla|uopu|678|jyh|jkl

My output should be:

field_1|feld_2|field_3|field_4|field_6
000|Y|edfr|Y|Y
678|N|djla|N|Y

I was trying to do it in two parts and then combine them, but I am lost and need your help to combine the logic.

# Copy the field names from the control file into the report file as the header.
nawk -F'|' '$NF == "Y" || $NF == "Y/N" { printf "%s%s", sep, $2 >> "report_file"; sep = FS } END { print "" >> "report_file" }' control_file

To compare the 2 files and print the output as Y or N

nawk -F'|' '{ getline x < f; split(x, F, "|") }
NR > 1 { for (i = 2; i <= NF; i++) $i = (F[i] == $i) ? "Y" : "N" } 1' OFS="|" f=file2 file1

I can do them separately, but I am not able to read the control file and compare the files based on it.

Please help me.
Thanks in Advance
newtoawk

pravin27 · November 7, 2010, 1:23am

Something like this,

awk -F'|' 'NR==FNR && NR>1 {a[++i]=$1$3;next} FNR>1 {if (b[FNR]) { c[FNR]=$0} else {b[FNR]=$0}} END {for(k in c) {split(c[k],d,"|");split(b[k],e,"|") ;for (j=1;j<=6;j++) {if (a[j]=="YY") {printf "%s|", d[j]} else if(a[j] != "NN") {printf "%s|" ,(d[j]==e[j])?"Y":"N"}}printf "\n"}}' control_file file1 file2

newtoawk · November 7, 2010, 4:51am

Thanks a lot Pravin, and happy Deepavali to you. I ran the script and this is the output I got:

Y|adbc|Y|Y|Y|
Y|juio|Y|Y|Y|

Can you please explain the code to me, so that I can make changes accordingly?
Thanks once again.

pravin27 · November 7, 2010, 5:44am

I hope this will help you.

awk -F'|' 'NR==FNR && NR>1 {a[++i]=$1$3;next}   # Read the first file (control_file) from line 2 and store $1$3, i.e. key and output, in array a
            FNR>1 { if (b[FNR]) { c[FNR]=$0} else { b[FNR]=$0} } # Read file1 and file2: the first time a line number is seen (file1) the record goes into array b, the second time (file2) into array c
            END {
                for(k in c) {
                    split(c[k],d,"|");split(b[k],e,"|") ; # split the file2 record into array d and the file1 record into array e
                    for (j=1;j<=6;j++) {
                        if (a[j]=="YY") { # key and output are both Y: print the field as it is
                            printf "%s|", d[j]
                        }
                        else if(a[j] != "NN") {  # anything except key=N with output=N
                            printf "%s|" ,(d[j]==e[j])?"Y":"N" # print Y if the field is the same in file1 and file2, otherwise N
                        }
                    }printf "\n"
                }
            }' control_file file1 file2
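
To see what the first rule stores, the control file can be run through it on its own. This is only a minimal sketch: it assumes the sample control file from the first post, recreated in a temporary directory, and the names tmpdir and codes are made up for the illustration.

```shell
# Recreate the sample control file in a scratch directory (assumption: same
# content as in the first post).
tmpdir=$(mktemp -d)
cat > "$tmpdir/control_file" <<'EOF'
key|compare_field|output
Y|Field_1|Y
N|Filed_2|Y/N
Y|Field_3|Y
N|Field_4|Y/N
N|Field_5|N
N|Field_6|Y/N
EOF

# Same assignment as the first rule, followed by a dump of the array in
# index order so each field position shows its key+output code.
codes=$(awk -F'|' 'NR > 1 { a[++i] = $1 $3 }
                   END { for (j = 1; j <= i; j++) print j, a[j] }' "$tmpdir/control_file")
printf '%s\n' "$codes"
rm -rf "$tmpdir"
```

With the sample data this prints 1 YY, 2 NY/N, 3 YY, 4 NY/N, 5 NN and 6 NY/N, one pair per line. If the codes look shifted by one position, the control file probably has an extra or missing line at the top.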

newtoawk · November 8, 2010, 7:50am

Thanks a lot Pravin, it works fine. My compare fields will change, so I cannot hardcode the value in for (j=1;j<=6;j++).

I tried a couple of things like for (j=1;j<=NF;j++), but the result set had more fields.
I even tried this:

nawk -F'|' 'num==NF;NR==FNR && NR>1 {a[++i]=$1$3;next}   
             FNR>1 { if (b[FNR]) { c[FNR]=$0} else { b[FNR]=$0} } 
             END {printf "\n"
                 for(k in c) {
                              split(c[k],d,"|");split(b[k],e,"|") ; 
                              for (j=1;j<=$num;j++) {
                                                  if (a[j]=="YY") { 
                                             
                                             printf "%s|", d[j]
                                                                   } 
                                                  else if(a[j] != "NN") {  
                                                                         printf "%s|" ,(d[j]==e[j])?"Y":"N" 
                                                                         }
                                                   }printf "\n" 
                              }
                  }' ctl_file file_1 file_2

Can you please help me?

Thanks in advance
NewtoAwk

pravin27 · November 8, 2010, 11:27am

Hi,

You can use the for loop below, because the control-file records are stored in array 'a' with index 'i', so 'i' ends up holding the number of control-file entries:

for (j=1;j<=i;j++)
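
Putting the pieces together, one possible variant is sketched below. It is not the script from the earlier posts: it counts the control-file entries instead of hardcoding 6, prints rows in file order (for (k in c) does not guarantee any order), prints the header row, and leaves off the trailing |. The sample files are the ones from the first post, recreated in a temporary directory; the names tmpdir, result, fileno, code and line1 are made up for the illustration.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/control_file" <<'EOF'
key|compare_field|output
Y|Field_1|Y
N|Filed_2|Y/N
Y|Field_3|Y
N|Field_4|Y/N
N|Field_5|N
N|Field_6|Y/N
EOF
cat > "$tmpdir/file1" <<'EOF'
field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|jfjd|djla|uopp|678|jyh|jkl
EOF
cat > "$tmpdir/file2" <<'EOF'
field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|juio|djla|uopu|678|jyh|jkl
EOF

result=$(awk -F'|' '
FNR == 1 { fileno++ }                                   # a new input file starts
fileno == 1 { if (FNR > 1) code[++n] = $1 $3; next }    # control file: key+output per field
fileno == 2 { line1[FNR] = $0; next }                   # file1: keep every line by line number
{                                                       # file2: compare against the stored line
    split(line1[FNR], f1, "|")
    out = ""; sep = ""
    for (j = 1; j <= n; j++) {
        if (code[j] == "NN") continue                   # key=N and output=N: skip the field
        if (FNR == 1 || code[j] == "YY") val = $j       # header row or key field: copy as is
        else val = ($j == f1[j]) ? "Y" : "N"            # otherwise report match / no match
        out = out sep val; sep = "|"
    }
    print out
}' "$tmpdir/control_file" "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample data this prints the header field_1|feld_2|field_3|field_4|field_6 followed by 000|Y|edfr|Y|Y and 678|N|djla|N|Y. Like the original, it assumes file1 and file2 have their records on the same line numbers, and it keeps all of file1 in memory.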

newtoawk · November 13, 2010, 10:26pm

Thanks Pravin, it worked. Can I pass the field delimiter as a variable? I need to read the output format from a file.
For example, instead of nawk -F'|', can I do something like this:
output_format=| or output_format=\t
nawk -F'$output_format' ...
Does this work, or is there any other way to do it?

pravin27 · November 14, 2010, 12:19am

You can do it like this; the bracket expression accepts either | or a tab as the field separator:

awk -F"[|\t]" 'your code' filename
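
If the delimiter really has to come from a shell variable, double quotes around the -F argument let the shell expand it (single quotes pass the literal text $output_format to awk). A minimal sketch, assuming the variable holds a single character; output_format is the name from the question, while line_pipe and line_tab are made up for the illustration:

```shell
# Pipe as the delimiter: quote the assignment so the shell does not treat | as a pipeline.
output_format='|'
line_pipe=$(printf 'a|b|c\n' | awk -F"$output_format" '{ print $2 }')

# Tab as the delimiter: printf produces a real tab character.
output_format=$(printf '\t')
line_tab=$(printf 'a\tb\tc\n' | awk -F"$output_format" '{ print $2 }')

printf '%s %s\n' "$line_pipe" "$line_tab"
```

Both commands pick out the second field, so this prints b b. The same variable can also be passed as -v OFS="$output_format" if the output should use that delimiter.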