How to compare two column using awk?

Hi team,

I have below sample file. It has 4 columns. I want awk script to compare column2 and column4 row by row and print result in new column5

If value matches then result should be MATCHED
if not then result should be NOT MATCHED

Input file as below

UDC_MSISDN,UDC_NPREFIX STG_MSISDN,STG_IMSI_NPREFIX
466000000000000,1545 466000000000000,820011902457295
466000000100,1545 466000000100,820011001002929
466000000101,1545 466000000101,820011001002927
466000000201,1545 466000000201,820011001002951
466000000300,1545 466000000300,820011001002950
466000000301,1545 466000000301,820011001002955
466500066972,1479 466500066972,1479
466500066977,1735 466500066977,1735
466500066980,1479 466500066980,1479
466500066981,1479 466500066981,1479
466500066987,1479 466500066987,1479
466500067002,1479 466500067002,1479
466500067006,1479 466500067006,1479
UDC_MSISDN,UDC_NPREFIX STG_MSISDN,STG_IMSI_NPREFIX
466000000000000,1545 466000000000000,820011902457295 NOTMATCHED
466000000100,1545 466000000100,820011001002929 NOTMATCHED
466000000101,1545 466000000101,820011001002927 NOTMATCHED
466000000201,1545 466000000201,820011001002951 NOTMATCHED
466000000300,1545 466000000300,820011001002950 NOTMATCHED
466000000301,1545 466000000301,820011001002955 NOTMATCHED
466500066972,1479 466500066972,1479 MATCHED
466500066977,1735 466500066977,1735 MATCHED
466500066980,1479 466500066980,1479 MATCHED
466500066981,1479 466500066981,1479 MATCHED
466500066987,1479 466500066987,1479 MATCHED
466500067002,1479 466500067002,1479 MATCHED
466500067006,1479 466500067006,1479 MATCHED

Thanks in advance

Shanul

Hello shanul karim,

I have a few to questions pose in response first:-

  • Is this homework/assignment? There are specific forums for these.
  • What have you tried so far?
  • What output/errors do you get?
  • What OS and version are you using?
  • What are your preferred tools, if not just awk? (C, shell, perl, awk, etc.)
  • What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, What have you tried so far?

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.

We're all here to learn and getting the relevant information will help us all.

Kind regards,
Robin

1 Like

hi rbatt,

I have tried this below code

awk '{if($2=$4) print $5="MATCHED" ; else print $5="NOTMATCHED"}' UDC_STG_COMP > UDC_STG_COMP1

But output is not as expected.

Coming to your question.

Is this homework/assignment? There are specific forums for these.

No homework. Actually i have two big files and i have parse them into one single file using below awk command. But now stuck at this point where i need to match these two column. its part of almost done assignment. with its solution this will done.
awk '
FNR==NR {a[$1] = $0
          next
         }

         {print $0, a[$1]?a[$1]:"NA"
         }
 ' FS=, STG3_DATA UDC_DATA > UDC_STG_COMP

What have you tried so far?

i have tried this below

awk '{if($2=$4) print $5="MATCHED" ; else print $5="NOTMATCHED"}' UDC_STG_COMP > UDC_STG_COMP1

What output/errors do you get?

The output is not as desired..

$ awk '{if($2=$4) print $5="MATCHED" ; else print $5="NOTMATCHED"}' UDC_STG_COMP > UDC_STG_COMP1

eshaqur@IN00106004 ~/MNP_PRE_CUTOVER_AUDIT
$ head -n 10 UDC_STG_COMP1
NOTMATCHED
NOTMATCHED
NOTMATCHED
NOTMATCHED
NOTMATCHED
NOTMATCHED
NOTMATCHED

What OS and version are you using?

Windows and trying this awk script in linux environment

What are your preferred tools, if not just awk? (C, shell, perl, awk, etc.)

awk will be good because i am learning same as well.. if not shell,perl will be ok.

thanks for your showing interest and helping me. Looking forward for solution if you can provide me. it will be really greatful

Regards,
Shanul

Try

awk -F"[ ,]" '{print $0, ($2==$4?_:"NOT") "MATCHED"}' file
UDC_MSISDN,UDC_NPREFIX STG_MSISDN,STG_IMSI_NPREFIX NOTMATCHED
466000000000000,1545 466000000000000,820011902457295 NOTMATCHED
466000000100,1545 466000000100,820011001002929 NOTMATCHED
466000000101,1545 466000000101,820011001002927 NOTMATCHED
466000000201,1545 466000000201,820011001002951 NOTMATCHED
466000000300,1545 466000000300,820011001002950 NOTMATCHED
466000000301,1545 466000000301,820011001002955 NOTMATCHED
466500066972,1479 466500066972,1479 MATCHED
466500066977,1735 466500066977,1735 MATCHED
466500066980,1479 466500066980,1479 MATCHED
.
.
.
1 Like

Really thanks a lot for this help.

Also, If you can explain this code. Then this learning will really help me in future.

awk -F"[ ,]" '                  # set the field separator to space OR comma, so you really have 4 fields 
{print $0,                      # print the line unmodified PLUS the OFS character
($2==$4?_:"NOT")                # ternary (or conditional) assignment: if the two fields are equal,
                                # don't print anything (empty undefined variable "_"), else print "NOT"
 "MATCHED"}                     # (unconditionally) print "MATCHED"
' file                          # file name to work upon
1 Like

Hello shanul karim,

Welcome to forums, I hope you will enjoy learning and sharing knowledge here. Could you please try following and let me know if this helps you.

awk -F, '{split($2,a," ");if(a[1]==$3){print $0,"Matched"} else {print $0,"Non-Matched"}}'   Input_file

Output will be as follows.

UDC_MSISDN,UDC_NPREFIX STG_MSISDN,STG_IMSI_NPREFIX Non-Matched
466000000000000,1545 466000000000000,820011902457295 Non-Matched
466000000100,1545 466000000100,820011001002929 Non-Matched
466000000101,1545 466000000101,820011001002927 Non-Matched
466000000201,1545 466000000201,820011001002951 Non-Matched
466000000300,1545 466000000300,820011001002950 Non-Matched
466000000301,1545 466000000301,820011001002955 Non-Matched
466500066972,1479 466500066972,1479 Matched
466500066977,1735 466500066977,1735 Matched
466500066980,1479 466500066980,1479 Matched
466500066981,1479 466500066981,1479 Matched
466500066987,1479 466500066987,1479 Matched
466500067002,1479 466500067002,1479 Matched
466500067006,1479 466500067006,1479 Matched
 

Thanks,
R. Singh

1 Like

Got it.. Very explanatory.

Thanks a lot again for this kind help

---------- Post updated at 11:14 AM ---------- Previous update was at 11:09 AM ----------

Hi Ravinder,

Thanks a lot for this. Its working fine.

Really grateful if you explain this code.

Hello shanul karim,

Could you please try following and let me know if this helps you.

awk -F, '          ##Setting field separator as comma.
{split($2,a," ");  ##splitting 2nd field to an array a whose separator is space.
if(a[1]==$3){      ##checking a condition here if array a 1st element and 3rd field are equal as per your request then do following.
print $0,"Matched"}##printing the current line and string Matched here.
else {             ##putting else here in case above if condition is not true cursor should come here and perform following.
print $0,"Non-Matched"##printing the current line and string Non-Matched here.
}
}'   Input_file    ##Mentioning the Input_file here.
 

Thanks,
R. Singh

1 Like

thanks a lot .. really helpful information