String compare

Hi Friends,

Can anyone help me with comparing the records in two files?

I have two files (csv)

FILE1:

1023,SMITH JAMES , (203) 789-1249
1023,HARRY POTTER , (213) 789-1249
1023,JONES D, (903) 789-1249

FILE2:

1023,SMITH ,2037891249
1023,HARRY , 2137891249
1023,JONES, 7037891249

It should return only one row, i.e. 1023,JONES, 7037891249, because the phone numbers are different. It has to suppress the "(" characters and the blanks.

Thanks in advance for your help.

S :)

I have to compare values such as (203) 789-1249 and 2037891249.

Here's an idea to start with:

[root@localhost test]# echo "1023,SMITH , (203) 789-1249" |sed 's/[-() ]//g'
1023,SMITH,2037891249

All "(", ")" and "-" characters and all spaces are stripped. You can do this for both files and then compare them.

[root@localhost test]# sed -i 's/[-() ]//g' file
[root@localhost test]# sed -i 's/[-() ]//g' file2
[root@localhost test]# diff file file2
3c3
< 1023,JONES,9037891249
---
> 1023,JONES,7037891249
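
Note that sed -i overwrites both input files. If the originals need to stay intact, something along these lines might work. It is only a sketch: compare_normalized and the scratch directories are made-up names, and the sample rows are assumed from the output above, not taken from the real files.

```shell
# compare_normalized is a hypothetical helper name, not from the thread.
# It strips "(", ")", "-" and spaces from copies of both files, then diffs
# the copies, so the originals are left untouched (unlike sed -i).
compare_normalized() {
    work=$(mktemp -d) || return 2
    sed 's/[-() ]//g' "$1" > "$work/a"
    sed 's/[-() ]//g' "$2" > "$work/b"
    diff "$work/a" "$work/b"
    status=$?            # diff: 0 = identical, 1 = different
    rm -rf "$work"
    return "$status"
}

# Assumed sample rows, written to a scratch directory.
demo=$(mktemp -d)
printf '%s\n' '1023,SMITH , (203) 789-1249' \
              '1023,HARRY , (213) 789-1249' \
              '1023,JONES, (903) 789-1249' > "$demo/file"
printf '%s\n' '1023,SMITH ,2037891249' \
              '1023,HARRY , 2137891249' \
              '1023,JONES, 7037891249' > "$demo/file2"

# "|| true" because diff exits with status 1 when the files differ.
compare_normalized "$demo/file" "$demo/file2" || true
```

With these rows it should report the same 3c3 difference as above, and file and file2 still contain the brackets and spaces afterwards.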

If I understand correctly:

awk 'NR==FNR{gsub(/[ \(\)-]/,"");x[$0];next}
{gsub(/ /,"")}!($0 in x)' file1 file2

Use nawk on Solaris.

Can you please explain your code?

Thanks in advance.
An awk student.

awk '
# If NR==FNR this is the first file, so get rid of 
#+ the "(",")","-"," " characters ("gsub" is global substitution),
#+ and populate the x array: x[$0].
NR==FNR{gsub(/[ \(\)-]/,"");x[$0];next}
# Otherwise, it is the second file, so
#+ remove the spaces. Now we have
#+ the right formatting.
{gsub(/ /,"")}
# If the current record is not
#+ previously stored in the x array,
#+ print it (default action).
!($0 in x)' file1 file2

Another awk student :)
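
For anyone who wants to try the commented script end to end, here is a small self-contained run. Treat it as a sketch under assumptions: the sample rows are assumed, rows_only_in_second is a made-up wrapper name, and the bracket expression is written as [ ()-], which should match the same four characters as the escaped form above.

```shell
# rows_only_in_second is a hypothetical wrapper around the script above.
# $1 = reference file (brackets, dashes and spaces are stripped),
# $2 = file to check (spaces are stripped).
rows_only_in_second() {
    awk 'NR == FNR { gsub(/[ ()-]/, ""); seen[$0]; next }  # first file: remember cleaned rows
         { gsub(/ /, "") }                                  # second file: clean the row
         !($0 in seen)                                      # print rows not seen in file 1
        ' "$1" "$2"
}

# Assumed sample rows, written to a scratch directory.
demo=$(mktemp -d)
cat > "$demo/file1" <<'EOF'
1023,SMITH , (203) 789-1249
1023,HARRY , (213) 789-1249
1023,JONES, (903) 789-1249
EOF
cat > "$demo/file2" <<'EOF'
1023,SMITH ,2037891249
1023,HARRY , 2137891249
1023,JONES, 7037891249
EOF

rows_only_in_second "$demo/file1" "$demo/file2"
# prints: 1023,JONES,7037891249
```

Note that the row is printed in its cleaned form, because gsub modifies $0 before the test runs.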

Thank you all,

Small clarification on this:
How can we use sed on a particular column (the third column in this example)?
sed 's/[-() ]//g' processes all the columns.
I have two files to compare.

[root@localhost test]# echo "1023,SMITH , (203) 789-1249" |sed 's/[-() ]//g'
1023,SMITH,2037891249
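
One possible way to keep sed away from the first two columns is a small loop that removes one unwanted character at a time, but only after the second comma. This is a sketch, not a tested production answer: strip_third_field is a made-up name, and it assumes the third column is the last one and that no field contains an embedded comma.

```shell
# strip_third_field is a hypothetical helper name.
# The first two comma-separated fields are captured and put back unchanged;
# each pass deletes one "(", ")", "-" or space that follows them, and
# "ta" jumps back to label "a" until no more substitutions happen.
strip_third_field() {
    sed -e ':a' \
        -e 's/^\(\([^,]*,\)\{2\}[^-() ]*\)[-() ]/\1/' \
        -e 'ta' "$@"
}

printf '%s\n' '1-023,JONES D, (903) 789-1249' | strip_third_field
# prints: 1-023,JONES D,9037891249
```

The "-" in 1-023 and the space in JONES D survive because they sit before the second comma.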

Hi Radoulov,

Will this command display only the changed data? Can you please explain?
When I run this command, it displays the whole file with all the data.
Thanks

That is because the input data I was reading while writing the script
was different (the post was modified;
ghostdog74's post is showing the original sample).
Try this:

awk 'NR==FNR{ gsub(/[ \(\)-][A-Z]*/,"");x[$0];next}
{gsub(/ /,"")}!($0 in x)' file1 file2
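
To see what the added [A-Z]* does, it may help to run the first-file substitution on single rows. As far as I can tell it removes each space together with any uppercase word that follows it, which is what drops the first names. In this sketch, strip_extra is a made-up name and the bracket expression is written without the backslashes.

```shell
# strip_extra is a hypothetical helper name; it applies the same
# substitution the script above uses on the first file.
strip_extra() {
    awk '{ gsub(/[ ()-][A-Z]*/, ""); print }'
}

printf '%s\n' '1023,SMITH JAMES , (203) 789-1249' | strip_extra
# prints: 1023,SMITH,2037891249
printf '%s\n' '1023,JONES D, (903) 789-1249' | strip_extra
# prints: 1023,JONES,9037891249
```

This only works while the names are all uppercase and the second file holds just the first word of the name.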

Thank you very much.

Hi Radoulov,

How can I restrict gsub to start from a certain position?
Can we use substr in conjunction with gsub?

nawk 'NR==FNR{ gsub(/[ \(\)-]/,"");x[$0];next}
{gsub(/[ \(\-]/,"")}!($1 in x)' file11.csv file22.csv

This removes every "-" in the record, so if the first column
contains a "-", it is removed there as well.

It displays 1023,JONES D,7037891249 from the example.

FILE1:

1-023,SMITH JAMES, (203) 789-1249
10-23,HARRY POTTER, (213) 789-1249
1-023,JONES D, (903) 789-1249

FILE2:

1-023,SMITH JAMES,2037891249
10-23,HARRY POTTER, 2137891249
1-023,JONES D,7037891249

Output should be:

1-023,JONES D,7037891249

As the phone number is different.

Thanks a lot for your help

S :)


$ cat file1
1-023,SMITH JAMES, (203) 789-1249
10-23,HARRY POTTER, (213) 789-1249
1-023,JONES D, (903) 789-1249

$ cat file2
1-023,SMITH JAMES,2037891249
10-23,HARRY POTTER, 2137891249
1-023,JONES D,7037891249

$ nawk 'NR==FNR{gsub(/[ \(\)-]/,"",$3);x[$0];next}
> {sub(/ /,"",$3)}!($0 in x)'  OFS="," FS="," file1 file2
1-023,JONES D,7037891249

It worked like magic.

Thanks a lot

Can you please explain what you are doing with "sub" for the comparison?

sub is not there for comparisons. It substitutes values, in the form sub(/match pattern/, "replacement", target). In this case, sub(/ /,"",$3) replaces the first space in the third column with "", which means it removes that space from the third column. gsub works the same way, except that it replaces every match instead of only the first one: gsub(/[ \(\)-]/,"",$3) has the match pattern /[ \(\)-]/, i.e. it matches a space, a "(", a ")" or a "-", and replaces each one with "" (the empty string), which removes it. The actual comparison is done through the array, and Radoulov has described that earlier.

Regards,
Tayyab

Check the manual for the differences between sub and gsub.
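
A quick way to see the difference, as a rough sketch (the wrapper names and the sample row with several spaces in the phone number are made up for illustration):

```shell
# Hypothetical wrappers; both work on the third comma-separated field only.
strip_first_space() {
    awk 'BEGIN { FS = OFS = "," } { sub(/ /, "", $3); print }'
}
strip_all_spaces() {
    awk 'BEGIN { FS = OFS = "," } { gsub(/ /, "", $3); print }'
}

line='10-23,HARRY POTTER, 213 789 1249'   # made-up row with several spaces

printf '%s\n' "$line" | strip_first_space
# prints: 10-23,HARRY POTTER,213 789 1249   (only the first space is gone)

printf '%s\n' "$line" | strip_all_spaces
# prints: 10-23,HARRY POTTER,2137891249     (every space is gone)
```

In the file2 sample above each phone number has at most one space, which is why sub is enough there. With more than one space in the field, gsub would be the safer choice.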