How to compare two columns in two files?

Hello all,

Could someone please let me know shell script or awk solution to compare two columns in two files? Here is the sample -

file1.txt

abc/xyz,M1234
ddd/lyg,M2345
cnn/tnt,G0123

file2.txt

A,abc/xyz,kk,dd,zz,DCT,G0123,1
A,ddd/lyg,kk,dd,zz,DCT,M1234,1
A,abc/xyz,kk,dd,zz,DCT,M1234,1
A,cnn/tnt,kk,dd,zz,DCT,G0123,1
A,ddd/lyg,kk,dd,zz,DCT,G0123,1

In above files, I would like to compare files 1 and 2 in file1.txt with fields 2 and 7 in file2.txt. Only if both fields matches, I would like to print/save file2.txt line into new file. So, in above comparison I would like to print out following -

file3.txt

A,abc/xyz,kk,dd,zz,DCT,M1234,1
A,cnn/tnt,kk,dd,zz,DCT,G0123,1

Could someone please help me with script for this?

Thank you for your help! :slight_smile:

awk will do...

awk -F"," 'FILENAME=="file1"{A[$1$2]=$1$2}
FILENAME=="file2"{if(A[$1$2]==$2$7){print}}' file1 file2

this doesn't generate any output :frowning:

---------- Post updated at 06:17 PM ---------- Previous update was at 06:10 PM ----------

Please help!!

sorry I made a small typo error :stuck_out_tongue:
this will work...

awk -F"\," 'FILENAME=="file1"{A[$1$2]=$1$2}
FILENAME=="file2"{if(A[$2$7]){print}}' file1 file2
nawk -F"," 'NR==FNR{_[$1]=$2} NR!=FNR{if($7==_[$2]) print;}' file1 file2

Thank you folks for your help.

vidyadhar85 - Your solution worked perfectly. I went ahead with your solution.

summer_cherry - Thank you for your help too. Somehow the file would not print the first match in test I ran (it would print other matches).

Thanks

vidyadhar85, can you please extend your code to print the entire lines containing the matching columns to a new file (file3) separated by "->"?

for example using above:

file1:

homer simpson age old_address occupation
marge simpson age old_address occupation
bart simpson old_address occupation
lisa simpson old_address occupation
maggie simpson old_address occupation

file2:

LISA SIMPSON new_address occupation
maggie simpson new_address occupation
GRANPA simpson age new_address occupation

output (file3):

lisa simpson old_address occupation -> LISA SIMPSON new_address occupation
maggie simpson old_address occupation -> maggie simpson new_address occupation

Thanks

Base on your original data sample

awk -F, 'NR==FNR{a[$0]=$0;next}a[$2FS$7]{print a[$2FS$7],"->",$0}' file1 file2

Thanks danmero, but my request is slightly different.

I want to compare file1 and file2 based on their first 2 columns (case insensitive). If the 2 columns match then print their lines to file3 as shown below.

(input) file1:

homer simpson age old_address occupation
marge simpson age old_address occupation
bart simpson old_address occupation
lisa simpson old_address occupation
maggie simpson old_address occupation

input (file2):

LISA SIMPSON new_address occupation
maggie simpson new_address occupation
GRANPA simpson age new_address occupation

output (file3):

lisa simpson old_address occupation -> LISA SIMPSON new_address occupation
maggie simpson old_address occupation -> maggie simpson new_address occupation

Thanks

# awk 'NR==FNR{a[$1FS$2]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0}' file1 file2

Many thanks danmero.

It works perfectly.

---------- Post updated 09-30-09 at 02:27 AM ---------- Previous update was 09-29-09 at 01:33 PM ----------

My final request danmero:

I tried to send the output to a file (say file3) by amending your code as follows:

awk 'NR==FNR{a[$1FS$2]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0}' file1 file2 >> file3

but it ends up joining the lines. Don't know how to include the newline character.

Thanks

Works for me :cool:

# cat file1
homer simpson age old_address occupation
marge simpson age old_address occupation
bart simpson old_address occupation
lisa simpson old_address occupation
maggie simpson old_address occupation
# cat file2
LISA SIMPSON new_address occupation
maggie simpson new_address occupation
GRANPA simpson age new_address occupation
# awk 'NR==FNR{a[$1FS$2]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0}' file1 file2 > file3
# cat file3
lisa simpson old_address occupation -> LISA SIMPSON new_address occupation
maggie simpson old_address occupation -> maggie simpson new_address occupation

This is my code:

#/!bin/bash

NR==FNR{a[$1FS$2]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0}' file1 file2 >file3

I previously tried it in Cygwin, which gave an output on a single line.

But on a Mac I get the following error:

line 3: syntax error near unexpected token `$1FS$2'
./match.sh: line 3: `NR==FNR{a[$1FS$2]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0}' file1 file2 >file3'

Any ideas?

Thanks

---------- Post updated at 10:29 AM ---------- Previous update was at 10:24 AM ----------

danmero

Please ignore my previous post. Your code works fine.

I typed the code incorrectly. My bad.

Thanks

Hi danmero

I applied your code :

awk 'NR==FNR{a[$1FS$2]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0}' file1 file2

to the following input files and there's no output even though there are matching items:

file1:

Cohen	Philip 	28 Johnson Street
Cohen	Hanna	28 Johnson Street
Cohen	Sarah	28 Johnson Street
Cohen	Solomon	28 Johnson Street
Cohen	Jacob	72 Stock Street
Cohen	Flora	72 Stock Street
Cohen	Charlotte	4 Back Stock Street
Cohen	Jane	4 Back Stock Street
Cohen	Joel	4 Back Stock Street
Cohen	Mark	4 Back Stock Street
Cohen	Caroline	6 Clarrissa Place
Cohen	Henry	6 Clarrissa Place
Cohen	Joseph	6 Clarrissa Place
Cohen	Hyman	12 Winter Street
Cohen	Zipora	12 Winter Street
Cohen	David	12 Winter Street
Cohen	Moses	12 Winter Street
Cohen	Dora	12 Winter Street
Cohen	Emily	12 Winter Street
Cohen	Mark	 Winter Street
Cohen	Emma	 Winter Street
Cohen	Solly	 Winter Street
Cohen	Joel	 Winter Street
Cohen	Esther	 Winter Street
Cohen	Mary Ann	 Winter Street
Cohen	Jacob	 Winter Street
Cohen	Fushel 	14 Fernie Street
Cohen	Maria	14 Fernie Street
Cohen	Annie	14 Fernie Street
Cohen	Betsy	14 Fernie Street
Cohen	Sarah	14 Fernie Street
Cohen	Samuel	16 Fernie Street
Cohen	Maria	16 Fernie Street
Cohen	Annie	16 Fernie Street
Cohen	Fanny	16 Fernie Street
Cohen	Harris	28 Fernie Street
Cohen	Jane	28 Fernie Street
Cohen	Henry	28 Fernie Street
Cohen	Yetta	28 Fernie Street
Cohen	Levi	28 Fernie Street
Cohen	Gotlip 	25 Fernie Street
Cohen	Gotleib	25 Fernie Street
Cohen	Dorah	25 Fernie Street
Cohen	Yetty	25 Fernie Street
Cohen	Harris	10 Verdon Street
Cohen	Mary	10 Verdon Street
Cohen	Eliza	10 Verdon Street
Cohen	Samuel	22 Verdon Street
Cohen	Hyman	27 Verdon Street
Cohen	Ann 	27 Verdon Street
Cohen	Ann	27 Verdon Street
Cohen	Billy	27 Verdon Street
Cohen	Ernest	27 Verdon Street
Cohen	Louis	27 Verdon Street
Cohen	Mark	27 Verdon Street
Cohen	Esther	27 Verdon Street
Cohen	Wolf	27 Verdon Street
Cohen	Hannah	27 Verdon Street
Cohen	Louis	27 Verdon Street
Cohen	David 	7 Back Verdon Street
Cohen	Siggly	7 Back Verdon Street
Cohen	Abraham	7 Back Verdon Street
Cohen	Lewis	9 Back Verdon Street
Cohen	Hetty	9 Back Verdon Street
Cohen	Rachel	9 Back Verdon Street
Cohen	Ernest	9 Back Verdon Street
Cohen	George	10 Back Scotland
Cohen	Levi	22 Scotland Cellar
Cohen	Sarah	22 Scotland Cellar
Cohen	David	22 Scotland Cellar
Cohen	Rachel	22 Scotland Cellar
Cohen	Ellen	22 Scotland Cellar
Cohen	Rosa	22 Scotland Cellar
Cohen	Sarah	22 Scotland Cellar
Cohen	Esther	22 Scotland Cellar
Cohen	Louis	40 Fernie Street
Cohen	Sarah	40 Fernie Street
Cohen	Isaac	40 Fernie Street
Cohen	Esther	40 Fernie Street
Cohen	Isaac	42 Fernie Street
Cohen	Sarah	42 Fernie Street
Cohen	Hanna	42 Fernie Street
Cohen	Jacob	42 Fernie Street
Cohen	Rachel	42 Fernie Street
Cohen	Rebecca	42 Fernie Street
Cohen	Barnett	42 Fernie Street
Cohen	Leana	42 Fernie Street
Cohen	Janett	42 Fernie Street
Cohen	Solomon	44 Fernie Street
Cohen	Sarah	44 Fernie Street
Cohen	Barnet	51 Fernie Street
Cohen	Rebecca	51 Fernie Street
Cohen	Lewis	55 Fernie Street
Cohen	Simon	40 Moreton Street
Cohen	Meyer	40 Moreton Street
Cohen	Michael	16 Julia Street
Cohen	Aaron	3 George Street
Cohen	Hannah	3 George Street
Cohen	Woolf	3 George Street
Cohen	Morris	3 George Street
Cohen	Raphael	3 George Street
Cohen	Emila	3 George Street
Cohen	Rachel	3 George Street
Cohen	Philip	83 Berkley Street
Cohen	Flora	83 Berkley Street
Cohen	George	83 Berkley Street
Cohen	Rosa	83 Berkley Street
Cohen	Alfred	83 Berkley Street
Cohen	Sarah	83 Berkley Street
Cohen	Sarah	83 Berkley Street
Cohen	Abraham	87 Berkley Street
Cohen	Augusta	87 Berkley Street
Cohen	Esther	87 Berkley Street
Cohen	Charles	87 Berkley Street
Cohen	Joshua	11 Moreton Street
Cohen	Joshua	13 Moreton Street
Cohen	Ezra	 Boarding house
Cohen	Michael	23 Irwell Street
Cohen	Minnie	23 Irwell Street
Cohen	Sarah	23 Irwell Street
Cohen	Rebecca	23 Irwell Street
Cohen	Eugen	27 Whitfield Street
Cohen	Elija J	36 Clarance Street
Cohen	Rebecca	36 Clarance Street
Cohen	Simon J	27 Broughton Street
Cohen	Ina	27 Broughton Street
Cohen	Fanny	27 Broughton Street
Cohen	Polly	27 Broughton Street
Cohen	Isaac	27 Broughton Street
Cohen	Levy A	44 Elizabeth Street
Cohen	Ledreia	44 Elizabeth Street
Cohen	Moses	44 Elizabeth Street
Cohen	Amelia	44 Elizabeth Street
Cohen	Mindlla	6 Gt. Cheetham St.
Cohen	Jacob	55 Dantzig Street
Cohen	Betsy	55 Dantzig Street
Cohen	Hyman	66 Portland Street
Cohen	Fany	66 Portland Street
Cohen	Betsy	66 Portland Street
Cohen	Joseph	66 Portland Street
Cohen	David	66 Portland Street
Cohen	John	2 Hulmes Court
Cohen	Mary	2 Hulmes Court
Cohen	Eliza	2 Hulmes Court
Cohen	Mary A	2 Hulmes Court
Cohen	Margret	2 Hulmes Court
Cohen	Sarah J	2 Hulmes Court
Cohen	Catherine	2 Hulmes Court
Cohen	Marks	8 Oswald Street
Cohen	Mary	8 Oswald Street
Cohen	Lewis	231 Deansgate
Cohen	Sarah	231 Deansgate
Cohen	Marks	231 Deansgate
Cohen	Fanny	231 Deansgate
Cohen	Jacob	231 Deansgate
Cohen 	Alex	22 York Street
Cohen 	Myer	22 York Street
Cohen 	Esther	22 York Street
Cohen 	Sarah	22 York Street
Cohen 	Purcilla	150 York Street
Cohen 	Esther	150 York Street
Cohen 	Eliezer	150 York Street
Cohen 	Maria	150 York Street
Cohen 	Edith	150 York Street
Cohen 	Marcella	150 York Street
Cohen 	Harry 	67 North Street
Cohen 	Hannah	67 North Street
Cohen 	Bertha	67 North Street
Cohen 	Jacob	67 North Street
Cohen 	Morris	67 North Street

file2

COHEN	Miriam	6 Johnson St
COHEN	Rachel	6 Johnson St
COHEN	Hyman	25 Clarence St
COHEN	Benjamin	6 Johnson St
COHEN	Solomon	6 Johnson St
COHEN	Hannah	6 Johnson St
COHEN	Moses	28 Lord St
COHEN	Solomon	30 Julia St
COHEN	Nathan	10 Foulkes St
COHEN	Jane	82 Moreton St
COHEN	Hannah	99 Moreton St
COHEN	Hannah	30 Julia St
COHEN	Rachel	10 Foulkes St
COHEN	Isaac	25 Johnson St
COHEN	Jacob	251 York
COHEN	Jacob	82 Moreton St
COHEN	Arron	99 Moreton St
COHEN	Mondel	11 Cheetwood St
COHEN	Israel	62 Moreton St
COHEN	Mary	62 Moreton St
COHEN	Abraham	62 Moreton St
COHEN	Hana	42 Fernie St
COHEN	Rebecca	29 Mary St
COHEN	Fanny	29 Mary St
COHEN	Abraham	29 Mary St
COHEN	Annie	27 Berkeley St
COHEN	Rebecca	12 Foulkes St
COHEN	Ester	11 Cheetwood St
COHEN	Rachel	42 Fernie St
COHEN	Bertha	29 Mary St
COHEN	Sarah	11 Cheetwood St
COHEN	Leah	11 Cheetwood St
COHEN	Morris	42 Fernie St
COHEN	Heron	29 Mary St
COHEN	Lewis	29 Mary St
COHEN	Micheal	27 Berkeley St
COHEN	Simon	11 Cheetwood St
COHEN	Rachael	11 Cheetwood St
COHEN	David	12 Foulkes St
COHEN	Soloman	23 Berkeley St
COHEN	Harris	23 Berkeley St
COHEN	Yetta	23 Berkeley St
COHEN	Jacob	54 Fernie St
COHEN	Arron	136 Moreton St
COHEN	Rachael	128 Moreton St
COHEN	Mary	76 Moreton St
COHEN	Sarah	80 Moreton St
COHEN	Elliss	128 Moreton St
COHEN	Kalman	107 Moreton St
COHEN	Sarah	107 Moreton St
COHEN	Mark	76 Moreton St
COHEN	Morris	9 Nightingale St
COHEN	Abraham	107 Moreton St
COHEN	Solomon	40 Carter St
COHEN	Maria	15 Francis Street
COHEN	Samuel	26 Pimblett St
COHEN	David	16 Pimblett Street
COHEN	Fagy	16 Pimblett Street
COHEN	Esther	26 Pimblett St
COHEN	Glaty	16 Pimblett Street
COHEN	Morris	15 Drake St
COHEN	Wolf	15 Dewhurst St
COHEN	Goodman	15 Dewhurst St
COHEN	Lear	15 Dewhurst St
COHEN	David	40 Lord St
COHEN	Freedman	40 Lord St
COHEN	Hannah	40 Lord St
COHEN	Sarah J.	40 Lord St
COHEN	Betsy	40 Lord St
COHEN	Joseph	40 Lord St
COHEN	Flora	15 Broughton St
COHEN	Joseph	53 Fernie St
COHEN	Rachael	7 Augustus St
COHEN	Betsy	2 Foulkes St
COHEN	Lewis L.	38 (& 40) Fernie St
COHEN	Wolf	34 Fernie St
COHEN	Aaron	86 Pimblett St
COHEN	Annie	34 Fernie St
COHEN	Isaac	38 (& 40) Fernie St
COHEN	Esther	54 Fernie St
COHEN	Abraham	39 Fernie St
COHEN	Morris T.	34 Fernie St
COHEN	Rachel	52 Fernie St
COHEN	Rose	36 Fernie St
COHEN	Sarah	38 (& 40) Fernie St
COHEN	Jacob	15 Broughton St
COHEN	Harris	7 Augustus St
COHEN	Zetta	53 Fernie St
COHEN	Dorah	53 Fernie St
COHEN	Esther	39 Fernie St
COHEN	Susman	40 Lord St

Just can't figure out what the problem is.

Please help.

Thanks

 awk 'NR==FNR{a[tolower($1FS$2)]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0}' file1 file2

The input files were originally Excel files which were saved as tab delimited text files.

I think that may be the source of the problem.

Ghetz

Hi danmero,
Thanks for your latest code - it works fine.

I have been playing round with it to give the output in double quotes with a semicolon but to no avail.

awk 'NR==FNR{a[tolower($1FS$2)]=$0;next}a[tolower($1FS$2)]{print a[tolower($1FS$2)],"->",$0";"}' file1 file2

For example I want the output to be like this:

"Cohen	Hannah	99 Moreton St" -> "COHEN	Hannah	30 Julia St";
 

Instead of this:

Cohen	Hannah	99 Moreton St -> COHEN	Hannah	30 Julia St
 

I have got the semicolon but struggling with the double quotes?

I could once again do with your help.

Thanks.

After some research and trial an error, this seems to produce the required output:

awk 'NR==FNR{a[tolower($1FS$2)]=$0;next}a[tolower($1FS$2)]{print "\42", a[tolower($1FS$2)],"\42","->","\42",$0,"\42" ";"}' file1 file2