Comparing two files in UNIX and create a new file similar to equi join

I have 2 files namely branch.txt file & RXD.txt file as below

Ex:Branch.txt

   B1,Branchname1,city,country
   B2,Branchname2,city,country
   B3,Branchname3,city,country
   B4,Branchname4,city,country
   B5,Branchname5,city,country

RXD file : will contain old branch code, old accountid, new branch code and new Accountid.

RXD.txt

   B1,B1A1,WB1,WB1A1
   B1,B1A2,WB1,WB1A2
   B2,B2A1,WB2,WB2A1
   B2,B2A2,WB2,WB2A2
   B3,B3A1,WB3,WB3A1
   B1,B3A2,WB3,WB3A2

We need to compare old branch codes in the abv 2 files and create a new newBranchfilter.txt with new branch code (Similar to oracle equi join)

newBranchfilter.txt

   WB1,Branchname1,city,country
   WB2,Branchname2,city,country
   WB3,Branchname3,city,country

Im new to Unix shell scripting..can anyone help me to solve this using unix shell script.
Thank you!

You will find many examples in these forums for this kind of problem. In this case, you could try:

awk -F, 'NR==FNR{A[$1]=$3; next} $1 in A{$1=A[$1]}1' FS=, OFS=, file2 file1

which would produce:

WB1,Branchname1,city,country
WB2,Branchname2,city,country
WB3,Branchname3,city,country
B4,Branchname4,city,country
B5,Branchname5,city,country

or :

awk -F, 'NR==FNR{A[$1]=$3; next} $1 in A{$1=A[$1]; print}' FS=, OFS=, file2 file1

which would leave out the non-matched records:

WB1,Branchname1,city,country
WB2,Branchname2,city,country
WB3,Branchname3,city,country

--- Edit ---
There appears to be an error in the second sample file. I presume it should be:

B1,B1A1,WB1,WB1A1
B1,B1A2,WB1,WB1A2
B2,B2A1,WB2,WB2A1
B2,B2A2,WB2,WB2A2
B3,B3A1,WB3,WB3A1
B3,B3A2,WB3,WB3A2
1 Like

Hi Scrutinizer,

awk -F, 'NR==FNR{A[$1]=$3; next} $1 in A{$1=A[$1]; print}' FS=, OFS=, file2 file1

is working perfectly. Can you please explain the command and its working?

Thanks,
satece

Please use code tags as required by forum rules!

awk -F,                         # run awk and set field separator char
'                               # quote the entire script
NR==FNR {A[$1]=$3; next}        # for the first file, i.e. the file's line no. equals the overall line no., keep field 3 in array A indexed by field 1, stop processing this line, jump to next line
 $1 in A {$1=A[$1];             #  if field 1 points to an entry in A, replace it by that A entry's  contents
          print}                # and print this line
'                               # quote the entire script
FS=,                            # set field sep. again (pointless)
OFS=,                           # set output field sep.
file2                           #read and process file 2 first
file1                           # process file2
1 Like

Hi,

Similar to the above post, we are filtering the lines based on the 2 columns.

Input file1

a1,b1,1
a2,b2,2
a3,b4,3
a4,b3,4

Input file2

5,a1,b1
6,a2,b2
7,a3,b3
8,a4,b4

The requirement is to filter the records from the file1 when col1 and col2 of file1 matches with col2 and col3 of file 2 respectively.

when tried with the following awk command:

awk -F',' -v OFS=',' 'NR==FNR{a[$2FS$3]=$1;next} $1FS$2 in a{print $0} file2 file1

we got the output:

a1,b1,1
a2,b2,2
a3,b4,3

But the expected output is:

a1,b1,1
a2,b2,2

Please guide.

[akshay@localhost tmp]$ cat f1
a1,b1,1
a2,b2,2
a3,b4,3
a4,b3,4
[akshay@localhost tmp]$ cat f2
5,a1,b1
6,a2,b2
7,a3,b3
8,a4,b4
$ awk -F, 'FNR==NR{a[$2,$3];next}($1,$2) in a'  f2 f1
a1,b1,1
a2,b2,2

OR

[akshay@localhost tmp]$ awk -F, 'FNR==NR{a[$1,$2]=$0;next}($2,$3) in a{print a[$2,$3]}'   f1 f2
a1,b1,1
a2,b2,2

--edit--

awk -F, '
          # Here we read file1 (f1)
	  # FNR == NR only while reading first file

	  FNR==NR{
		   # We are interested in comparing
		   # Column1 and Column2 of file1 with
		   # Column2 and Column3 of file2, So
		   # here we create index Column1,Column2 ($1,$2)
                   # and its saved in array "a" where array value
		   # will be record of file1

		   a[$1,$2]=$0

		   # The next statement forces awk to 
		   # immediately stop processing the current record and go on to the next record

		   next
		 }

         # Here we read file2 (f2)
	 # again here we create index using column2 and column3 ($2,$3) of file2
	 # if index exists in array "a" we created above
	 # then print array value

     ($2,$3) in a{
		   print a[$2,$3]
		 }
       '   f1 f2

-F, -> Field separator is set to comma

Hi Akshay,

Thanks for the quick help. But facing the same problem.
Getting the output as:

a1,b1,1
a2,b2,2
a3,b4,3

Please suggest.

Hi Satece,

Actually your code in #5 looks fine, I just posted another way to do so, please do check your input file once again.

Hi Akshay,

Checked the code you posted, Surprisingly it is giving out the 3 rows as output. I am not sure how awk exactly works. Please suggest.

I also cannot reproduce the 3 line output. With the given sample, I tried with the code in #5 and with Akshay's :

awk -F, 'FNR==NR{A[$2,$3]=$0; next}($1,$2) in A' file2 file1

And both produced:

a1,b1,1
a2,b2,2

Are you sure your file is the same as the sample that you posted?
What version of awk are you using and what OS?

Hi Akshay, Scrutinizer,

I created the files again. It is working fine for the both awk statements. Thanks for the help. Do we need to have same number of records in both the files to use this command?

Thanks

Hi, Satece, no that is not necessary. The files do not need to have the same number of records.