Easy unix/sed question that I could have done 10 years ago!

Hi all and greetings from Ireland!

I have not used much unix or awk/sed in years and have forgotten a lot.
Easy enough query tho.

I am cleansing/fixing 10,000 postal addresses using global replacements.
I have 2 pipe-delimited files: one is basically a spell checker for geographical areas, and the second holds the actual addresses.

Sample file 1 - 100+ lines (basically a spell checker):

|Irlllland|Ireland|
|Dubblin|Dublin|
|Corrk|Cork|
etc..

Sample file 2 - 10,000+ lines (Addresses to be cleansed):

|10 Main Street Irlllland|
|11 High Road Irlllland|
|1 High Road, Corrk|

The output required is :

|10 Main Street Ireland|
|11 High Road Ireland|
|1 High Road, Cork|

I am very rusty but reckon I need a loop with a global substitution in it.
I used to know unix, awk and sed reasonably well but have forgotten the basic syntax.

Any helpers out there?

What about this approach in sed?

  1. Make a pattern file:
sed -e 's!|!/!g' -e 's/^/s&/' file1 >sed_pattern_file
  2. Use the pattern file to do the replacement in file2:
sed -f sed_pattern_file file2
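
For anyone who wants to try the sed approach end to end, here is a minimal sketch, assuming the sample data from the question is saved as file1 and file2 in a scratch directory. Two things are my own additions and not part of the post above: awk is used to build the pattern file so that lines without both a misspelling and a correction (such as the `etc..` placeholder) are skipped, and a trailing `g` flag is added so every occurrence on a line is replaced. It also assumes the names contain nothing special to sed, such as `/`, `.` or `&`.

```shell
#!/bin/sh
# Work in a throwaway directory (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
|Irlllland|Ireland|
|Dubblin|Dublin|
|Corrk|Cork|
EOF

cat > file2 <<'EOF'
|10 Main Street Irlllland|
|11 High Road Irlllland|
|1 High Road, Corrk|
EOF

# Turn each "|wrong|right|" line into "s/wrong/right/g".
# Lines that lack a misspelling or a correction are skipped.
awk -F'|' '$2 != "" && $3 != "" { print "s/" $2 "/" $3 "/g" }' file1 > sed_pattern_file

# Apply every substitution to every address line.
cleaned=$(sed -f sed_pattern_file file2)
printf '%s\n' "$cleaned"
```

sed runs each generated command against each address line, so the work grows with (lines in file1) x (lines in file2). For 100 corrections and 10,000 addresses that should still be quick.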

Output:

|10 Main Street Ireland|
|11 High Road Ireland|
|1 High Road, Cork|

And the one in awk:

awk 'BEGIN{ FS="|"; i=1; while((getline < "file1") > 0) { arr[i]=$2; arr_val[i++]=$3; } } { for (j=1;j<i;j++) { gsub(arr[j],arr_val[j],$0); } print; }' file2
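
One thing to watch with gsub is that it treats the misspelling as a regular expression, so an entry containing `.` or `*` could match more than intended. Below is a minimal sketch of a literal-string variant, assuming the sample data from the question. The helper name `replace_all` and the array name `fix` are my own inventions for illustration, not something from the posts above.

```shell
#!/bin/sh
# Scratch copies of the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' '|Irlllland|Ireland|' '|Dubblin|Dublin|' '|Corrk|Cork|' > "$workdir/file1"
printf '%s\n' '|10 Main Street Irlllland|' '|11 High Road Irlllland|' '|1 High Road, Corrk|' > "$workdir/file2"

literal_fixed=$(awk -F'|' '
# replace_all (hypothetical helper): swap every literal occurrence of
# "from" in "s" for "to", using index/substr so no regex is involved.
function replace_all(s, from, to,    out, p) {
    out = ""
    while ((p = index(s, from)) > 0) {
        out = out substr(s, 1, p - 1) to
        s = substr(s, p + length(from))
    }
    return out s
}
# First file: load the corrections, skipping lines with no misspelling.
NR == FNR { if ($2 != "") fix[$2] = $3; next }
# Second file: apply every correction to the whole line, then print it.
{ for (bad in fix) $0 = replace_all($0, bad, fix[bad]); print }
' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$literal_fixed"
```

The order of `for (bad in fix)` is not defined, which only matters if one correction could produce text that another entry then matches.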

Another approach with awk:

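awk 'BEGIN{FS="[ |]"}        # split on spaces and pipes
NR==FNR{a[$2]=$3;next}       # file1: map each misspelling to its correction
$5 in a {$5=a[$5]}           # file2: fix the place name when it is field 5
{print}' file1 file2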

If you get errors on Solaris, use nawk, gawk or /usr/xpg4/bin/awk.
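
The version above depends on the place name being the fifth field, and assigning to $5 makes awk rebuild the line with single spaces, so the pipes disappear from corrected lines. Here is a rough sketch of a variant that keeps the lookup-table idea but checks every word, assuming one address between the pipes on each line. The names `fix`, `word` and `line` are made up for this example.

```shell
#!/bin/sh
# Scratch copies of the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' '|Irlllland|Ireland|' '|Dubblin|Dublin|' '|Corrk|Cork|' > "$workdir/file1"
printf '%s\n' '|10 Main Street Irlllland|' '|11 High Road Irlllland|' '|1 High Road, Corrk|' > "$workdir/file2"

word_fixed=$(awk -F'|' '
# First file: build the lookup table of misspelling -> correction.
NR == FNR { if ($2 != "") fix[$2] = $3; next }
# Second file: the address is the text between the two pipes ($2).
{
    n = split($2, word, " ")          # break the address into words
    line = ""
    for (i = 1; i <= n; i++) {
        if (word[i] in fix) word[i] = fix[word[i]]   # exact word match only
        line = line (i > 1 ? " " : "") word[i]
    }
    print "|" line "|"                # put the pipes back explicitly
}' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$word_fixed"
```

Because the match is on whole words, a misspelling with punctuation attached (for example `Corrk,`) would not be corrected, and runs of several spaces are collapsed to one. Multi-word place names would need the gsub approach instead.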

I think I may have confused the issue for the last poster (franklin52).

The $5 was confusing me!

I deliberately spelt Ireland incorrectly to demonstrate the requirement.

Unfortunately I chose the letter "L" (in lower case) to demonstrate the misspelling. A lower case "L" looks the same as the pipe symbol.

Presumably the elegant last post should be adjusted to reflect the letter "L" issue.
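
If it helps anyone else squinting at the sample, one quick way to see which characters really are pipes is to let awk split on `|` and report what it finds. This is only a sketch, and it assumes the line is copied exactly as it appears in the question.

```shell
#!/bin/sh
sample='|Irlllland|Ireland|'

# With "|" as the separator, the number of pipes is one less than the
# number of fields. Also show the length and text of the misspelled word.
report=$(printf '%s\n' "$sample" | awk -F'|' '{ print NF - 1, length($2), $2 }')
printf '%s\n' "$report"
```

The first number printed is the count of real pipe characters on the line; everything else that looks like a vertical bar is a lower case "L".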

Incidentally, I will study the solutions provided in more detail.
The code provided made me realise how much I used to love playing with "awk" and also how much a few lines of code can achieve.