sed script to generate hyperlinks refuses to work

Hi All,

I'm new to the forum and not a programmer, but I'm writing a bash script to preprocess definitions of technical terms. Before the pages are posted to our server, it inserts hyperlinks pointing to other pages in the glossary, using a standard naming convention for the pages. The script searches through a set of text (all the definitions) and inserts a hyperlink whenever it finds one of a list of specific terms (the terms). It also generates a cleaned-up version of each term for the hyperlink, getting rid of uppercase, characters that are not valid in filenames, etc. So the idea is to replace each term in the original file with the term plus a hyperlink to the page for that term.

The individual commands used to work (I swear) when I first wrote the script, but in refining it I've broken it, and I've struggled for two days now trying to get it working again, both step by step and as a script. I can't figure out how I've broken it. I'm hoping someone will spot the error. It doesn't have to be efficient, just work.

The symptom is that only the last hyperlink in the terms is included, instead of all of them.

Here's the (non-working) script as is; it's mainly just search and replace. Grateful for any help you can provide.

#!/bin/bash
sh --version
debug=":"
debug="echo"
SEQ=/usr/bin/seq
#tr -s '\n' < definitions.txt > temp1.txt
#tr -s ' ' < temp1.txt > definitions.txt
sed -i "s/\./yyyyy/g" definitions.txt
sed -i "s/(/ ( /g" definitions.txt
sed -i "s/\//zzzzz/g" definitions.txt
#sed -i "s/(/( /g" definitions.txt
sed -i "s/,/ , /g" definitions.txt
sed -i "s/)/ ) /g" definitions.txt
tr ' ' '_' < definitions.txt > definitions_underscore
tr -s '\n' < terms > temp2
tr -s ' ' < temp2 > terms
tr A-Z a-z < terms > terms_lowercase
tr -d ' =;:`"<>,./?!@#$%^&(){}[]+~-' < terms_lowercase > terms_url
sed -i "s/\./yyyyy/g" terms
sed -i "s/\//zzzzz/g" terms
tr ' ' '_' < terms > terms_search
a=( $( cat terms_search ) )
b=( $( cat terms_url ) )
$debug " Number of elements in array is $(( ${#a[@]} ))"
for i in $($SEQ 0 $((${#a[@]} - 1)))
do
echo ${a[$i]}
echo ${b[$i]}
sed -i "s/_"${a[$i]}"_/\_\<a\_href="\"${b[$i]}"\.php\"\>"${a[$i]}"\<\/a\> /g" definitions_underscore
done
sed -i "s/__(__/(/g" definitions_underscore.txt
sed -i "s/__,/,/g" definitions_underscore.txt
sed -i "s/__)/)/g" definitions_underscore.txt
sed -i "s/yyyyy/\./g" definitions_underscore.txt
sed -i "s/zzzzz/\//g" definitions_underscore.txt
sed -i "s/yyyyy/\./g" terms.txt
sed -i "s/__\./\./g" terms.txt
sed -i "s/ \./\./g" terms.txt
tr '_' ' ' < definitions_underscore > definitions_linked.html

It's difficult to debug without seeing some sample input data and the expected output data for that input.

I know you're not worried about efficiency, but you could buy a lot of speed improvement and brevity by simply combining your sed commands into one script, e.g. replace:

sed -i "s/__(__/(/g" definitions_underscore.txt
sed -i "s/__,/,/g" definitions_underscore.txt
sed -i "s/__)/)/g" definitions_underscore.txt
sed -i "s/yyyyy/\./g" definitions_underscore.txt
sed -i "s/zzzzz/\//g" definitions_underscore.txt

with this:

sed -i "
  s/__(__/(/g
  s/__,/,/g
  s/__)/)/g
  s/yyyyy/\./g
  s/zzzzz/\//g
" definitions_underscore.txt

Are definitions_underscore and definitions_underscore.txt really supposed to be two different files?
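
Going back to the combined sed script: if you want to try it without touching your files, a rough sketch like this feeds the same five commands one line on standard input instead of using -i. The sample line is made up by me, not taken from your data.

```bash
#!/bin/bash
# Made-up sample line using the placeholders from the script:
# "__" padding around punctuation, yyyyy for "." and zzzzz for "/".
sample='see__(__RFCzzzzz3748__)__,_v1yyyyy0'

# Same five substitutions as the combined script, reading stdin (no -i).
cleaned=$(printf '%s\n' "$sample" | sed "
  s/__(__/(/g
  s/__,/,/g
  s/__)/)/g
  s/yyyyy/\./g
  s/zzzzz/\//g
")

echo "$cleaned"
```

Once the output looks right on a few sample lines, the same script body can go back into the sed -i call.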

Hey, thanks for the reply. I've cut the script down to basics (so I'm not worrying about tidying up the output file, or about missing the occasional term with special characters, for the moment), but it still doesn't work. The terms and definitions are taken from an Excel spreadsheet for processing, just cut and pasted into text files and stored as unformatted text.

I've written some test data for the script, which is processed OK, but the real data isn't: the hyperlinks are just not added to the output file.

I have attached the stripped-down script, the test data files and their output (working), and a chunk of the real data (not working). Grateful for any insight; I just cannot see where it's going wrong!

Here's the same script, cut and pasted:

#!/bin/bash
sh --version
debug=":"
debug="echo"
SEQ=/usr/bin/seq
tr ' ' '_' < terms > terms_search
tr A-Z a-z < terms_search > temp1
tr -d ' _=;:`"<>,./?!@#$%^&(){}[]£+~-' < temp1 > terms_url
tr ' ' '_' < definitions > definitions_underscore
a=( $( cat terms_search ) )
b=( $( cat terms_url ) )
$debug " Number of elements in array is $(( ${#a[@]} ))"
for i in $($SEQ 0 $((${#a[@]} - 1)))
do
  echo ${a[$i]}
  echo ${b[$i]}
  sed -i "s/_"${a[$i]}"_/\_\<a\_href="\"${b[$i]}"\.php\"\>"${a[$i]}"\<\/a\> /g" definitions_underscore
done

I found when testing with your live data that it encountered a sed script error when it attempted this substitution:

EAP/AKA_Authentication eapakaauthentication
sed: Function s/_EAP/AKA_Authentication_/\_\<a\_href="eapakaauthentication\.php"\>EAP/AKA_Authentication\<\/a\> /g cannot be parsed.

I fixed that by changing these lines:

tr ' ' '_' < terms | sed 's?/?\\/?g' > terms_search
tr A-Z a-z < terms_search | tr -d ' _=;:`"<>,./\\?!@#$%^&(){}[]£+~-' > terms_url 
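
To illustrate what that escaping does, here is a minimal sketch using the term from the error message. The input line and the LINK placeholder (standing in for the real anchor markup) are invented for the example, and it only deals with "/"; other characters that sed treats specially in a pattern, such as "." or "*", would need similar handling.

```bash
#!/bin/bash
# Term that broke the substitution: its "/" ended the s/// pattern early.
term='EAP/AKA_Authentication'

# Turn every "/" into "\/" (same idea as the sed 's?/?\\/?g' above,
# just with "|" as the delimiter).
escaped=$(printf '%s\n' "$term" | sed 's|/|\\/|g')
echo "$escaped"

# Invented input line; LINK is a placeholder for the generated hyperlink.
result=$(printf '%s\n' 'uses_EAP/AKA_Authentication_for_login' |
  sed "s/_${escaped}_/_LINK_/g")
echo "$result"
```

With the slash escaped, sed sees a complete pattern and replacement again, so the expression parses and the term is replaced.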

Consider adding set -o errexit at the beginning of the script so that it stops if such errors are encountered, otherwise you can easily miss them.
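
As a rough illustration of the difference, the sketch below runs the same two commands in a child bash, once without and once with errexit. The sed expression is deliberately malformed (my own example, modelled on the error above), and its error message is discarded so that only the script's own output is captured.

```bash
#!/bin/bash
# Without errexit: sed fails, but the script carries on to the next command.
without=$(bash -c '
  echo x | sed "s/a/b/c/" 2>/dev/null
  echo "reached the end"
')

# With errexit: the script stops as soon as sed returns a failure status.
with=$(bash -c '
  set -o errexit
  echo x | sed "s/a/b/c/" 2>/dev/null
  echo "reached the end"
') || true

echo "without errexit: ${without:-<nothing printed>}"
echo "with errexit:    ${with:-<nothing printed>}"
```

In your script the failing sed was buried inside a loop, so the later iterations ran as if nothing had happened; with errexit set, the run would have stopped at the first term containing a slash.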