1) I have a list file intended to be used for in-place replacement, like this:
Replacement pattern ; Matching patterns
EXTRACT ___________________
toto ; tutu | tata | tonton | titi
bobo ; bibi | baba | bubu | bebe
etc. 14000 lines !!!
_____________________________
2) I have a target file in which I want to replace those patterns
EXTRACT INPUT _______________
hello my name is bob and I am a Titi and I like bubu
_____________________________
I want it to become
EXTRACT OUTPUT ______________
hello my name is bob and I am a toto and I like bobo
_____________________________
Currently I am using awk to try to achieve this, with this command:
awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A/,i)}1' simplifier_FR.txt text.txt
Sadly, awk doesn't seem to understand the pipe "|" character as an OR indicator... I have also tried to achieve this with sed, but that option runs very slowly, even though it works.
awk DOES understand a | character in a RE because it actually takes ERE, just like GNU sed with the -r option.
But a standard sed does NOT.
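A quick way to check this yourself, as a small sketch: the sample sentence is made up for illustration, and -E is used as the more portable spelling of GNU sed's -r.

```shell
# awk regexes are EREs, so | means alternation
awk_out=$(echo 'I like tutu and titi' | awk '{ gsub(/tutu|titi/, "toto") } 1')
echo "$awk_out"

# sed needs -E (or GNU -r) to treat | as alternation
sed_out=$(echo 'I like tutu and titi' | sed -E 's/tutu|titi/toto/g')
echo "$sed_out"

# without -E, sed reads | as a literal character, so nothing matches
bre_out=$(echo 'I like tutu and titi' | sed 's/tutu|titi/toto/g')
echo "$bre_out"
```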
Your awk code has several bugs.
Is this homework/coursework?
I am trying to send a regex with pipes to do a
'pattern OR pattern OR ...'
with 'pattern | pattern | ...'
for example with one replacement :
echo 'toto; tutu | tata | tonton | titi ' | awk '{gsub(/ tutu | tata | tonton | titi /," toto ")}1'
gives
toto; toto | toto | toto | toto
with
awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A/,i)}1'
I expect it to:
1 ) register an array A with $2 as content and $1 as key
so in the first line
$2 =' tutu | tata | tonton | titi '
$1 = ' toto '
2 ) replace with gsub(/$2/,$1)}1
so in the first line
awk 'IGNORECASE = 1 {gsub(/ tutu | tata | tonton | titi /," toto ")}1'
Currently I am looking at the -f option.
Is that a good idea?
I am thinking about doing
BEGIN
{replacing command 1}
{replacing command 2}
etc.
END
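One way this idea could look, as a rough sketch only: a first awk pass turns each line of the list into one gsub rule, and the result is saved to a program file for awk -f. The names make_prog and replace.awk are made up for illustration. The sketch assumes the patterns contain no /, ", \ or & characters, it matches substrings rather than whole words, and it does not ignore case.

```shell
# make_prog is a hypothetical helper: it prints an awk program, one rule per list line
make_prog() {   # usage: make_prog list_file > replace.awk
    awk -F';' 'NF == 2 {
        rep = $1; pat = $2
        gsub(/^[ \t]+|[ \t]+$/, "", rep)     # trim the replacement word
        gsub(/[ \t]*\|[ \t]*/, "|", pat)     # drop the spaces around each |
        gsub(/^[ \t]+|[ \t]+$/, "", pat)     # trim both ends of the pattern
        printf "{ gsub(/%s/, \"%s\") }\n", pat, rep
    }
    END { print "{ print }" }' "$1"
}
```

It would then be used as make_prog simplifier_FR.txt > replace.awk followed by awk -f replace.awk text.txt. With 14000 rules every input line goes through 14000 gsub calls, so this is not expected to be fast.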
Yes, your idea with an ERE and pipe-OR works.
The main bug in your awk code is this: an ERE goes in / / (or in " ") only when it is a constant, not when it is in a variable! gsub(/A/,i) searches for the literal letter A, not for the contents of the array A.
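The difference can be seen in a two-line sketch (the variable name re and the sample text are only for illustration):

```shell
# /re/ is a constant regex that looks for the letters "re"; it ignores the variable
const_out=$(echo 'tutu and titi' | awk '{ re = "tutu|titi"; gsub(/re/, "toto") } 1')
echo "$const_out"

# a bare variable is used as a dynamic regex, so the alternation works
var_out=$(echo 'tutu and titi' | awk '{ re = "tutu|titi"; gsub(re, "toto") } 1')
echo "$var_out"
```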
Then, the input words have spaces around. How does it find the last word when there is no trailing space?
Then, you use the assignment IGNORECASE = 1 as a condition. Fortunately it is always true, so the following { block } is run. Better to have no condition and set the variable once in the BEGIN block! (Note that IGNORECASE only has an effect in GNU awk.)
Attempt to fix the bugs (untested)
awk -F';' 'BEGIN { IGNORECASE = 1 } NR==FNR { A[$1] = $2; next } { x = (" " $0 " "); for (i in A) gsub(A[i], i, x); sub(/^ /, "", x); sub(/ $/, "", x); print x }' simplifier_FR.txt text.txt
I don't know what you mean about the problem being the version of awk you were using when there were so many logic errors in your code. But, if you have it working now, congratulations.
Note, however, that in addition to the corrections MadeInGermany already listed, you also need to be absolutely sure that your first input file has exactly one <space> character before and after each word you're searching for as possible text to be replaced. For example, with the sample data you provided, no changes would be made to the following lines of text:
The word tonton in this text will not be changed to toto because there aren't
two <space> characters following any occurrence of tonton in this sentence, but
there is one <space> before tonton and two <space>s after tonton in your sample
simplifier_FR.txt file.
You might also want to note that if there are any punctuation characters before or after any of the words you want to replace, the code you're using won't find and/or replace them.
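One way around both the spacing and the punctuation problem, offered only as a sketch and not as a fix of the command above: drop the regex alternation and look each word of the text up in a table built from the list. This is a different technique (a hash lookup instead of an ERE), and it rests on assumptions: every matching pattern is a single plain word, words are separated by spaces, tabs or one of . , ; : ! ? ( ), and case is folded with tolower() rather than IGNORECASE. The function name simplify is made up for illustration.

```shell
# simplify is a hypothetical helper name; usage: simplify list_file text_file
simplify() {
    awk '
    NR == FNR {                              # first file: the replacement list
        n = split($0, part, ";")
        if (n != 2) next                     # skip lines without exactly one ";"
        rep = part[1]
        gsub(/^[ \t]+|[ \t]+$/, "", rep)     # trim the replacement word
        m = split(part[2], alt, "|")
        for (j = 1; j <= m; j++) {
            w = alt[j]
            gsub(/^[ \t]+|[ \t]+$/, "", w)   # trim each matching word
            if (w != "") map[tolower(w)] = rep
        }
        next
    }
    {                                        # second file: the text to rewrite
        out = ""; rest = $0
        # assumed separator set: space, tab and . , ; : ! ? ( )
        while (match(rest, /[^ \t.,;:!?()]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            before = substr(rest, 1, RSTART - 1)
            rest = substr(rest, RSTART + RLENGTH)
            key = tolower(word)
            if (key in map) word = map[key]  # whole-word, case-insensitive lookup
            out = out before word
        }
        print out rest
    }' "$1" "$2"
}
```

It would be called as simplify simplifier_FR.txt text.txt. Because each word costs one table lookup instead of one gsub per list line, it should also cope better with a 14000-line list.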