Search and replace in text file

Hi,

I have gigabytes of text files that I need to search for "&" and replace with "&amp". Is there a way to do this efficiently (like sed command)?

Hope you could help.

Thanks.

Give this a try:

sed 's/&/&amp/g' infile

or

sed 's/[&]/&amp/g' infile

Are there any other uses of the & in the file? If so, we'll need something a bit more elaborate. Also, you probably need:

sed 's/&/&amp/g' infile >outfile
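If the files already contain entities such as "&lt;", a blind replace would mangle them. Here is a rough two-pass sketch of the "more elaborate" idea. It assumes the real target is the full entity "&amp;" (with the semicolon), that your sed accepts -E for extended regexps, and that anything shaped like "&name;" or "&#123;" is an existing entity. That last part is only a heuristic.

```shell
# Sample input: one bare ampersand plus two entities that must survive.
line='a & b &lt; c &amp; d'

# Pass 1 escapes every ampersand ("\&" is a literal "&" in the replacement).
# Pass 2 undoes the escape wherever it landed in front of an existing entity.
result=$(printf '%s\n' "$line" |
    sed -E -e 's/&/\&amp;/g' \
           -e 's/&amp;(([A-Za-z]+|#[0-9]+);)/\&\1/g')

echo "$result"   # a &amp; b &lt; c &amp; d
```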

Ahem, ....

The "&" is a special character in sed regexps and means "the matched part completely here". For instance:

echo "huhu" | sed 's/hu/+&-/g'   # will result in "+hu-+hu-"

You will have to escape the "&":

sed 's/\&/\&amp/g' infile >outfile

I hope this helps.

bakunin
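To see the difference on the "huhu" example, here is a small sketch comparing a bare "&" and an escaped "\&" in the replacement text:

```shell
# A bare "&" in the replacement stands for whatever the pattern matched.
wrapped=$(printf 'huhu\n' | sed 's/hu/+&-/g')
echo "$wrapped"    # +hu-+hu-

# "\&" in the replacement is a literal ampersand instead.
literal=$(printf 'huhu\n' | sed 's/hu/+\&-/g')
echo "$literal"    # +&-+&-
```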

@bakunin

Yes, but if it is not escaped, it refers to the literal '&' that was just matched, so in this case it gives the same result :)

... but OK, I guess you pointed it out for educational purposes...

& is only special in the replacement text. It is an ordinary character in the regular expression. In the regular expression, the sequence \& yields undefined behavior. It will probably work as intended with most implementations, but it's not required to.

Regards,
Alister
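A short sketch of the pattern side: a plain "&" in the regular expression is enough, and a bracket expression "[&]" is an alternative if you want to make the literal match explicit without a backslash. The sample line is made up for illustration.

```shell
line='fish & chips'

# "&" is an ordinary character in the regular expression.
plain=$(printf '%s\n' "$line" | sed 's/&/\&amp/g')

# A bracket expression matches the same literal character.
bracket=$(printf '%s\n' "$line" | sed 's/[&]/\&amp/g')

echo "$plain"      # fish &amp chips
echo "$bracket"    # fish &amp chips
```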

I do.

vi ./file

:%s/&/&amp/g

:wq!

Thanks guys! I was able to use this (placed in a script).

cat "$INPUT/$file" | sed -e 's/&/&amp/g' > temp

probably not the best for a gig+ sized file...

The cat is unnecessary here: it only adds an extra process and a pipe that every byte of a gig+ file has to pass through. sed can read the file itself, so please avoid the cat and try this:

sed -e 's/&/&amp/g' input_file > output_file

Hope that works.
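Placed in a script, that could look roughly like the sketch below. The INPUT and OUTPUT directories and the sample file are made up for the demo; point them at your real directories.

```shell
# Hypothetical layout: INPUT holds the source files, OUTPUT receives the results.
INPUT=$(mktemp -d)
OUTPUT=$(mktemp -d)
printf 'a & b\nc & d & e\n' > "$INPUT/sample.txt"

for path in "$INPUT"/*; do
    file=$(basename "$path")
    # sed opens the file itself: no cat, no extra pipe.
    sed 's/&/\&amp/g' "$path" > "$OUTPUT/$file"
done

cat "$OUTPUT/sample.txt"
```

Writing to a separate directory also keeps the originals untouched until you have checked the results.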

Hi,
use this:
vi filename
Esc
:%s/&/&amp/g
:wq

Hi,
I have a question: suppose I have a file called testdata.dat and I want to replace the word Mr. with Miss. and write the changes back to the same file, i.e. testdata.dat. What would the syntax be?

sed 's/Mr./Miss./g' testdata.dat > testdata.dat

Would the above command work?

If the -i option is available in your version of sed, just use:

sed -i 's/Mr\./Miss./g' testdata.dat

Note the backslash before the first dot.
Without it, the dot matches any character (a dot has a special meaning in regular expressions).
For example, suppose someone is called "Mr.Mrimba".
See what happens with and without the backslash:

$ echo "Mr.Mrimba" | sed 's/Mr./Miss./g'
Miss.Miss.mba
$ echo "Mr.Mrimba" | sed 's/Mr\./Miss./g'
Miss.Mrimba

So the backslash is mandatory if you want to match a literal dot; otherwise the dot is interpreted as a regular expression metacharacter, which may lead to unexpected matches.
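If you are nervous about editing in place, a suffix attached to -i keeps a copy of the original. A small sketch, assuming a sed that accepts the -i.bak form (GNU and BSD sed both do, as far as I know); the scratch directory and sample data are made up for the demo:

```shell
workdir=$(mktemp -d)
printf 'Mr.Mrimba\nMr. Smith\n' > "$workdir/testdata.dat"

# Edit in place; the untouched original is saved as testdata.dat.bak.
sed -i.bak 's/Mr\./Miss./g' "$workdir/testdata.dat"

cat "$workdir/testdata.dat"       # edited file
cat "$workdir/testdata.dat.bak"   # original contents
```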

This will also work (note the g flag, so that every occurrence on a line is replaced):

sed -i 's/&/&amp/g' file_name

This can also be done using vim:

for file in $(grep -l "PATTERN" `find . -name "*.php"`); do vim "+%s:PATTERN:REPLACE:g" "+wq" "$file"; done

It can also be accomplished easily with perl:

perl -p -i -e 's/&/&amp/g' infile

@ubuntu:/tmp$ cat test
100
100
100
100

perl -pi -e 's/100/200/g' test

@ubuntu:/tmp$ cat test
200
200
200
200

NO!

Never ever use the same file on both sides of a pipe or redirection. The file will be corrupted: the shell truncates the output file before the command gets to read it, and the processes in a pipeline run concurrently. One does not finish and then pass its results on, as in DOS.

But yes, this is the usage of the s/a/b/g construct.
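A small demonstration of what goes wrong, followed by the usual workaround of writing to a temporary file and renaming it. The scratch directory and file names are made up for the demo.

```shell
workdir=$(mktemp -d)
printf 'Mr. Smith\n' > "$workdir/testdata.dat"
cp "$workdir/testdata.dat" "$workdir/broken.dat"

# Unsafe: the shell truncates broken.dat before sed ever reads it.
sed 's/Mr\./Miss./g' "$workdir/broken.dat" > "$workdir/broken.dat"
broken_size=$(wc -c < "$workdir/broken.dat")
echo "bytes left in broken.dat: $broken_size"

# Safe: write to a temporary file, then replace the original only if sed succeeded.
tmp=$(mktemp "$workdir/testdata.XXXXXX")
sed 's/Mr\./Miss./g' "$workdir/testdata.dat" > "$tmp" && mv "$tmp" "$workdir/testdata.dat"
cat "$workdir/testdata.dat"
```

One caveat: the file created by mktemp gets restrictive permissions, so you may want to chmod it to match the original before the mv.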