sed Very Slow

jimmyb · October 11, 2013, 5:35pm

Hi

We are using sed to strip a pattern from a file, and it's taking a lot of time on an XML output file.

The command that we are using is

sed -e "s/tns1://g" $OUTPUTFILENM > $TEMPFILE

where $OUTPUTFILENM is the file to be cleaned and $TEMPFILE is the cleaned output.

Can you please help me optimise this command, or suggest some other command that can clean up, say, the string tns1: and replace it with space across the whole XML file?

I will really appreciate any suggestion. For a 45k-record file it's taking more than an hour. In each record we have around 10 instances of the string tns1:

Also, the string can appear anywhere; it's not at a fixed-width location, since it's an XML file that we are cleaning.

Thanks

jim_mcnamara · October 11, 2013, 10:26pm

Something is very wrong here. sed can process 10+ GB/hour on a modern machine.
Please show the output of:

ls -l [one of your xml files]
wc -l [the same xml file]

jimmyb · October 11, 2013, 10:40pm

Here are the outputs of the above commands

-rw-rw-rw-    1 inf  inf    103876292 Oct 11 18:16
wc -l returns 0, as it's a single-root-tag XML file with no line breaks.

MadeInGermany · October 11, 2013, 11:48pm

That looks like one 103 MB long line.
sed is not optimized for this.
Try another sed version, or perl:

perl -pe 's/tns1://g' "$OUTPUTFILENM" > "$TEMPFILE"