Linux command to find and replace occurrences of more than two equal signs with "==" in an XML file.

RedRocks · June 21, 2012, 7:37am

Please help me find a solution; I have wasted hours on this:
I need a command that works on an XML file and replaces multiple consecutive occurrences of = (more than two in a row)

Examples

'==='
'===='
'======='

should be replaced by just '=='
Note: a single character should be replaced (= should be replaced by ==).

I have used this command:

sed 's/==*/==/g' example.xml > example.xml

But after running it, I found that wherever there is a single =, it is changed into ==.
Here are two examples:

Before:

1) <namespace key="109" case="first-letter">Book talk</namespace>
2) st        = ''[[Sleeping Murder]]'' 
| cause       =

After:

1) <namespace key=="109" case=="first-letter">Book talk</namespace>
2) st        == ''[[Sleeping Murder]]'' 
| cause       ==

Please help,
Regards,
Red.
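To see why the original pattern also doubles a lone "=", here is a minimal sketch comparing it with the "two or more" pattern suggested later in the thread. The sample line is made up for illustration; any POSIX sed should behave the same way.

```shell
#!/bin/sh
# 'sample' is a made-up line containing runs of 1, 2, 3 and 5 '=' characters.
sample='a = b == c === d ===== e'

# '==*' means one '=' followed by zero or more '=', so it also matches a lone '='.
bad=$(printf '%s\n' "$sample" | sed 's/==*/==/g')

# '===*' means two '=' followed by zero or more '=', i.e. only runs of two or more.
good=$(printf '%s\n' "$sample" | sed 's/===*/==/g')

printf '%s\n' "$bad"    # a == b == c == d == e   (the lone '=' was doubled)
printf '%s\n' "$good"   # a = b == c == d == e    (the lone '=' is untouched)
```

Replacing "==" with "==" is a no-op, so matching "two or more" gives the same result as matching "more than two".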

bakunin · June 21, 2012, 7:58am

RedRocks:

I need a command that works on an XML file and replaces multiple consecutive occurrences of = (more than two in a row)

should be replaced by just '=='
Note: a single character should be replaced (= should be replaced by ==).

I have used this command:
sed 's/==*/==/g' example.xml > example.xml
But after running it, I found that wherever there is a single =, it is changed into ==.

It seems to me that this is exactly what you asked for, isn't it?

If you only want to replace runs of two or more "=" characters, use:

sed 's/===*/==/g' example.xml > example.xml.result

You cannot redirect sed's output to its own input file: the shell truncates (empties) that file before sed even starts reading it, so the input is destroyed. Hence the different output file name.

I hope this helps.

bakunin
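A small sketch of the redirection problem described above: the shell opens, and therefore truncates, the target of ">" before sed starts, so sed reads an empty file. The file names are throwaway demo files created in a temporary directory, not anything from the thread.

```shell
#!/bin/sh
# Work in a throwaway directory so no real data is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# WRONG: the shell empties broken.xml before sed gets to read it.
printf '%s\n' 'key="1"' 'x ==== y' > broken.xml
sed 's/===*/==/g' broken.xml > broken.xml
[ -s broken.xml ] || echo "broken.xml is now empty"

# Right: write the result to a different file.
printf '%s\n' 'key="1"' 'x ==== y' > demo.xml
sed 's/===*/==/g' demo.xml > demo.xml.result
cat demo.xml.result     # prints key="1" and x == y on two lines
```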

RedRocks · June 21, 2012, 8:12am

Is it possible to replace the original file with the output file once the process completes? The file is 38 GB.
I am writing a shell script (my first one), and the original file should be deleted after the process completes.

I want to extend the command to delete the original file and give the new file the same name as the old one.

Example:

 sed 's/===*/==/g' inputfile.xml > outputfile.xml && COMMAND_FOR_DELETING_OLD_FILE && \
COMMAND_FOR_RENAMING_NEW_FILE_TO_SAME_AS_OLD_FILE

-Red.
(The file I am working on is 38 GB and my server does not have many resources, so the right command will help me.)

bakunin · June 23, 2012, 6:16am

There is no other way than what you have already found out: use a different file and then move the new file over the old file.

I have described the reason for this here and in several other postings, so I won't repeat it.

By the way, a suggestion: stay away from GNU sed's "-i" switch. It only makes your script less portable, and all it does is perform the "mv" operation automatically afterwards.

I hope this helps.

bakunin
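Applied to the command template in the question, the "write elsewhere, then move over the original" advice might look like the following sketch. "mv" replaces an existing target in one step, so no separate delete command is needed. The sample input and the ".tmp" suffix are assumptions for the demo; the temporary file is kept in the same directory so that it sits on the same filesystem and the final "mv" is a cheap rename.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a="1"' 'b ===== c' > inputfile.xml   # sample input for the demo

in=inputfile.xml
tmp=$in.tmp              # hypothetical name; same directory, same filesystem

if sed 's/===*/==/g' "$in" > "$tmp"; then
    mv "$tmp" "$in"      # overwrites the original; no separate delete step
else
    rm -f "$tmp"         # on failure, discard the partial output, keep the original
    exit 1
fi
cat "$in"
```

You still need free space for both copies until the "mv" runs; that is the unavoidable cost discussed in the thread.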

RedRocks · June 24, 2012, 3:50am

Hello,

Thank you for your suggestion. I have used the command below, which writes the output to the same file:

perl -i -pe's/===*/==/g' Example.xml

Once again, thank you for your help.

-Red

bakunin · June 24, 2012, 5:44pm

You seem to have misunderstood what I explained. "perl -i" does the same as "sed -i": it writes the changes to a second file, and that file replaces the original as the last step.

You gain nothing by using perl (except perhaps slower execution; in my experience the native UNIX tools are faster).

The point is simple: to edit a file you need to be able to store two versions of it.*) There is no way around that. The "-i" options of various tools just blur that fact by hiding the temporary file, but the file is still necessary.

If you fear a long execution time for moving the file: don't. As long as the temporary file and the original file are on the same filesystem, a move is just a rename: only a directory entry (a few bytes) changes, and no file data is copied. Executing

mv /path/to/fileA /path/to/fileB

takes the same time regardless of the file's size (as long as both paths are on the same filesystem). So point your "TMP" or "TMPDIR" variable (or your output file) at that same filesystem, and make sure there is enough room on the disk; some 100 GB should not really be a problem in these days of multi-TB SAN storage.

I hope this helps.

bakunin

________________

*) Actually, this is not completely true, because there is a trick:

(cat /path/to/file) | sed '<somecommand>' > /path/to/file

This will work for files that are small enough to fit into memory. The downside is that if anything goes wrong (power loss, reboot, process aborted, ...) your data will be irrevocably destroyed. You certainly do not want to use this hack on critical data just to save a few GB of (temporary) disk space.
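The claim above that a move within one filesystem copies no data can be checked by comparing inode numbers before and after "mv": a rename keeps the file's inode, which is why the cost does not depend on the file size. A minimal sketch using throwaway files in a temporary directory:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some data\n' > fileA

# 'ls -i' prints "<inode> <name>"; keep only the number.
inode_before=$(ls -i fileA | awk '{print $1}')
mv fileA fileB           # same directory, hence same filesystem: a rename
inode_after=$(ls -i fileB | awk '{print $1}')

echo "before: $inode_before  after: $inode_after"   # the two numbers are equal
```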

alister · June 24, 2012, 6:53pm

bakunin:

... because there is a trick:
(cat /path/to/file) | sed '<somecommand>' > /path/to/file
This will work for files that are small enough to fit into memory. The downside is that if anything goes wrong (power loss, reboot, process aborted, ...) your data will be irrevocably destroyed. You certainly do not want to use this hack on critical data just to save a few GB of (temporary) disk space.

That won't work even if the file contains just a single byte, if the shell sets up the sed part of the pipeline first: its "> /path/to/file" redirection truncates the file before cat gets to read it. There is no guarantee about which component of a pipeline is started first.

Regards,
Alister

Corona688 · June 26, 2012, 11:33am

The rule of thumb is: never edit your originals. If you destroy your data, you are screwed.

sixstrings · July 27, 2012, 3:15pm

I agree that you should never edit the originals. However, you can edit a file in place with sed:

sed -i 's/===*/==/g' file.txt (or whatever the file is called).

I use this a lot in post-kickstart scripts (always with a copy of the original file created first). It is really useful.
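A sketch of that habit, using the "two or more" pattern from earlier in the thread. The "-i.bak" form (suffix attached directly to "-i") is accepted by both GNU sed and the BSD/macOS sed, and it leaves the untouched original in file.txt.bak; it is still not POSIX, which is the portability objection raised above. The sample data is made up.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'x = 1' 'y ==== 2' > file.txt   # made-up sample input

# Edit "in place", keeping a copy of the original as file.txt.bak.
sed -i.bak 's/===*/==/g' file.txt

cat file.txt        # x = 1 and y == 2
cat file.txt.bak    # x = 1 and y ==== 2 (the original)
```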

Corona688 · July 27, 2012, 4:03pm

Even that doesn't edit "in place". It deletes the old file and creates a new one.

This is important because it has side effects if you're not the owner of the file: it changes the owner.
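One way to see that "sed -i" really produces a new file is to compare the inode number before and after the edit. A minimal sketch with a throwaway file ("-i.bak" is used so it runs with both GNU and BSD sed):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a ==== b' > file.txt

before=$(ls -i file.txt | awk '{print $1}')
sed -i.bak 's/===*/==/g' file.txt
after=$(ls -i file.txt | awk '{print $1}')

# The numbers differ: file.txt is now a different file with a new inode.
echo "inode before: $before  after: $after"
```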

neutronscott · July 27, 2012, 4:33pm

I was under the impression that ed does it right.

mute@goflex:~$ ls -li input && cat input
7345 -rw-r--r-- 1 mute mute 38 Jul 27 20:32 input
line1 =
line2 ==
line3 ===
line4 ====
mute@goflex:~$ printf '%s\n' 'g/===*/s//==/g' w q | ed -s input
mute@goflex:~$ ls -li input && cat input
7345 -rw-r--r-- 1 mute mute 35 Jul 27 20:32 input
line1 =
line2 ==
line3 ==
line4 ==

ed is POSIX, but it wasn't installed on my minimal Debian squeeze system.

alister · July 27, 2012, 5:25pm

ed does edit the file in place, but it slurps it into memory. That's a lot of memory if the file is 38 GiB. Further, some (most?) ed implementations keep a copy of the entire buffer in a tmp file.

If a humongous file (larger than unused storage and available memory) needs to be edited in place, it can be, but you may have to craft a custom solution rather than use a general purpose editor.

Regards,
Alister
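For the kind of "custom solution" Alister mentions, one possible approach (not proposed in the thread, and risky) relies on the fact that this particular substitution never makes the output longer than the input. The result can then be streamed back over the same file with "dd conv=notrunc": the write position can never overtake the read position, and the stale tail is cut off afterwards. This is a sketch under stated assumptions: it needs "truncate" (GNU coreutils), it reads the file twice, and any interruption leaves the file corrupted with no copy to fall back on. "big.xml" is a hypothetical name.

```shell
#!/bin/sh
# RISKY: only valid because the edit never makes the output longer than the
# input. An interruption leaves big.xml corrupted. Requires GNU 'truncate'.
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'k="1"' 'a ====== b' 'c === d' > big.xml    # demo data

# Pass 1: compute the final size (reads the file, writes nothing).
newsize=$(sed 's/===*/==/g' big.xml | wc -c | tr -d ' ')

# Pass 2: write the result back over the same file. conv=notrunc stops dd
# from truncating it; since output <= input at every point, dd never
# overwrites bytes that sed has not read yet.
sed 's/===*/==/g' big.xml | dd of=big.xml conv=notrunc 2>/dev/null

# Cut off the leftover tail of the longer original.
truncate -s "$newsize" big.xml
cat big.xml
```

Keeping a separate output file, as recommended earlier in the thread, remains the safer choice whenever the disk space exists.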

drl · July 27, 2012, 5:55pm

Hi.

And it changes the inode, which may have consequences for some programs (see the Wikipedia article "inode"). Cheers, drl

methyl · July 27, 2012, 10:22pm

Further to @alister: how on earth did you end up trying to amend a 38 GB flat file with Unix shell tools?
Do you have a database engine?