Linux command to find and replace occurrences of more than two equals signs with "==" in an XML file.

Please help me; I have wasted hours trying to find this solution.
I need a command that works on an XML file and replaces multiple occurrences (more than two in a row) of the equals sign.

Examples

  1. '==='

  2. '===='

  3. '======='

should be replaced by just '=='
Note :- single character should be replaced. (= should be replaced by ==)

I have used this command:-

sed 's/==*/==/g' example.xml > example.xml

But after running this command I found that wherever a single = character occurs, it is changed into ==.
Below are two examples:

Before:-

1) <namespace key="109" case="first-letter">Book talk</namespace>
2) st        = ''[[Sleeping Murder]]'' 
| cause       =		

After:-

1) <namespace key=="109" case=="first-letter">Book talk</namespace>
2) st        == ''[[Sleeping Murder]]'' 
| cause       ==	

Please help,
Regards,
Red.

Seems to me that this is what you wanted, isn't it?

If you only want to replace two or more occurrences of "=" use:

sed 's/===*/==/g' example.xml > example.xml.result
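
To see the difference between the two patterns without touching any real file, here is a minimal sketch that feeds a few made-up sample lines through both commands (the `sample` function and its lines are only for illustration):

```shell
# Four made-up lines containing one, two, three and seven equals signs.
sample() {
    printf '%s\n' 'a = b' 'a == b' 'a === b' 'a ======= b'
}

# '==*' means "one '=' followed by zero or more '='", so a lone '=' matches
# as well and gets doubled.
sample | sed 's/==*/==/g'

# '===*' means "two '=' followed by zero or more '='", so a lone '=' is
# left alone and only runs of two or more are rewritten.
sample | sed 's/===*/==/g'
```

The first command turns every line into `a == b`; the second leaves `a = b` untouched and collapses the longer runs.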

You cannot direct the output of sed to the input file, because the shell truncates the output file before sed reads anything from it, which destroys the input. Therefore use a different output file name.
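
If you want to convince yourself, this small sketch repeats the mistake on a throwaway file in a temporary directory (the name `demo.xml` is made up for the demonstration):

```shell
# Work on a throwaway file, never on real data.
workdir=$(mktemp -d)
printf '%s\n' 'a === b' > "$workdir/demo.xml"

# The shell opens (and thereby empties) the output file before sed starts,
# so sed finds nothing left to read.
sed 's/===*/==/g' "$workdir/demo.xml" > "$workdir/demo.xml"

# Size of the file afterwards, in bytes.
size=$(( $(wc -c < "$workdir/demo.xml") ))
echo "demo.xml now has $size bytes"

rm -rf "$workdir"
```

The file ends up empty, which is exactly what would happen to the 38 GB original.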

I hope this helps.

bakunin


Is there any possibility of replacing the original file with the output file once the process completes? I ask because the file is 38 GB.
I have put together a shell script (this is my first time shell scripting), and I am trying to make it delete the original file after the process completes.

I want to extend the command so that it deletes the original file and gives the newly created file the same name as the old one.

Ex.

 sed 's/===*/==/g' inputfile.xml > outputfile.xml && COMMAND_FOR_DELETING_OLD_FILE && \
COMMAND_FOR_RENAMING_NEW_FILE_TO_SAME_AS_OLD_FILE 

-Red.
(The file I am working on is 38 GB and my server doesn't have many resources, so the right command will help me.)

There is no other way than what you have already found out: use a different file and then move the new file over the old file.
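
A minimal sketch of that approach, assuming there is enough free space for a second copy next to the original. The function name `replace_runs` and the `.new` suffix are made up for this example:

```shell
# replace_runs FILE: hypothetical helper. Writes the edited text to FILE.new
# and renames it over FILE only if sed finished successfully. No separate
# delete step is needed, because mv replaces the old file.
replace_runs() {
    file=$1
    sed 's/===*/==/g' "$file" > "$file.new" && mv "$file.new" "$file"
}

# Try it on a throwaway file first.
workdir=$(mktemp -d)
printf '%s\n' 'key="109"' 'title ==== x' > "$workdir/demo.xml"
replace_runs "$workdir/demo.xml"
cat "$workdir/demo.xml"
```

Because of the `&&`, a failed sed run (for example a full disk) leaves the original untouched; only the incomplete `.new` file has to be cleaned up.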

The reason for this i have described here and in some other postings too - i don't want to repeat it.

Btw., a suggestion: stay away from GNU sed's "-i" switch: it makes your script less portable and does nothing more than perform the "mv" operation automatically afterwards.

I hope this helps.

bakunin

Hello,

Thank you for your suggestion. I have used the command below, which writes the output to the same file.

perl -i -pe's/===*/==/g' Example.xml

Once again, thank you for helping :)

-Red

You seem to have misunderstood what i explained. "perl -i" does the same as "sed -i": it creates a temporary second file where it stores the changes and moves this over the original file as the last step.

You gain nothing by using perl (save for a considerably slower speed of execution, because native UNIX commands are way faster).

The point is simple: to edit a file you need to be able to store 2 versions of it.*) There is no way around that. The "-i" options of various tools just blur that fact by hiding this temporary file, but it is still necessary.

If you fear a long execution time for moving the file: don't. As long as the temporary file and the original file are on the same filesystem, it is in fact just a change to a directory entry (a few bytes of metadata), not a copy of the data. To execute

mv /path/to/fileA /path/to/fileB

takes the same time, regardless of the size of this file (as long as they both are on the same filesystem). So set your "TMP" or "TMPDIR" variable accordingly and have enough room on your disk - some 100GB should not really be a problem these times of multi-TB SAN storage fabrics.
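
One way to make sure both files really are on the same filesystem is to create the temporary file in the directory of the original. A rough sketch; the helper name `edit_via_tmp` and the `edit.XXXXXX` template are assumptions for illustration, not anything a tool prescribes:

```shell
# edit_via_tmp FILE: hypothetical helper. The temporary file is created in
# the same directory as FILE, so the final mv is a plain rename.
edit_via_tmp() {
    file=$1
    dir=$(dirname "$file")
    # mktemp creates the file with mode 0600; chmod afterwards if needed.
    tmp=$(mktemp "$dir/edit.XXXXXX") || return 1
    if sed 's/===*/==/g' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        # sed failed: drop the partial result and keep the original.
        rm -f "$tmp"
        return 1
    fi
}

# Try it on a throwaway file first.
workdir=$(mktemp -d)
printf '%s\n' 'cause ===== x' 'st = y' > "$workdir/demo.xml"
edit_via_tmp "$workdir/demo.xml"
cat "$workdir/demo.xml"
```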

I hope this helps.

bakunin

________________

*) actually this is not completely true, because there is a trick:

(cat /path/to/file) | sed '<somecommand>' > /path/to/file

This will work for files which are small enough to fit into memory. The downside is, that if anything goes wrong (power loss, reboot, process aborted, ...) your data will be irrevocably destroyed. You sure do not want to use this hack on critical data just to save a few GB of (temporary) diskspace.

That won't work even when there's just a single byte in the file, if the shell first creates the sed portion of the pipeline. There are no guarantees on which component of a pipeline will be created first.

Regards,
Alister

Rule of thumb is never edit your originals. If you destroy your data, you are screwed.

I agree that you should never edit the originals. However, you can edit a file in place with sed:

sed -i 's/===*/==/g' file.txt (or whatever the file is called).

I use this a lot in post kickstart scripts (always with a copy of the original file created first.) It is really useful.

Even that doesn't edit "in place". It deletes the old file and creates a new one.

This is important since this has side-effects if you're not the owner of the file -- it changes the owner.
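
The effect can be checked with `ls -i`. The sketch below compares renaming a new file over the old name with copying the new content back into the existing file; all file names are made up, and `inode_of` is just a small helper for this demonstration:

```shell
# Print the inode number of a file.
inode_of() { ls -i "$1" | awk '{ print $1 }'; }

workdir=$(mktemp -d)
printf '%s\n' 'a === b' > "$workdir/renamed.txt"
printf '%s\n' 'a === b' > "$workdir/rewritten.txt"

# Rename: the file name ends up pointing at the new file's inode.
renamed_before=$(inode_of "$workdir/renamed.txt")
sed 's/===*/==/g' "$workdir/renamed.txt" > "$workdir/renamed.new" &&
    mv "$workdir/renamed.new" "$workdir/renamed.txt"
renamed_after=$(inode_of "$workdir/renamed.txt")

# Copy back: the existing file is emptied and refilled, so its inode,
# owner and mode stay the same, at the price of writing the data twice.
rewritten_before=$(inode_of "$workdir/rewritten.txt")
sed 's/===*/==/g' "$workdir/rewritten.txt" > "$workdir/rewritten.new" &&
    cat "$workdir/rewritten.new" > "$workdir/rewritten.txt" &&
    rm "$workdir/rewritten.new"
rewritten_after=$(inode_of "$workdir/rewritten.txt")

echo "rename:    $renamed_before -> $renamed_after"
echo "copy back: $rewritten_before -> $rewritten_after"
```

The copy-back variant is not atomic: if it is interrupted halfway, the original is left incomplete, so it only makes sense when keeping the inode or owner really matters.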

i was under the impression ed does it right.

mute@goflex:~$ ls -li input && cat input
7345 -rw-r--r-- 1 mute mute 38 Jul 27 20:32 input
line1 =
line2 ==
line3 ===
line4 ====
mute@goflex:~$ printf '%s\n' 'g/===*/s//==/g' w q | ed -s input
mute@goflex:~$ ls -li input && cat input
7345 -rw-r--r-- 1 mute mute 35 Jul 27 20:32 input
line1 =
line2 ==
line3 ==
line4 ==

POSIX, but it wasn't installed on Debian squeeze minimal.

ed does edit the file in place, but it slurps it into memory. That's a lot of memory if the file is 38 GiB. Further, some (most?) ed implementations keep a copy of the entire buffer in a tmp file.

If a humongous file (larger than unused storage and available memory) needs to be edited in place, it can be, but you may have to craft a custom solution rather than use a general purpose editor.
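
For this particular substitution, one possible custom solution is sketched below. It relies on two assumptions: the replacement never makes the text longer, so the write position can never overtake the read position, and a `truncate` utility (as shipped with GNU coreutils) is available. The function name `shrink_in_place` is made up. An interrupted run leaves the file half old and half new, so treat this as an experiment for copies, not a recipe for critical data:

```shell
# shrink_in_place FILE: hypothetical helper that overwrites FILE with its
# own edited content without creating a second copy.
shrink_in_place() {
    file=$1
    # Pass 1: only count how many bytes the result will have.
    new_size=$(( $(sed 's/===*/==/g' "$file" | wc -c) ))
    # Pass 2: '1<>' opens FILE for writing WITHOUT truncating it, so sed
    # writes the (shorter) result over the beginning of its own input.
    sed 's/===*/==/g' < "$file" 1<> "$file" || return 1
    # Cut off the stale tail left over from the longer original.
    truncate -s "$new_size" "$file"
}

# Try it on a throwaway file first.
workdir=$(mktemp -d)
printf '%s\n' 'line1 =' 'line2 ==' 'line3 ===' 'line4 ====' > "$workdir/input"
shrink_in_place "$workdir/input"
cat "$workdir/input"
```

The price for saving the disk space is that the file is read twice, and that there is no way back once the second pass has started.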

Regards,
Alister

Hi.

It also changes the inode, which may have consequences for some programs (see the Wikipedia article on inodes). Cheers, drl

Further to @alister: how on earth did you end up in the situation of trying to amend a 38 GB flat file with Unix shell tools?
Do you have a database engine?