Sed question

mvalonso · October 5, 2007, 4:05pm

In every line of a file, I want to delete everything from a given position (in fact, from the first " character) to the end of the line.
How can I do this?
Thanks in advance.

devtakh · October 5, 2007, 4:21pm

Use this:

ed '/\"/,$/d' inputfile

cheers,
Devaraj Takhellambam

devtakh · October 5, 2007, 4:25pm

I missed an "s":

it is sed, not ed.

mvalonso · October 5, 2007, 5:44pm

I get the following error:

Unrecognized command: /\"/,$/d

reborg · October 5, 2007, 5:55pm

sed 's/".*$//' file

The $ is not strictly required
or

sed 's/^\([^"]*\)".*/\1/'
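
As a quick check of the two commands above, here is a minimal sketch. The sample input is made up for illustration; both forms should cut each line at its first double quote and leave lines without a quote alone.

```shell
# Hypothetical sample input: two lines with a double quote, one without.
input='name="alice" id=1
plain line
x "y" z'

# Form 1: delete from the first double quote to the end of the line.
result=$(printf '%s\n' "$input" | sed 's/".*//')

# Form 2: keep only the part before the first double quote.
result2=$(printf '%s\n' "$input" | sed 's/^\([^"]*\)".*/\1/')

printf '%s\n' "$result"
```

Note that whatever precedes the quote is kept as is, including a trailing space.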

bhargav · October 5, 2007, 5:56pm

sed 's/\".*//g' test

mvalonso · October 8, 2007, 10:06am

Thanks a lot!

cronjob78 · October 15, 2007, 1:10pm

Is there any way (preferably in one command line) to manipulate the file itself rather than writing to a temporary file and then renaming it?

matrixmadhan · October 15, 2007, 2:04pm

In-place editing:

the "-i" option of sed (not part of POSIX sed; GNU sed has it)

drl · October 15, 2007, 2:25pm

Hi.

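Note that "-i" really refers to the result, not the operational characteristics; a temporary file is still created. Think of it as a convenience, not as an option to save space. The GNU sed documentation describes the output as going to a temporary file which, when the end of the file is reached, is renamed to the output file's original name.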

(I think I would have written that last part as "... to the input file's original name".)
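
To illustrate the point about the temporary file, here is a minimal sketch of roughly what "-i" does for you, done by hand. It assumes mktemp is available (it is not part of POSIX), and the demo file and its contents are made up.

```shell
# Hypothetical demo file; its name is generated by mktemp.
file=$(mktemp)
printf '%s\n' 'keep "drop this' 'no quote here' > "$file"

# What "-i" automates: write the result to a temporary file,
# then rename that file over the original.
tmp=$(mktemp)
sed 's/".*//' "$file" > "$tmp" && mv "$tmp" "$file"

edited=$(cat "$file")
printf '%s\n' "$edited"
rm -f "$file"
```

Either way the whole output is written out once before the rename, so the free space needed is about the size of the result.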

cheers, drl

cronjob78 · October 15, 2007, 5:25pm

Thanks very much for all the replies. That -i instruction is very useful.

My task has grown another leg. This is what I have now

sed -i '/<script>function v470/,/<\/script>/ s/<script>.*/<\/body>/' filename

This replaces a line that starts with
<script>function v470......and ends with.....</script></body>

with just

</body>

BUT SOMETIMES the line I want to replace DOES NOT end with </body>

and therefore I just want to blank/delete the line and not insert an unnecessary </body>

sed -i '/<script>function v470/,/<\/script>/ s/<script>.*//' filename

This will work but can I do it all together in one line for both cases?
i.e. if a line starts with <script> can I delete it as far as </script> so that it doesn't erase the (occasional) trailing </body> if it exists??

bakunin · October 17, 2007, 10:18am

To explain what you'll need to do, I'll start by explaining what your script really does. Your code, reformatted, looks like this:

sed '/start/,/end/ {
    s/this/that/
    }'

sed does the following: it searches the file, line by line, until it finds a line containing "start". Now it applies to this and every following line the commands between "{" and "}", a so-called "rule". In this case this is a single command: "s/this/that/", which changes the first occurrence of "this" in a line to "that". sed does this up to and including the next line containing "end", then stops applying the rule until it again finds a line containing "start", where it will again apply the rule to every line until finding one containing "end", and so on.
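
The behaviour described above can be sketched with a small made-up input, in which "this" appears both outside and inside a start/end range:

```shell
# Hypothetical input for illustration only.
input='this is outside
start this and this
inside this
end this
this is outside again'

# Apply the rule only from a "start" line through the next "end" line.
result=$(printf '%s\n' "$input" | sed '/start/,/end/ {
    s/this/that/
}')

printf '%s\n' "$result"
```

Only the first "this" on each line inside the range is changed, the "end" line itself is still part of the range, and the lines outside it are untouched.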

So much for the general case, back to your problem: your sed-line states that you search for a line containing "<script> function v470" (your "start"-clause) and from there on up to a line containing "</script>" (your "end"-clause) you apply the rule to replace "<script>.*", which means "everything including '<script>' to the end of the line" with nothing - in effect deleting "<script>" and everything following it on the same line.

Is this really what you want?

If you want to delete everything from "<script> function v470" up to where you encounter "</script>" you will have to work differently, because the way you stated it you will only achieve what you want when the <script>- and the </script>-clause appear on the same line and nothing follows the </script>:

blabla <script> function v470 blabla </script>

On such a line your script will work. But on the following text fragments (where I presume you would like to cut out everything from "<script>" through "</script>" and keep the rest):

blabla <script> function v470 blabla </script> blabla

blabla <script> function v470
blabla
blabla
bla </script> blabla

it will fail.
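
A minimal sketch of that failure, using the two fragments above as input. The command is the one from the question, adapted to the spelling "<script> function v470" (with a space) used in these fragments.

```shell
# The command from the question, adapted to the fragments above.
cmd='/<script> function v470/,/<\/script>/ s/<script>.*//'

# Fragment 1: both tags on one line, with text after the closing tag.
one_line=$(printf '%s\n' 'blabla <script> function v470 blabla </script> blabla' | sed "$cmd")

# Fragment 2: the tags are spread over several lines.
multi_line=$(printf '%s\n' \
    'blabla <script> function v470' \
    'blabla' \
    'blabla' \
    'bla </script> blabla' | sed "$cmd")

printf '%s\n' "$one_line" "$multi_line"
```

In the first case the trailing "blabla" is lost along with the script; in the second only the first line is trimmed, and the lines up to and including "</script>" stay in the file.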

In this case you have three different types of lines:

1) Lines which contain the start-clause (and maybe the end-clause)

2) Lines between type-1-lines and type-3-lines

3) Lines which only contain the end-clause (and maybe the start-clause)

Type-2-lines are the easiest: they can be deleted. Type-1-lines have to deal with the special case where start- and end-clause are on the same line; once that case is handled there, Type-3-lines are reduced to a simple solution: cut out everything up to and including the end-clause. We split the type-1-lines into two rules, because our special case (<script> and </script> on the same line) is pretty easy to match.

(in the following i omit the sed-call, only giving the sed-script itself):

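# The patterns must match the text in the file exactly, including spaces.
# Caveat: sed looks for the end of a range only from the next line on, so a
# line with both tags still opens a range that runs to the next </script>.
/<script> function v470/,/<\/script>/ {
       # start- and end-clause on the same line: cut out the whole pair
       /<script> function v470.*<\/script>/ {
               s/<script> function v470.*<\/script>//
               s/^/@@@@/
       }
       # only the start-clause: cut from the start-clause to the end of line
       /<script> function v470/ {
               s/<script> function v470.*//
               s/^/@@@@/
       }
       # only the end-clause: cut from the beginning of line through the end-clause
       /<\/script>/ {
               s/.*<\/script>//
               s/^/@@@@/
       }
       # unmarked lines lie between start and end: delete them
       /^@@@@/!{
               d
       }
       # remove the marker again
       s/^@@@@//
}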

In detail: the first clause will make sure we work only on lines containing "<script> function v470" and all following lines up to a line containing "</script>".

On lines containing our start-clause AND end-clause we snip out everything from the <script>-tag through the </script>-tag. This has to be the first rule, so that the next rule only has to deal with lines containing just the start-clause. On lines containing only the start-clause we snip out everything from the start-clause to the end of the line. On lines containing only the end-clause we snip out everything from the beginning of the line up to and including the end-clause. All the lines dealt with this way we mark with four "@"-chars at the beginning, to tell ourselves later on that we have already dealt with them. Any other lines we encounter are lines between a start- and an end-clause which contain neither of them. We simply delete them. This is where we need the markers: otherwise we would also delete the lines we have already worked on.

Finally, we remove the markers again and are finished.
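
One caveat about the range address: sed starts looking for the end-clause only on the line after the one that matched the start-clause, so a line containing both tags still opens a range that runs on to the next "</script>". The following is a sketch of a variant that avoids this by handling the same-line case before the range. It is not the script above: instead of the @@@@ markers it uses sed's "b" command (branch to the end of the script), and the sample input is made up.

```shell
# Variant sketch: same-line case first, then the range for multi-line cases.
script='
# Same-line case first, so that such a line never opens the range.
s/<script> function v470.*<\/script>//
/<script> function v470/,/<\/script>/ {
    # Start line: cut from the start-clause to the end of the line.
    /<script> function v470/ {
        s/<script> function v470.*//
        b
    }
    # End line: cut from the beginning of the line through the end-clause.
    /<\/script>/ {
        s/.*<\/script>//
        b
    }
    # Anything else inside the range is a middle line.
    d
}'

# Hypothetical input covering the one-line and the multi-line case.
input='aaa <script> function v470 x </script> bbb
ccc
ddd <script> function v470
eee
fff </script> ggg
hhh'

result=$(printf '%s\n' "$input" | sed "$script")
printf '%s\n' "$result"
```

Lines that end up empty after the cut are still printed as empty lines; removing them would need one more rule.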

bakunin