printing a text until a keyword is found

icantfindauser · February 1, 2012, 7:41pm

Hi,
here's the problem:

text="hello1 hello2
world
earth mars jupiter
planet"

How do I print the text up to the keyword "mars", so that the desired output is

output="hello1 hello2
world
earth"

I have RTFM'd for sed, and I think the problem is that when sed finds the word "mars" it will either cut out the whole line (including earth) or cut out just mars. What I need is to print the text up to, but not including, mars.

Cheers!

balajesuri · February 1, 2012, 7:46pm

  cat inputfile | tr '\n' '|' | sed "s/\(.*\)mars.*/\1/" | tr '|' '\n'

Or

sed -n '1,/mars/p' inputfile | sed "s/\(.*\)mars.*/\1/"

agama · February 1, 2012, 9:25pm

Only one sed is really necessary:

sed 's/mars.*//; T; q;' input-file

If no substitution is made, the T command causes the rest of the script to be skipped. When a substitution is made, the q (quit) is executed. By default, the contents of the buffer are printed when the end of the script is reached, and also before the process terminates because of the 'q'. So lines without 'mars' are printed in their entirety, and the line with mars is printed with mars, and all tokens that follow it, deleted.

If you want to delete the whitespace between the previous token and 'mars', a small tweak is needed:

sed 's/[ \t][^ \t]*mars.*//; T; q;' input-file

icantfindauser · February 2, 2012, 10:24am

Hi everyone,
thank you all so much for the help so far.

I tested all of the versions but only

 cat inputfile | tr '\n' '|' | sed "s/\(.*\)mars.*/\1/" | tr '|' '\n'

seemed to work properly.

sed 's/[ \t][^ \t]*mars.*//; T; q;' input-file

Didn't compile at all because it couldn't find the commands "T" and "q".

Moreover, the first one-liner above did not give the desired result in special cases where the keyword appears in the file multiple times, like:

"jupiter
mars
mars
sun
earth"

The desired result should be

"jupiter"

But the actual result is

"jupiter
mars"

since sed only takes the last keyword and not the first. I don't understand why though, because I thought sed starts from the beginning of the file / line and then executes the substitution with the first hit. Can anyone help?

Cheers!

EDIT:
I think I found a valid solution. It is ugly and big, but it works. If anyone has a better solution, please let me know.
Here it is:

sed -n "1,/$keyword/p" "$file" | tr '\n' '|' | sed "s/\(.*\)$keyword.*/\1/" | tr '|' '\n'

Cheers!

agama · February 2, 2012, 4:28pm

Ok, guessing you're on FreeBSD or a Sun box....

Try this -- still beats multiple translate commands in terms of efficiency:

sed '/mars/{s/[ \t][^ \t]*mars.*//; q;}' input-file
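On the earlier question of why the tr pipeline cut at the last mars rather than the first: as far as I can tell, it is the greedy \(.*\) in front of the keyword, which grabs as much text as it can and so pushes the match to the last occurrence. A small sketch on the joined one-line form of that example (the variable names are only for illustration):

```shell
# The multi-mars example as it looks after tr has joined the lines with '|'.
joined='jupiter|mars|mars|sun|earth|'

# Greedy: \(.*\) takes as much as possible, so "mars" matches the LAST occurrence.
greedy=$(printf '%s\n' "$joined" | sed 's/\(.*\)mars.*/\1/')
printf '%s\n' "$greedy"    # jupiter|mars|

# Without the leading group the match starts at the FIRST mars.
first=$(printf '%s\n' "$joined" | sed 's/mars.*//')
printf '%s\n' "$first"     # jupiter|
```

So deleting from the keyword onward, instead of capturing what precedes it, avoids the problem without needing a non-greedy operator.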

icantfindauser · February 2, 2012, 4:46pm

Hi agama,

Yeah, sorry for not specifying: I'm on an MBP (MacBook Pro) with Darwin underneath.
Your command works pretty well, except that it deletes everything after the last appearance of the keyword (mars), as I described in one of my earlier posts.
For example:

"jupiter
earth
sun
mars
mars
saturn"

With your code, the output would be:

"jupiter
earth
sun
mars"

But it should be:

"jupiter
earth
sun"

Cheers!

Scrutinizer · February 2, 2012, 6:06pm

T is a GNU extension, so it will only work with GNU sed.
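If GNU sed is not available, the skip-or-quit flow of T can probably be reproduced with the standard t and b commands. A minimal sketch, assuming the keyword is hard-coded as mars; the function name until_mars and the label found are made up for illustration, and [[:space:]]* is used so that the blanks before the keyword are removed as well:

```shell
# Portable stand-in for 's/...//; T; q' (until_mars is a hypothetical name).
until_mars() {
    # s       : cut from the keyword (and the blanks before it) to end of line
    # t found : jump to the label only if the substitution happened
    # b       : otherwise skip the rest of the script (line is auto-printed)
    # q       : print the trimmed line and stop reading input
    sed -e 's/[[:space:]]*mars.*//' -e 't found' -e 'b' -e ':found' -e 'q' "$@"
}

printf 'hello1 hello2\nworld\nearth mars jupiter\nplanet\n' | until_mars
```

A line consisting only of mars still comes out as an empty line here, because q prints the (now empty) pattern space before exiting.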

Try:

sed '/mars/{s/[ \t]*mars.*/"/;q;}' infile

but the " will be on a new line in your last example...
Otherwise try:

awk '{sub(/[[:space:]]*mars.*/,"\"")}1' RS= infile
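Both commands above put back a ", presumably on the reading that the closing quote is part of the file, and the awk one uses RS= (paragraph mode) so that the text up to the first blank line is handled as a single record. If the quotes are only delimiters in the post and the keyword should be taken literally rather than as a regular expression, a line-by-line sketch with index() may be easier to reason about. The function name print_before and the variable kw are assumptions for illustration:

```shell
# Print input up to (not including) the first literal occurrence of a keyword.
# print_before is a hypothetical helper: $1 is the keyword, the rest are files
# (stdin is read when no file is given).
print_before() {
    kw=$1
    shift
    awk -v kw="$kw" '
        {
            pos = index($0, kw)               # 0 when kw is not on this line
            if (pos == 0) { print; next }     # no keyword yet: print the line as-is
            head = substr($0, 1, pos - 1)     # text in front of the keyword
            sub(/[ \t]+$/, "", head)          # drop the blanks before the keyword
            if (head != "") print head        # skip the line if nothing is left
            exit                              # stop at the first occurrence
        }' "$@"
}

printf 'jupiter\nearth\nsun\nmars\nmars\nsaturn\n' | print_before mars
```

Note that awk interprets backslash escapes in a value passed with -v, so a keyword containing backslashes would need extra care.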

agama · February 2, 2012, 9:13pm

Oops, yep that didn't work.
I must have had my brain on backwards, as that mess looking for whitespace was horrible. Taking Scrutinizer's correct interpretation one step further, the last blank line is also removed:

sed -n '/mars/{ s/[ \t]*mars.*//; q;}; p' input-file
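One caveat, as far as I can tell: with -n, q exits without printing the pattern space, so in the original example the trimmed "earth" from the line "earth mars jupiter" would be lost. Also, \t inside a bracket expression is not understood as a tab by every sed. A sketch that prints the trimmed line only when something is left of it, assuming the keyword contains no / and no regex metacharacters (the function name print_until is hypothetical):

```shell
# Print input up to the first keyword; $1 is the keyword, the rest are files
# (stdin is read when no file is given). print_until is a made-up name.
print_until() {
    keyword=$1
    shift
    # On the first line containing the keyword:
    #   s    : cut the keyword, the blanks before it, and everything after it
    #   /./p : print what remains, but only if at least one character is left
    #   q    : stop (with -n this does not print anything by itself)
    # Every earlier line is printed unchanged by the final p.
    sed -n "/${keyword}/{
s/[[:space:]]*${keyword}.*//
/./p
q
}
p" "$@"
}

printf 'hello1 hello2\nworld\nearth mars jupiter\nplanet\n' | print_until mars
```

With the multi-mars example this prints jupiter, earth and sun and then stops, with no trailing blank line.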