delete to end of line with SED

coldcanuck · January 8, 2007, 6:54pm

I have a file with a bunch of similar lines from which I want to extract a phrase delimited by the first occurrence of a '>' at the beginning and the first occurrence of a '<' at the end (you might have guessed these are the beginning/end of HTML tags). Using sed I have managed to delete up to and including the first '>'. Now I want to delete from the '<' to the end of each line.

e.g.
Good Text<extraneous characters

I want to delete the '<extraneous characters' part.

Any suggestions?

vino · January 8, 2007, 11:08pm

How about this?

sed -n -e "s/^[^<]*<\([^>]*\)>.*/\1/p"

tayyabq8 · January 8, 2007, 11:58pm

As per your given example:

$ echo "Good Text<extraneous characters" | sed -n -e 's/^\([^<]*\)<.*/\1/p'
Good Text
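
Since the goal is only to drop everything from the first '<' onward, a plain substitution without -n, a capture group, or the p flag should also do. A minimal sketch, assuming lines without a '<' should pass through unchanged; the function name and the second sample line are made up for illustration:

```shell
# Hypothetical helper: delete from the first '<' to the end of each line.
# The match starts at the leftmost '<' and .* runs to the end of the line.
strip_from_lt() {
  sed 's/<.*//'
}

printf '%s\n' 'Good Text<extraneous characters' 'no tag here' | strip_from_lt
```

Unlike the -n ... p form, this prints every line, whether or not it contained a '<'.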

mahabooba · January 9, 2007, 12:16am

Hi,
This may be useful to you:
$ echo "Good Text<extraneous characters" | cut -d "<" -f1

Regards,
MahaboobAli

Glenn_Arndt · January 9, 2007, 12:32pm

Here's a sed command to try -- it's supposed to remove HTML tags. I don't know how well it works (I didn't write it).

sed -e :a -e 's/<[^>]*>//g;/</N;//ba' myfile.html
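
For anyone puzzling over that one-liner, here is a rough reading of its parts. This is a sketch only: the wrapper function name and the sample input are made up, and it reads from stdin instead of myfile.html.

```shell
# Hypothetical wrapper around the command above.
strip_tags() {
  sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
}
#   :a              label to loop back to
#   s/<[^>]*>//g    delete every complete <...> tag in the pattern space
#   /</N            if a '<' is still left (a tag continues on the next line), append the next line
#   //ba            the empty regex reuses the last one (/</); if it matches, jump back to :a

printf '%s\n' '<b>Good</b> Text' | strip_tags
```

It removes whole tags rather than cutting at the first '<', so it solves a slightly different problem from the one in the original question.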

pondlife · July 1, 2008, 8:01am

I'm attempting something similar - I'm trying to identify the following and then remove it as well as anything that comes after it on the line:

I've played around with the code above but I can't seem to get my head around the regular expression.

I've got this so far (but it doesn't work):

sed -n -e 's/NUMBER \[[0-9]*\].*/\1/p'

Thanks!

pondlife · July 1, 2008, 8:27am

sed -i -n -e 's/NUMBER \[[0-9]*\].*/NUMBER /p'

I've found the above code works but it removes all lines that don't match too... Anyone know how I can have the above work but leave lines that don't match the pattern intact?

many thanks

ghostdog74 · July 1, 2008, 8:29am

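\1 only works when the pattern contains a group in escaped parentheses, \( ... \). Your pattern doesn't have one, so \1 has nothing to refer to and the command will not work. Use a plain substitution instead. Assuming there are always 8 digits between the brackets: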

sed 's/NUMBER \[.\{8\}\].*//' file
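
On the question of keeping non-matching lines: -n suppresses sed's automatic printing and the p flag prints only the lines where the substitution happened, so dropping both should leave every other line untouched. A minimal sketch that keeps the "NUMBER " replacement from the earlier post; the function name and sample input are made up, and -i is left off so the result can be checked before editing a file in place:

```shell
# Hypothetical helper: cut everything from "NUMBER [digits]" onward, keeping "NUMBER ".
# No -n and no p flag, so lines without a match are printed unchanged.
trim_after_number() {
  sed 's/NUMBER \[[0-9]*\].*/NUMBER /'
}

printf '%s\n' 'foo NUMBER [12345678] trailing junk' 'unrelated line' | trim_after_number
```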