sed replace characters using a wildcard

LMHmedchem · January 19, 2016, 6:19pm

Hello,

I have some data that looks like the following,

>  <SALTDATA> (OVS0199262)
HCl

>  <IDNUMBER> (OVS0199262)
OVS0199262

>  <SUPPLIER> (OVS0199262)
TimTec

>  <EMAIL> (OVS0199262)
info@timtec.net

>  <WEBSITE> (OVS0199262)
http://www.timtec.net

I need to remove the data in the parentheses and the space following the final > in those lines. The value in parentheses is different in each record.

I tried sed,
sed 's/>\ \(.*\)/>/g' infile > modfile

To me, this reads: find ">" followed by one space, followed by an open parenthesis, followed by any number of any character, followed by a close parenthesis, and replace all of that with ">".

It seems like this should work, unless I don't have the syntax right. Instead, I am getting the output,

>
HCl

>
OVS0199262

>
TimTec

>
info@timtec.net

>
http://www.timtec.net

where I want the output,

>  <SALTDATA>
HCl

>  <IDNUMBER>
OVS0199262

>  <SUPPLIER>
TimTec

>  <EMAIL>
info@timtec.net

>  <WEBSITE>
http://www.timtec.net

Here, sed seems to be matching the first greater-than sign on the line instead of the second.

What am I missing here? I am guessing I need to escape the parentheses differently since they have their own meaning in sed.

thanks,

LMHmedchem

---------- Post updated at 06:19 PM ---------- Previous update was at 06:04 PM ----------

I found this,
sed 's/>[^>]*$/>/'
which works by removing everything after the last > on the line (the second one in my data). This seems to give me what I want.
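
For anyone following along, here is a small sketch of why that expression only touches the tail of the line. It assumes a POSIX-style sed and uses one header line copied from the sample data; the variable names line and result are just for illustration.

```shell
# One header line from the sample data (note the two spaces after the first ">").
line='>  <SALTDATA> (OVS0199262)'

# ">[^>]*$" means: a ">", then any run of characters that are not ">",
# then end of line. A match starting at the first ">" cannot reach the end
# of the line without crossing the second ">", so the match starts at the
# last ">" on the line.
result=$(printf '%s\n' "$line" | sed 's/>[^>]*$/>/')

printf '%s\n' "$result"
```

This should print ">  <SALTDATA>". Lines with no ">" at all, such as the data values, are left unchanged because the pattern never matches them.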

I would still like to know what was wrong with my sed command above if anyone can comment.

LMHmedchem

wbport · January 19, 2016, 6:25pm

sed 's/> (.*)/>/' infile > modfile

is one way. You don't escape parentheses when you want a literal parenthesis; in sed's default regular expressions, escaped parentheses mark groupings that can be played back in a specific order in the replacement.
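
A quick way to see the difference is to run both forms on one line of the sample data. This is only a sketch: the variable names are made up for illustration, and the last command assumes a sed that supports the -E option for extended regular expressions (GNU and BSD sed both do).

```shell
line='>  <SALTDATA> (OVS0199262)'

# Escaped: \( ... \) is a group in basic regular expressions (BRE), so the
# pattern is effectively "> .*" and matches from the first "> " to end of line.
escaped=$(printf '%s\n' "$line" | sed 's/> \(.*\)/>/')

# Unescaped: ( and ) are ordinary characters in BRE, so only "> (...)" matches.
literal=$(printf '%s\n' "$line" | sed 's/> (.*)/>/')

# With -E the roles are swapped: \( and \) are the literal parentheses.
extended=$(printf '%s\n' "$line" | sed -E 's/> \(.*\)/>/')

printf 'escaped:  %s\n' "$escaped"
printf 'literal:  %s\n' "$literal"
printf 'extended: %s\n' "$extended"
```

The escaped form should print just ">", which is the unwanted output from the original post, while the other two should print ">  <SALTDATA>".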

P.S. Use [^)]* instead of .* if there is any possibility there could be an extra close parenthesis later on the line.
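
To illustrate the P.S., here is a sketch using a made-up line with a stray close parenthesis after the record ID. The trailing " note)" text is hypothetical and does not come from the real data.

```shell
# Hypothetical line: the real data has nothing after the first ")".
line='>  <SALTDATA> (OVS0199262) note)'

# ".*" is greedy, so it runs to the last ")" on the line.
greedy=$(printf '%s\n' "$line" | sed 's/> (.*)/>/')

# "[^)]*" cannot cross a ")", so the match ends at the first one.
bounded=$(printf '%s\n' "$line" | sed 's/> ([^)]*)/>/')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The greedy form should print ">  <SALTDATA>" and the bounded form ">  <SALTDATA> note)". Which one is right depends on whether the trailing text should be kept.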

LMHmedchem · January 19, 2016, 6:31pm

It never occurred to me to not escape a control character like parenthesis, thanks for the tip. This processed a 1GB input file in about 1.5 min, which is pretty slick I think.

LMHmedchem

wbport · January 19, 2016, 6:32pm

You're welcome.