sed command to delete everything after > on line

LMHmedchem · June 18, 2012, 1:37pm

I have a large data file that I need to edit. It contains lines like:

>  <IDNUMBER> (ST000002)
>  <SUPPLIER> (ST000002)
>  <IDNUMBER> (ST000004)
>  <SUPPLIER> (ST000004)

and I need to delete everything after the last >, keeping the end of line, so that they become:

>  <IDNUMBER>
>  <SUPPLIER>
>  <IDNUMBER>
>  <SUPPLIER>

I am not completely sure that > appears only in this place, so I think I need something like:

sed 's/>\ \ <*>\ (*)/>/g'

But I have misplaced my list of sed commands and don't remember the syntax or what needs to be escaped.

Help would be appreciated.

LMHmedchem
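On the escaping question above: in sed's default basic regular expressions (BRE), spaces, `<`, `>`, `(` and `)` are ordinary characters and need no backslash, while `*` means "zero or more of the preceding character" rather than "anything" (that is `.*`). So `<*>` matches zero or more `<` followed by `>`, and `(*)` matches zero or more `(` followed by `)`. In GNU sed, `\<` and `\>` are word-boundary operators, so backslashing the angle brackets would change the meaning. A literal version of the attempted pattern might look like the sketch below. It assumes the tag is made of uppercase letters and that everything after it is a space plus a parenthesized ID, which is inferred only from the sample lines.

```shell
# Literal version of the pattern attempted above (a sketch).
# Assumes target lines look exactly like ">  <TAG> (ID)", with TAG
# made of uppercase letters, as in the sample data.
# In BRE: \( \) is a capture group; < > ( ) and spaces are literal;
# \1 in the replacement is the captured ">  <TAG>" part.
printf '%s\n' '>  <IDNUMBER> (ST000002)' 'ST000002' '>  <SUPPLIER> (ST000002)' |
  sed 's/^\(>  <[A-Z]*>\) (.*)$/\1/'
# Prints:
# >  <IDNUMBER>
# ST000002
# >  <SUPPLIER>
```

Lines that do not match the whole pattern, such as the bare ID line, are printed unchanged.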

Scrutinizer · June 18, 2012, 1:39pm

Try:

sed 's/[^>]*$//'
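What `[^>]*$` matches: the longest run of characters other than `>` that reaches the end of the line. On a tag line that run begins just after the last `>`; on a line with no `>` at all it is the whole line, so deleting it leaves an empty line. A two-line check using lines from the sample data:

```shell
# [^>]*$ = longest run of non-">" characters ending at end of line.
# The tag line keeps everything up to and including its last ">";
# "TimTec" contains no ">", so the whole line is deleted.
printf '%s\n' '>  <SUPPLIER> (ST000004)' 'TimTec' | sed 's/[^>]*$//'
# Prints ">  <SUPPLIER>" followed by an empty line.
```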

LMHmedchem · June 18, 2012, 1:47pm

That deletes the trailing content I don't want, but it also deletes everything else in the file. Maybe I was unclear: there are a lot of other lines in the file that just need to be printed unchanged. Only the lines that look like

>  <IDNUMBER> (ST000002)

need to be processed.

In other words, something like this:

  9  7  1  0  0  0  0
  6 15  1  0  0  0  0
  5 14  1  0  0  0  0
  3  2  1  0  0  0  0
 13  5  1  0  0  0  0
 17 16  2  0  0  0  0
M  END
>  <IDNUMBER> (ST000004)
ST000004

>  <SUPPLIER> (ST000004)
TimTec

should be processed to:

  9  7  1  0  0  0  0
  6 15  1  0  0  0  0
  5 14  1  0  0  0  0
  3  2  1  0  0  0  0
 13  5  1  0  0  0  0
 17 16  2  0  0  0  0
M  END
>  <IDNUMBER>
ST000004

>  <SUPPLIER>
TimTec

Your sample code gave me:








>  <IDNUMBER>


>  <SUPPLIER>

Thanks,

LMHmedchem

Scrutinizer · June 18, 2012, 1:54pm

I see. In that case it should be:

sed 's/>[^>]*$/>/'

or perhaps:

sed '/^>/s/>[^>]*$/>/'
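A quick way to check both commands against the sample from this thread. In the first, the match has to start at a `>`, so lines without any `>` pass through unchanged; on a tag line it matches the last `>` through the end of the line, and the replacement puts that one `>` back. The second adds the `/^>/` address, so only lines beginning with `>` are edited. The line `x > y (note)` is a made-up example, not from the original data, included only to show where the two commands differ.

```shell
# Sample lines taken from the thread, plus one hypothetical line
# ("x > y (note)") that contains ">" but does not start with it.
input='M  END
>  <IDNUMBER> (ST000004)
ST000004

>  <SUPPLIER> (ST000004)
TimTec
x > y (note)'

# Unaddressed: every line containing ">" is cut back to its last ">",
# so the hypothetical line becomes "x >".
printf '%s\n' "$input" | sed 's/>[^>]*$/>/'

echo '---'

# Addressed: only lines starting with ">" are edited,
# so "x > y (note)" is printed unchanged.
printf '%s\n' "$input" | sed '/^>/s/>[^>]*$/>/'
```

In both runs the tag lines come out as ">  <IDNUMBER>" and ">  <SUPPLIER>", and the numeric, ID, blank and "TimTec" lines are untouched.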

LMHmedchem · June 18, 2012, 2:19pm

Thanks, that worked great. Sed is such a great tool. I had a 250K text file with odd formatting that would have blown up the next tool in the chain, and sed fixed it in 23s. At times it does look a bit like hieroglyphics, and I really need to find my notes, but there is nothing like the right tool for the job.

Thanks again,

LMHmedchem