Using sed to execute multiple commands

Let's say I have a file called test.out. In this file I want to do the following:

  1. Search for lines containing DIP-10219 and, on those lines:
  2. Remove everything in front of cn=
  3. Remove everything after *com
  4. Remove duplicate lines
  5. Replace ( with \(
  6. Replace ) with \)

For 1-3 I have figured out this code:

sed -rn '/DIP-10219/ s/^.*[^,](cn=.*com).*$/\1/p' test.out

However, I can't figure out how to execute 4-6 using one sed command. Any thoughts?

Thanks!

You won't be able to do 4 in sed that easily. Duplicate strings, yes; duplicate lines are difficult if not impossible.
5. and 6.:

sed -rn '/DIP-10219/ {s/^.*[^,](cn=.*com).*$/\1/;s/[(]|[)]/\\&/g;p}' test.out

For multiline problems, you need a looper: a script that has an N, a $ test and a branch (t or b), so you can pile up lines in the buffer. Duplicates would need to be sorted to be adjacent; 'sort -u', or 'uniq' if the file is already sorted, will get them more simply. sed has no associative arrays like bash, awk or perl with which to record all lines and detect duplicates in an unsorted file.
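
For whole-line duplicates, either of these standard pipelines does the job (assuming you do not need to preserve the original line order):

sort -u test.out
sort test.out | uniq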

Don't think of it as one sed command, but one sed instance running a script.

5 and 6 are just this script line: s/[)(]/\\&/g

I find it better to put looper functionality in a separate sed instance on the pipe. Sometimes, for speed, I chain many seds in a row, so each holds a line for the least time.
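
For example (a sketch; single_line.sed and looper.sed are hypothetical files holding the two halves of the work):

sed -f single_line.sed test.out | sed -f looper.sed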

RudiC - your part works. Thanks!

Another question. If I want to do another search and replace in it, like searching for 'abc' and replacing with 'abc-', how would the code change?

DGPickett - any suggestions for a looper built around RudiC's suggested code? Thanks a bunch!

Add

; s/abc/&-/g 

to the above code.
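
Putting it together with the earlier command (here 'abc' stands in for whatever string you actually want to suffix):

sed -rn '/DIP-10219/ {s/^.*[^,](cn=.*com).*$/\1/;s/abc/&-/g;s/[(]|[)]/\\&/g;p}' test.out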

Here is a looper that removes all but one of any run of consecutive blank lines, which illustrates the concept:

sed '
  :loop
  $b
  N
  s/^\n$//
  t loop
  P
  s/.*\n//
  b loop
 '

I put my sed script on its own lines for clarity. If you want to treat lines containing only spaces and tabs as blank, that scrubbing can be done upstream in one line, or inserted here as two lines. That is why I say sed loopers are usually best kept separate from non-loopers: in a looper there is no single place to filter and translate single lines without redundant processing, like substituting on each line twice.
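
Upstream, that scrub might look like this (a sketch; looper.sed is a hypothetical file holding the script above):

sed 's/^[[:blank:]]*$//' test.out | sed -f looper.sed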

Narrative:

  1. Create a branch target,
  2. if on the last line, branch to the end of the script (print the buffer and exit), since an N on the last line usually tosses the buffer,
  3. append the next line to the end of the buffer as '\nLine_2',
  4. if both lines are empty, remove one of them,
  5. if a line was removed, branch back to loop,
  6. print the first line,
  7. remove the first line and
  8. branch back to loop.

RudiC... That works perfectly. Thanks. Don't know why I didn't figure that out myself :)

DGPickett... Thanks for the example. I'm actually trying to remove duplicate lines, not blank lines. Not sure if that makes a difference... How would your code fit in with the code presented here? Would I need to use the first output, send it to file A, and then use a loop outputting to another file?

Here's the code so far:

sed -rn '/DIP-10219/ {s/^.*[^,](cn=.*com).*$/\1/;s/hbo/&-ns/g;s/[(]|[)]/\\&/g;p}' test.out

As said before, removing duplicate non-empty lines is easier in awk:

awk '!X[$0]++' file

will do the job for you.
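
That idiom counts each line in an associative array and prints it only on first sight; an equivalent long form (same behaviour) is:

awk '{ if (X[$0] == 0) print; X[$0]++ }' file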

Perfect! Thank you all!

The previous awk deletes duplicate lines throughout the file. Therefore it needs to learn (store in memory) every distinct line in the whole file.
Consecutive identical lines can be deleted while storing only two lines at a time:

sed '$!N; /^\(.*\)\n\1$/!P;D'
awk 'prev!=$0 {print} {prev=$0}'
uniq
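
A quick illustration of the difference (these three only drop adjacent repeats, unlike the awk array version):

printf 'a\na\nb\na\n' | uniq               # prints a, b, a
printf 'a\na\nb\na\n' | awk '!X[$0]++'     # prints a, b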

Thank you very much!

Let's open it up for ease of maintenance. I never use -r for extended regexes, as my sed is usually not GNU! The third substitute does not need it anyway, since a bracket expression like [()] is implicitly an alternation.

sed -rn '
  /DIP-10219/{
    s/^.*[^,](cn=.*com).*$/\1/
    s/hbo/&-ns/g
    s/[(]|[)]/\\&/g
    p
   }
 ' in_file >out_file
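
Without -r, a plain-BRE version (a sketch, using only POSIX basic regular expressions) would be:

sed -n '
  /DIP-10219/{
    s/^.*[^,]\(cn=.*com\).*$/\1/
    s/hbo/&-ns/g
    s/[()]/\\&/g
    p
   }
 ' in_file >out_file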

Removing duplicates can be done by sed only if the file is sorted, but 'sort -u', or 'uniq' on sorted input, can do it if you are talking about whole-line duplicates or well-defined key-field duplicates. In some situations, people need to remove later or earlier lines with the same key fields that are not entirely duplicate lines. One very useful behaviour of 'sort -u' is that all later lines for a key are deleted; the first survives. Sometimes we sort the file in reverse order if the last line for a key is desired. To detect duplicates on the fly, you need a tool that can store past lines or keys and look up each line, like awk, ksh or bash.
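
For instance, assuming the key is the first field (adjust the field numbers to your data):

sort -u -k1,1 file        # one line per key, sorted; which duplicate survives can vary by implementation
awk '!seen[$1]++' file    # keep the first line per key, preserving input order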