Adding lines to pattern space

jawsnnn · June 10, 2012, 6:07am

Is there any way to add lines to the pattern space of sed?

I know a bit about the N command, but I can't get it to do what I want, which is:

Read a file n lines at a time. If I find a match, I quit; otherwise I move on by one line. For example, if I am looking for the following three lines:

Hello
How
Are you

I want to start with lines 1 to 3, then move to lines 2 to 4, then 3 to 5 and so on, until I either find the required lines or reach EOF. The block I search for can be longer than 3 lines.

Using this code for the above example:

sed -n 'N;N; s/Hello\nHow\nAre you/&/p' file1

does not work, because after reading lines 1 to 3 it moves on to lines 4 to 6 instead of lines 2 to 4.
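To see why, it helps to look at each window that "N;N" actually builds. The sketch below assumes `seq` is available to generate numbered input, and uses `s/\n/,/g` only to make the window boundaries visible on one line:

```shell
# Build three-line windows with N;N and show each one on a single line.
# At the end of every cycle the pattern space is discarded, so the next
# window starts at line 4, not line 2.
seq 6 | sed -n 'N;N;s/\n/,/g;p'
```

This prints `1,2,3` and `4,5,6`: the windows jump by three lines instead of sliding by one.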

I would like to do this with sed. As I see it, it could probably be done easily with a script, but I would like to learn more about sed's commands.

bakunin · June 10, 2012, 6:36pm

I sense a few misconceptions about how sed works, and I will try to address them one by one:

sed always works line by line. You can't simply make it "read 3 lines at once". You can make the pattern space persist across input lines (with the "N" command and the "P" and "D" commands), but sed still reads one line, runs it through your list of commands, then reads the next line, and so on.
sed stops immediately when it encounters <EOF>. If you build multi-line patterns with "N" and the like, you have to take precautions against losing the last line(s).
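A small illustration of what "N" does to the pattern space (a sketch; input generated with `printf`): the `l` command prints the pattern space unambiguously, so the embedded newline that "N" inserts becomes visible.

```shell
# N appends the next input line to the pattern space, separated by "\n".
# l prints the pattern space with the newline shown as \n and its end
# marked with $, so each output line is one two-line pattern space.
printf 'a\nb\nc\nd\n' | sed -n 'N;l'
```

With GNU sed this prints `a\nb$` and `c\nd$`: four input lines, but only two cycles.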

In the following examples I'll use a file "infile" containing the numbers 1 to 10, one per line, as input.
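To follow along, a file "infile" holding the numbers 1 to 10, one per line (which is what the output below implies), can be generated with `seq`:

```shell
# Create the sample input for the examples: the numbers 1 to 10, one per line.
seq 10 > infile
```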

Let's start with the second issue: suppose we want to put a "--" at the end of every third line in a file. We try this naively:

sed -n 'N;N;s/$/--/p' infile

This is what we get:

# sed -n 'N;N;s/$/--/p' infile
1
2
3--
4
5
6--
7
8
9--

This worked, in a way, but the last line is missing from the result. Why? After the third group of lines (up to line 9) has been printed, line 10 is read. The first command is "N", which fails because EOF is reached, so sed exits. Because of the "-n" option there is no automatic printing of the pattern space, so line 10 is silently dropped. (Without "-n", GNU sed would print the pattern space before exiting, but most other versions of sed drop it anyway.) How can that be corrected?

sed -n '$ {p;}; N;N;s/$/--/p' infile

This seems to work, but add an 11th line to "infile" now and try again. Gee..! I'll pause here for 5 minutes to give you time to wallow in self-pity. ;-))
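Here is the 11-line case spelled out, as a sketch with `seq` generating the input and a ";" separating the closing brace from the next command:

```shell
# Line 10 is no longer the last line, so "$ {p;}" does not fire on it.
# The first N pulls in line 11, the second N hits EOF and sed quits;
# because of -n the pattern space "10\n11" is never printed.
seq 11 | sed -n '$ {p;}; N;N;s/$/--/p'
```

The output stops at `9--`; both line 10 and line 11 are lost.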

OK, back to business. We have to do it differently: instead of reading lines blindly, we look at the pattern space and decide what to do. If we have read 1 line, the pattern space contains no "\n"; with 2 lines it contains one "\n", and with 3 lines it contains two. So we apply these rules (in this order):

Pattern space with two "\n"s: we have read 3 lines, so substitute, print and start over. Starting over is most easily done by deleting the pattern space with "d", which ends the current cycle and starts the next one with a fresh input line.

The last line: if the last line had completed a group of three, the first rule would already have caught it, so at this point the pattern space holds one or two leftover lines. We just print what we have and quit.

Pattern space with no or one "\n": append the next line with "N" and branch back to the start.

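sed -n ':start
        # three lines in the pattern space: append "--", print, next cycle
        /\n.*\n/ {
             s/$/--/p
             d
        }
        # last line with one or two leftover lines: print them and quit
        $ {
             p
             q
        }
        # fewer than three lines: append the next one and re-check
        N
        b start' infile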

You will see that this works regardless of the number of lines in "infile".
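A more compact variant, not from the post above, reaches the same result with the common `$!N` idiom: the `$!` address runs "N" only when the current line is not the last one, so "N" never hits EOF. Here it runs on 11 lines generated with `seq`, so the leftover group is visible:

```shell
# $!N: append the next line unless we are already on the last line.
# /\n.*\n/: only a full group of three lines (two "\n"s) gets "--".
# p: print the pattern space, whether a full group or leftover lines.
seq 11 | sed -n '$!N;$!N;/\n.*\n/s/$/--/;p'
```

Lines 10 and 11 come out unmarked after `9--`. The trade-off is readability: the long version spells out each rule, while this one folds the EOF handling into the addresses.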

Now to your problem: you will want to consult the sed man page, especially the part about the "D" command. It works like "d", but instead of deleting the whole pattern space it deletes only up to and including the first "\n". (So, effectively, it removes the oldest line from the pattern space, yes?) If the pattern space contained a "\n", sed then restarts the cycle with what is left, without reading a new input line. Also read the part about labels and the "b" command to understand that part better.
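A minimal sketch of what "D" does differently from "d", using a two-line window on input generated with `seq` (the `l` command just makes each pattern space visible):

```shell
# N appends the next line, l shows the current two-line window, and
# D deletes the oldest line, then restarts the cycle WITHOUT reading
# new input, so consecutive windows overlap by one line.
seq 5 | sed -n 'N;l;D'
```

With GNU sed this prints `1\n2$`, `2\n3$`, `3\n4$` and `4\n5$`: the window slides by one line, unlike the "N;N" attempt, which jumped by three.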

I leave the application of this to the actual problem to the interested reader, who will surely enjoy trying out his newfound abilities on this interesting problem. ;-))

I hope this helps.

bakunin

PS: Seriously, this is non-trivial stuff, so do not expect to get it right the first time. Keep trying, though, because sed can really work magic in the hands of the initiated. If you still have questions, I'll be glad to answer them.

jawsnnn · June 11, 2012, 1:55am

Thanks for your in-depth reply. I am fine with the fact that sed does not handle multiple lines easily and that its core functionality is to work line by line; this is just a curious question. I will look at the man pages some more (believe me, I already did), but they can get a bit mind-boggling for a beginner. However, now that I know what to look for (the commands you have suggested), it should be easier.
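For reference, here is one possible sketch (not from the thread) of the sliding-window search from the original question, built from the "N", "D" and "b" commands discussed above. It assumes a fixed three-line query and wraps the script in a hypothetical shell function, find_hello_block, that reads standard input:

```shell
# Hypothetical helper: print the first occurrence of the three consecutive
# lines "Hello", "How", "Are you" from standard input, then stop.
find_hello_block() {
  sed -n '
    # Fill the window: while it holds fewer than three lines and more
    # input exists, append the next line and loop.
    :fill
    $!{
      /\n.*\n/!{
        N
        b fill
      }
    }
    # Window full (or input exhausted): on a match, print it and quit.
    /^Hello\nHow\nAre you$/{
      p
      q
    }
    # No match: drop the oldest line; D restarts the cycle with the
    # remaining lines instead of reading a fresh one.
    D
  '
}

# Usage: find_hello_block < file1
```

For a query of n lines, the fill test needs n-1 embedded newlines (for example `/\n.*\n.*\n/` for four lines), and the match regex lists all n lines.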