sed to replace two lines with one

estebandido · September 20, 2010, 3:10am

I want to use sed to check if a short line is contained in the line after it, and if it is, to delete the short one. In other words, the input is...

This is a
This is a line

... and I want it to give me...

This is a line

Here's what I've tried so far: s/$^.*$\n$\1.*$$/\2/

Also, is there a way to get sed to act directly on the input file, as opposed to a screen dump or creating an output file?

alister · September 20, 2010, 3:36am

sed -n 'n;p'

To edit files in place, take a look at the ed command. If for some reason that's not suitable and you're using GNU sed, take a look at its -i option.
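As a rough sketch of the in-place options, assuming GNU sed for `-i` and a throwaway file created with `mktemp` (the file contents and variable names here are made up for illustration):

```shell
# Hypothetical scratch file so the demo is safe to run anywhere.
demo_file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$demo_file"

# GNU sed: -i rewrites the file in place.
# (BSD/macOS sed expects a backup-suffix argument after -i, e.g. -i ''.)
sed -i 's/two/2/' "$demo_file"

# Portable fallback: write to a temporary file, then replace the original.
tmp_out=$(mktemp)
sed 's/three/3/' "$demo_file" > "$tmp_out" && mv "$tmp_out" "$demo_file"

in_place_result=$(cat "$demo_file")
printf '%s\n' "$in_place_result"
rm -f "$demo_file"
```

The temporary-file route is more typing, but it should behave the same on any sed.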

Regards,
Alister

estebandido · September 20, 2010, 1:50pm

This is perfect. Thanks!

rdcwayx · September 20, 2010, 8:33pm

Are you sure alister's code solved your problem? Or did you change your mind?

Alister's code only prints the even-numbered lines.
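A quick way to see this (the sample inputs and variable names are illustrative only):

```shell
# The two-line sample from the question: the result looks right by coincidence.
sample_out=$(printf 'This is a\nThis is a line\n' | sed -n 'n;p')

# Four unrelated lines: lines 1 and 3 are dropped regardless of their content.
even_lines=$(printf 'line1\nline2\nline3\nline4\n' | sed -n 'n;p')

printf '%s\n' "$sample_out"
printf '%s\n' "$even_lines"
```

With `-n` suppressing automatic printing, `n` replaces the current line with the next one and `p` prints it, so no comparison between the two lines ever happens.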

estebandido · September 20, 2010, 9:22pm

Ah.... I wondered how it cut so many lines. It worked for the cases I looked for, but I should have looked closer.

No, that won't do at all. I need to inspect each line, and compare it to the next.

Can you tell me why what I have isn't working for me? I'm trying to take the first line, see if the second line is the first with any more characters, and then if so, replace the whole thing with the second line.

alister · September 20, 2010, 9:33pm

Whoops. Obviously I missed the crux of the problem. Apologies for the noise.

---------- Post updated at 09:33 PM ---------- Previous update was at 09:26 PM ----------

How about:

sed 'N; /^\([^\n]\{1,\}\)\n\1/s/.*\n//'

That will inspect a pair of lines to see if the first is a leading substring of the next. Then it moves on to the next pair of lines. There is no overlap between pairs. 1-2, 3-4, 5-6, etc ... are inspected, not 1-2, 2-3, 3-4. Your problem statement wasn't explicit in this regard, so I chose the simpler of the two to implement.

Example:

$ cat data
ab
abcd
12
345
ef
efgh
$ sed 'N; /^\([^\n]\{1,\}\)\n\1/s/.*\n//' data
abcd
12
345
efgh
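
For what it's worth, the `N` is also the piece the original `s/.../` attempt was missing. sed normally holds only one input line in the pattern space, so a `\n` in the regex has nothing to match until `N` appends the next line. A minimal sketch, assuming GNU sed and a made-up substitution chosen purely for illustration:

```shell
# Without N, the pattern space holds a single line, so \n never matches
# and the input passes through unchanged.
no_join=$(printf 'This is a\nThis is a line\n' | sed 's/a\nThis/X/')

# With N, the next line is appended and \n matches the embedded newline.
with_join=$(printf 'This is a\nThis is a line\n' | sed 'N; s/a\nThis/X/')

printf '%s\n' "$no_join"
printf '%s\n' "$with_join"
```

The original expression also used `\1` and `\2` without any `\(...\)` group for them to refer to.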

Regards,
Alister

rdcwayx · September 20, 2010, 9:52pm

Not really correct:

cat infile

XXXX
ab
abcd


sed 'N; /^\([^\n]\{1,\}\)\n\1/s/.*\n//' infile
XXXX
ab
abcd

cat infile

ab
abcd

sed 'N; /^\([^\n]\{1,\}\)\n\1/s/.*\n//' infile
abcd

---------- Post updated at 11:52 AM ---------- Previous update was at 11:49 AM ----------

And in this situation, what's your expected output?

ab
abc
abcd
abcde

alister · September 20, 2010, 11:29pm

With this implementation, an empty line is never a match. Perhaps it should be. In any case, so long as the handling of empty lines is left unspecified, I'm fine with its level of correctness. I'll propose the simplest solution unless there is an explicit requirement demanding something more complicated.
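The empty-line behaviour comes down to `\{1,\}` versus `*` in the capture group. A small sketch, assuming GNU sed (which treats `\n` inside a bracket expression as a newline); the `*` variant is a hypothetical tweak of the command, shown only for comparison:

```shell
# \{1,\} requires at least one character, so a blank first line never matches
# and both lines of the pair are kept.
blank_kept=$(printf '\nXXXX\n' | sed 'N; /^\([^\n]\{1,\}\)\n\1/s/.*\n//')

# With * the group may be empty, so a blank first line counts as a
# leading substring of the next line and is removed.
blank_dropped=$(printf '\nXXXX\n' | sed 'N; /^\([^\n]*\)\n\1/s/.*\n//')

printf '[%s]\n' "$blank_kept"
printf '[%s]\n' "$blank_dropped"
```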

Your sed's results differ from mine (though the sed code itself is POSIX-compliant). Note: there should be two blank lines at the end of the output, but for some reason the forum's markup is eating them.

$ cat data

XXXX
ab
abcd


$ sed 'N; /^\([^\n]\{1,\}\)\n\1/s/.*\n//' data

XXXX
abcd

As I stated in my previous post, the lines are compared in non-overlapping pairs.

$ cat data
ab
abc
abcd
abcde
$ sed 'N; /^\([^\n]\{1,\}\)\n\1/s/.*\n//' data
abc
abcde

---------- Post updated at 11:29 PM ---------- Previous update was at 10:27 PM ----------

Here's a version that compares overlapping pairs of lines (1-2, 2-3, 3-4, etc) and considers a blank line to always be a leading substring of the following line (i.e. blank lines are discarded).

sed -n '1{h;d;}; H; x; /^\([^\n]*\)\n\1/!s/\n.*//p; ${g;/./p;}'
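
A sketch of how it behaves on the sample inputs from earlier in the thread, assuming GNU sed; the wrapper function name `drop_prefix_lines` is made up for this demo:

```shell
# Hypothetical wrapper around the one-liner above, reading from stdin.
drop_prefix_lines() {
  sed -n '1{h;d;}; H; x; /^\([^\n]*\)\n\1/!s/\n.*//p; ${g;/./p;}'
}

# Each line is a leading substring of the next, so only the last survives.
chain_out=$(printf 'ab\nabc\nabcd\nabcde\n' | drop_prefix_lines)

# Mixed input: "ab" and "ef" are dropped, everything else is kept.
mixed_out=$(printf 'ab\nabcd\n12\n345\nef\nefgh\n' | drop_prefix_lines)

printf '%s\n' "$chain_out"
printf '%s\n' "$mixed_out"
```

The hold space always carries the previous line: `H; x` builds "previous, newline, current" in the pattern space, and the previous line is printed only when it is not a leading substring of the current one. One edge case to be aware of: a one-line input produces no output, because `d` on line 1 ends the cycle before the `$` block runs.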

Regards,
Alister