Search from a line to end of list using sed

gjackson123 · July 31, 2013, 9:59am

Dear Unix Experts :),

Below is a small section of a large file. Each list in it has the following layout:

Starts with a heading containing the string " interest"
Followed by a list of activities
Ends with a blank line, after which a different list begins

E.g.

Sporting interest
football
tennis
swimming
golf
 
<next section>
.......
........

Below are the sed commands I attempted without success:

sed -n '/Sporting interest/,/^$/p'
sed -n '/Sporting interest/,/^$/!p'

I need a way to find the last line of the sporting interest list, so that I can then loop over the list of sports.

I am running Solaris 10 on an Intel x86 platform.

Your advice would be much appreciated.

Thanks,

George

vgersh99 · July 31, 2013, 10:10am

I've edited the original post fixing the code tags - hope that's what your original input file looks like.
Please edit if otherwise.

What would the desired output look like?
Or what would need to be done with each 'block' (if that's the objective)?

RudiC · July 31, 2013, 3:16pm

Of course your attempts will not work as there is a space in that empty (better: not-so-empty) line making your regex fail. Either remove that space or include it in the regex!
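
To make the difference concrete, here is a small sketch that feeds the sample from the first post to sed with both patterns. The helper function sample and the variables strict and loose are names made up for this illustration, and the sample assumes the separator line holds exactly one space.

```shell
# Hypothetical helper: recreates the sample; the separator line after
# "golf" holds a single space (an assumption based on the posted data).
sample() {
  printf '%s\n' 'Sporting interest' 'football' 'tennis' 'swimming' 'golf' ' ' '<next section>'
}

# /^$/ only matches a truly empty line, so the range never closes here
# and sed prints through to the end of the input.
strict=$(sample | sed -n '/Sporting interest/,/^$/p')

# /^ *$/ also matches lines made up of spaces only, so the range closes
# on the separator line.
loose=$(sample | sed -n '/Sporting interest/,/^ *$/p')

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"
```

With the first pattern the "<next section>" line is still printed; with the second the output stops at the separator line.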

gjackson123 · August 2, 2013, 10:24am

:) Thanks to both vgersh99 & RudiC for your advice thus far. I want it to print the list of sporting interests. Below is my desired output:

 
football
tennis
swimming
golf

I have tried sed -n '/interest/,/^$/p' but only the heading line, Sporting interest, was printed, not the list of sporting interests.

In addition, I am looking for a way to print the content between certain pattern strings, or up to the first blank line (which marks the end of a list), in either sed, awk or both.

Thanks again,

George

RudiC · August 5, 2013, 4:03am

Your sed command works for me, and it will for you, provided you adapt either the file or the regex as stated in my post #3.

MadeInGermany · August 5, 2013, 6:56am

awk '/^ *$/ {p=0} p; /Sporting interest/ {p=1}'

sed -n -e '/Sporting interest/!b' -e ':a' -e 'n; /^ *$/b' -e 'p;ba'
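
Both one-liners print the lines after the heading and stop at the first blank (or spaces-only) line. Below is a sketch of how the sed version steps through the input, run against the sample from the first post. The names sample, from_sed and from_awk are introduced only for this illustration.

```shell
# Hypothetical helper recreating the posted sample (separator line = one space).
sample() {
  printf '%s\n' 'Sporting interest' 'football' 'tennis' 'swimming' 'golf' ' ' '<next section>'
}

# How the sed script proceeds:
#   /Sporting interest/!b   any line that is not the heading: jump to the end
#                           of the script (nothing is printed because of -n)
#   :a                      loop label, reached only after the heading matched
#   n                       fetch the next input line
#   /^ *$/b                 blank or spaces-only line: leave the loop
#   p;ba                    otherwise print the line and loop back to :a
from_sed=$(sample | sed -n -e '/Sporting interest/!b' -e ':a' -e 'n; /^ *$/b' -e 'p;ba')

# The awk version: p is switched on after the heading, off at a blank line.
from_awk=$(sample | awk '/^ *$/ {p=0} p; /Sporting interest/ {p=1}')

printf '%s\n' "$from_sed"
```

Both variables end up holding the four sports, without the heading and without the separator line.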

gjackson123 · August 5, 2013, 9:51am

Thanks MadeInGermany for offering your advice,

Your first suggestion below, using awk, worked provided there are no blank lines between the heading (Sporting interest) and the list itself:

awk '/^ *$/ {p=0} p; /Sporting interest/ {p=1}' interest.txt

However, this awk statement failed when the content of interest.txt includes a blank line, as below:

Sporting interest
 
football
soccer
golf

Can we improve this awk to accommodate blank lines between individual items while still stopping at the end of the list?

It would be great if you could explain how the awk statement works as well.

Thanks again,

George

MadeInGermany · August 5, 2013, 10:53am

This one allows 1 blank line after the header line:

awk '!(c && c--) && /^ *$/ {p=0} p; /Sporting interest/ {p=1; c=1}'

It is a sequence of implicit if clauses, each followed by an { action }.
The p; is an implicit if clause without an { action }, so the default action { print } applies.
With explicit if clauses and actions it would read:

awk '{if ((c && c--)==0 && /^ *$/) {p=0}} {if (p!=0) {print}} {if (/Sporting interest/) {p=1; c=1}}'

The variable p indicates whether lines should be printed.
The variable c holds the number of lines to skip when searching for the end of the section. The (c && c--) expression decrements it down to 0 but never below 0.
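
The same program can be laid out one rule per line with comments and run on the sample that has a blank line under the heading. This is only an annotated sketch of the one-liner above; sample and result are names introduced for the illustration.

```shell
# Hypothetical helper: the sample with a spaces-only line under the heading
# and an empty line closing the list.
sample() {
  printf '%s\n' 'Sporting interest' ' ' 'football' 'soccer' 'golf' '' '<next section>'
}

result=$(sample | awk '
  # c counts lines still exempt from the end-of-section test.
  # (c && c--) is true while c > 0 and decrements it, never below 0.
  !(c && c--) && /^ *$/ { p = 0 }        # blank line outside the exempt window: stop printing
  p                                      # no action given, so the line is printed when p is non-zero
  /Sporting interest/   { p = 1; c = 1 } # heading: start printing, exempt the next 1 line
')

printf '%s\n' "$result"
```

Note that the exempted blank line itself is printed, because p is already 1 when it is read; the output is that spaces-only line followed by football, soccer and golf.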

RudiC · August 5, 2013, 1:55pm

Unless you specify some condition that tells one section from the next, there's no reliable way to "stop at the end of this list" as you require. Blank lines separating lists that themselves contain blank lines can work only if you can tell the exact number of blank lines per list.
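
A small illustration of this point, using the one-blank-line awk variant from the previous post on made-up input where a blank line sits between two items rather than directly under the heading (the variable name result is introduced here):

```shell
# Made-up input: the blank line falls between "football" and "soccer".
result=$(printf '%s\n' 'Sporting interest' 'football' '' 'soccer' 'golf' '' '<next section>' |
  awk '!(c && c--) && /^ *$/ {p=0} p; /Sporting interest/ {p=1; c=1}')

# The one-line allowance is used up by "football", so the blank line after it
# is taken as the end of the section and the remaining items are lost.
printf '%s\n' "$result"
```

Only football is printed; awk has no way to know that the first blank line was not the end of the list.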

gjackson123 · August 6, 2013, 9:09am

Thank you so much again to both MadeInGermany & RudiC for getting almost everything correct. One small thing remains: I need to print from the 4th line of the output rather than the 1st, because the actual data differs slightly from the simplified sporting interest example:

( i ) Line 1 is blank, which I don't need
( ii ) Line 2 is the heading, followed by a line of hyphens that underline the headers in line 2.

Otherwise, it is working remarkably well even though I still need to digest it properly for the moment.

Since I still don't fully understand how these awk programs work, my current workaround is as follows:

 
awk '!(c && c--) && /^ *$/ {p=0} p; /Sporting interest/ {p=1; c=1}' | sed -n '4,$p'
 
awk '{if ((c && c--)==0 && /^ *$/) {p=0}} {if (p!=0) {print}} {if (/Sporting interest/) {p=1; c=1}}' | sed -n '4,$p'

I have tried tweaking both the p & c variables without much luck.
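
For reference, the pipe into sed can be folded into the awk program itself. The following is a minimal sketch, assuming the three unwanted lines are always the first three lines awk would otherwise print. The input layout is guessed from the description above and is not real data, and sample, piped, single and the counter n are names introduced here.

```shell
# Hypothetical input: the layout (blank line, column heading, hyphen
# underline, then the items) is assumed from the description, not real data.
sample() {
  printf '%s\n' 'Sporting interest' '' 'Name' '----' 'football' 'tennis' '' '<next section>'
}

# Current workaround: awk selects the block, sed drops its first 3 lines.
piped=$(sample | awk '!(c && c--) && /^ *$/ {p=0} p; /Sporting interest/ {p=1; c=1}' | sed -n '4,$p')

# Same result in awk alone: n (a counter introduced here) counts the lines
# that would have been printed; only the 4th and later ones are printed.
single=$(sample | awk '!(c && c--) && /^ *$/ {p=0} p && (++n > 3); /Sporting interest/ {p=1; c=1}')

printf '%s\n' "$single"
```

Like the sed pipe, this drops the first three lines of the whole output. Setting n = 0 in the heading rule would instead drop three lines for every matching section.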

Thanks a million for the persistent support,

George

RudiC · August 6, 2013, 10:11am

Why don't you post a real life example with several sections each differing in composition? And, go to the extremes when it comes to the conditions to be met.

gjackson123 · August 6, 2013, 6:49pm

:) Hi RudiC,

The suggestion provided by MadeInGermany is already working. I simply need to print from line 4 to the end instead. Don't worry about it if this requires much more work.

I hesitate to disclose people's confidential data; sanitizing it would require me to make a lot of changes first.

Thanks,

George

konsolebox · August 7, 2013, 3:22am

You can't really do this unless you have a proper end of section marker. If your end of section is a blank line but one of your values could sometimes be a blank as well then there's no way a script could figure out which one to parse.

gjackson123 · August 8, 2013, 9:59am

:) Hi konsolebox,

Luckily, I have tested the suggested awk solution on a few hundred sets of data and have found none of the discrepancies you highlighted. As a result, I am very pleased with, and grateful for, everyone's valuable advice and efforts in providing a working solution.

Thanks,

George