Select everything between first and last occurrence of same pattern

usha_rao · June 21, 2011, 8:46am

Greetings,

I am writing a script that needs, as one step, to select all the lines between the first and last occurrence of a pattern.

I have a nawk solution that works, but I need a generic script that runs on every OS we use, namely Linux, Solaris, and AIX.

The awk script that I am using is given below.

Awk script

 
nawk 'NR==FNR{if(/xyz/){last=NR;if(!first)first=NR};next} FNR>=first && FNR<=last' test.txt test.txt

 
Here nawk makes two passes over the file:
the first to discover where the first and last occurrences are, the second to actually print the lines in between.

Can anyone help me with a sed substitute that selects everything between the first and last occurrence of a pattern in just one pass?

Thanks in advance.
Usha
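The two-pass idea described above can be sketched as follows. This is only an illustration: the sample data and the temporary file are made up, and plain awk is assumed to be a POSIX-style awk (on Solaris that would be nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical sample file; name and contents are for illustration only.
tmp=$(mktemp)
printf '%s\n' 1 xyz 2 3 xyz 4 xyz 5 > "$tmp"

# The same file is given twice, so awk reads it twice.
# Pass 1 (NR == FNR): record the line numbers of the first and last match.
# Pass 2: print every line whose number lies between them, inclusive.
two_pass_out=$(awk '
    NR == FNR {
        if (/xyz/) { last = NR; if (!first) first = NR }
        next
    }
    FNR >= first && FNR <= last
' "$tmp" "$tmp")

printf '%s\n' "$two_pass_out"
rm -f "$tmp"
```

Because the file is read twice, this form cannot be used on a pipe, which is one reason to look for a single-pass alternative.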

getmmg · June 21, 2011, 9:02am

 
Is this what you are looking for?
Input:
this is test
one
two
three
four
two
one
end of file

"one" is the search string here.

Code:

perl -0ne 'print $1 if /one(.*)one/sg' test

Output:

 
two
three
four
two

usha_rao · June 21, 2011, 9:07am

Thanks for the reply getmmg.

But I need the lines containing the pattern to be printed as well.

Desired output for your example:

 
one
two
three
four
two
one

Also, I do not have perl on my system. Can you suggest some sed code?

Thanks,
Usha Rao

panyam · June 21, 2011, 9:09am

Hello Usha,

How about this?

awk -v c=0 '/xyz/&&c++<1 {a=$0;next} /xyz/&&c>0 { a=a"\n"$0;f=a;next} c>0 {a=a"\n"$0;next} END { print f}' input_file

yazu · June 21, 2011, 9:11am

sed -n '/one/,/one/p'

usha_rao · June 21, 2011, 9:16am

Hi Panyam,

Thanks for the reply.
The given awk solution works perfectly when I use nawk on my Solaris system.

But I am writing a generic script that should be common to Solaris, Linux, and AIX systems.

Is there any sed substitute?

Thanks,
Usha Rao

panyam · June 21, 2011, 9:18am

I believe awk is everywhere: on AIX, Linux, and other UNIX systems too!

Franklin52 · June 21, 2011, 9:19am

Another one with awk:

awk 'f && /one/{print;exit}/one/{f=1}f' file

usha_rao · June 21, 2011, 9:26am

Thanks yazu & Franklin for the simplified solutions.

Hi Panyam,

On Solaris, only nawk handles this kind of complex operation; plain awk gives the error below. Hence I was searching for other options.

Thanks for help.

 
awk: syntax error near line 1
awk: bailing out near line 1

Regards,
Usha

panyam · June 21, 2011, 9:28am

Hello,

You can cheat in sed, like below:

sed -n '/xyz/,$ p' input_file | sed -n '1,/xyz/ p'

If you are on Solaris, you can use /usr/xpg4/bin/awk, or define an alias so that awk actually runs nawk.

Regards
Ravi

ctsgnb · June 21, 2011, 9:34am

WATCH OUT: Franklin52's awk code exits at the second occurrence it meets, so it does not behave exactly as you specified (you expect lines to be printed up to the last occurrence of the pattern).

 # echo "one
> one
> one
> one
> bla
> " | nawk 'f && /one/{print;exit}/one/{f=1}f'
one
one
#

alister · June 21, 2011, 10:14am

The sed pipeline panyam posted will fail for the same reason ctsgnb mentioned; it will bail out after the second occurrence.

Regards,
Alister
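A small run makes the failure visible. The input below is made up for illustration. In the second sed, the range 1,/one/ starts on line 1 and looks for its closing match from line 2 onward, so it ends at the second occurrence rather than the last.

```shell
# Hypothetical input with three occurrences of "one".
cheat_out=$(printf '%s\n' a one b one c one d |
    sed -n '/one/,$ p' |      # drop everything before the first match
    sed -n '1,/one/ p')       # range closes at the next match after line 1

printf '%s\n' "$cheat_out"
# prints:
# one
# b
# one
```

The trailing "c" and the third "one" are lost, which is not what was asked for.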

panyam · June 21, 2011, 10:26am

Hello Alister,

Yes, you are correct. My awk one works fine but the sed one does not. I was searching for how to accomplish the same in sed.

Alister / Usha:

I think I've got it. How about this?

 
sed -n '/one/,$ p' input_file | sed '1!G;h;$!d' | sed -n '/one/,$ p' | sed '1!G;h;$!d'

If you have "tac" on your system:

sed -n '/one/,$ p' rem | tac | sed -n '/one/,$ p' | tac

Thanks
Ravi
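The pipeline above works by trimming the head, reversing the lines, trimming the (new) head again, and reversing back. Here is a sketch on made-up input; the helper name reverse_lines is hypothetical and simply wraps the sed '1!G;h;$!d' idiom used in the post.

```shell
# Hypothetical helper: reverse the order of lines using only sed.
# 1!G appends the hold space to every line but the first, h saves the
# result, and $!d suppresses output until the last line.
reverse_lines() { sed '1!G;h;$!d'; }

trimmed=$(printf '%s\n' a one b one c one d |
    sed -n '/one/,$ p' |   # cut lines before the first match
    reverse_lines |
    sed -n '/one/,$ p' |   # in reversed order, cut lines after the last match
    reverse_lines)

printf '%s\n' "$trimmed"
# prints:
# one
# b
# one
# c
# one
```

Note that the reversal holds the whole remaining input in sed's hold space, so this trades memory for portability.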

ctsgnb · June 21, 2011, 10:37am

tail -r infile | sed '/one/,$!d' | tail -r | sed '/one/,$!d'

alister · June 21, 2011, 10:53am

usha rao:

 
nawk 'NR==FNR{if(/xyz/){last=NR;if(!first)first=NR};next} FNR>=first && FNR<=last' test.txt test.txt
Here nawk is doing two iteration:
the first to discover where the first and last occurrence are, the second to actually print the lines in between.

Can anyone help me with the sed substitute for selecting everything between the first and last occurrence of a pattern in just one iteration.

Thanks in advance.
Usha

The following sed script should do the same job as that nawk but in one pass. It ignores any lines before the first occurrence of the pattern. Beginning with the first occurrence (inclusive), each line in the pattern space is appended to the hold space. Whenever a matching line is found, the accumulated chunk is output and the hold space is cleared. Finally, the next line in the file is read into the pattern space before returning to the top of the loop. Lines after the last match are accumulated but never printed, because no further match triggers output.

#n

/xyz/! d

:top
H
/xyz/ {
    s/.*//
    x
    s/\n//
    p
}
n
b top

If that is saved in a file named first-last.sed, you can invoke it thusly: sed -f first-last.sed test.txt

If for some reason you'd rather not store the script in a file, but prefer to have it inline:

sed -n '/xyz/!d; :top
H; /xyz/ {s/.*//; x; s/\n//; p;}; n; b top' test.txt

To ensure maximum portability, this sed script uses only POSIX-compliant syntax (e.g. labels are terminated by a newline).

Regards,
Alister
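For comparison, the same buffer-and-flush idea can be sketched in awk. This is not Alister's code, just an illustration of the technique on made-up input, assuming a POSIX-style awk: lines are buffered from the first match onward, and the buffer is printed and emptied each time a match is seen.

```shell
# Hypothetical input; "xyz" is the pattern, as in the sed script.
flushed=$(printf '%s\n' 1 xyz 2 3 xyz 4 |
    awk '
        /xyz/ { found = 1 }                     # start buffering at first match
        found { buf = buf $0 "\n" }             # accumulate the current line
        /xyz/ { printf "%s", buf; buf = "" }    # flush the chunk on every match
    ')

printf '%s\n' "$flushed"
# prints:
# xyz
# 2
# 3
# xyz
```

Only the lines since the previous match are held in memory, and whatever is still buffered at end of input (here the "4") is discarded.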

panyam · June 21, 2011, 11:06am

Hello Alister,

For some reason the code you provided is not working for me; it prints only the matching lines. I would also like to know the final sed one-liner that works.

 
sed -n '/xyz/!d; :top
> H; /xyz/ {s/.*//; x; s/\n//; p;}; n; b top'  rem
xyz
xyz
xyz

alister · June 21, 2011, 11:18am

What platform are you using? Also, please provide the entire input data file.

---------- Post updated at 11:18 AM ---------- Previous update was at 11:12 AM ----------

I tested on old versions of GNU/Linux, OpenBSD, and OS X. It worked fine on all of them.

Regards,
Alister

panyam · June 21, 2011, 11:44am

/home/ravi> uname -a
HP-UX avalon B.11.11 U 9000/800 3547052374 unlimited-user license
 
/home/ravi>cat rem
1
xyz
2
3
4
5
6
xyz
8
9
10
xyz
11
12
13

/home/ravi>sed -n '/xyz/!d; :top
> H; /xyz/ {s/.*//; x; s/\n//; p;}; n; b top' rem
xyz
xyz
xyz

Regards
Ravi

alister · June 21, 2011, 11:59am

Looks like it's not looping properly. Try the following:

sed -n '/xyz/!d; :top
H; /xyz/ {s/.*//; x; s/\n//; p;}; n;
b top
' rem

If that doesn't work, I'd be curious to know if the script file version works (using the code that begins with #n in a file and executing it with sed's -f option).

Regards,
Alister
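For anyone trying the script-file route, here is a sketch of creating and running it. The temporary directory and the sample data are assumptions for illustration; the script body is the one posted earlier, and it relies on sed treating a first line of #n like the -n option.

```shell
# Work in a throwaway directory (hypothetical; any writable location will do).
workdir=$(mktemp -d)

# Save the script; the quoted 'EOF' keeps the shell from touching its contents.
cat > "$workdir/first-last.sed" <<'EOF'
#n
/xyz/!d
:top
H
/xyz/ {
    s/.*//
    x
    s/\n//
    p
}
n
b top
EOF

printf '%s\n' 1 xyz 2 3 xyz 4 xyz 5 > "$workdir/sample.txt"
script_out=$(sed -f "$workdir/first-last.sed" "$workdir/sample.txt")

printf '%s\n' "$script_out"
rm -rf "$workdir"
```

On a sed that loops correctly, this should print the lines from the first xyz through the last xyz and nothing else.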

ctsgnb · June 21, 2011, 1:37pm

yazu's sed -n '/one/,/one/p' will not give the wanted output: it simply toggles printing on and off each time it finds an occurrence of the pattern, so it may skip some wanted lines.

# echo "one
> bla
> one
> bla
> bla
> one" | sed -n '/one/,/one/p'
one
bla
one
one
#

---------- Post updated at 07:37 PM ---------- Previous update was at 07:16 PM ----------

... yet another awk:

awk '{A[NR]=$0}/one/{m=x?m:NR;x=NR}END{while(m<=x)print A[m++]}' input
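A spelled-out reading of that one-liner, as a sketch on made-up input. The function name between_first_last and the guard in END are additions for illustration and are not part of the post; the guard just avoids printing anything when the pattern never matches.

```shell
# Hypothetical wrapper: store every line, remember the first and last
# matching line numbers, and print that span once the input is exhausted.
between_first_last() {
    awk '
        { line[NR] = $0 }                           # keep the whole input
        /one/ { if (!first) first = NR; last = NR } # track first and last match
        END { if (first) for (i = first; i <= last; i++) print line[i] }
    '
}

buffered=$(printf '%s\n' a one b one c one d | between_first_last)
printf '%s\n' "$buffered"
```

This is a single pass, but it keeps the entire input in memory, unlike the flush-per-match sed script earlier in the thread.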