Print all lines between two keywords if a specific pattern exists

Amit_Joshi · August 14, 2015, 1:55am

I have an input file as shown below. I need to check for a pattern and, if it is present in the file, print all the lines from the BEGIN keyword to the END keyword of the block that contains it. Could you please help me do this on AIX using sed or awk?

Input file:

ABC
******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****
***** BEGIN *****
My name is Rahul.
***** END *****
XYZ

If I am looking for Amit, the output should be:

******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****

If I am looking for Rahul, the output should be:

***** BEGIN *****
My name is Rahul.
***** END *****

If I am looking for XYZ, there should be no output.

With the command below I can get all the data between the two keywords, but I am not able to filter it by the pattern.

sed -n '/BEGIN/,/END/p'

anbu23 · August 14, 2015, 2:21am

$ awk ' /BEGIN/,/END/ { str = str ?  str "\n" $0 : $0 } /END/ { if( str ~ /Amit/ ) { print str }; str = "" } ' file
******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****

$ sed -n -e '/BEGIN/,/END/{H;/BEGIN/h;}'  -e '/END/{g;/Amit/p;} ' file
******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****

sed -n "/BEGIN/h;/BEGIN/!H; /END/ {x;/Amit/p;}" file
******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****

derekludwig · August 14, 2015, 6:01am

While not using one of the requested tools:

perl -00 -ne 'print $& if m{^\N+\s+BEGIN\s+\N+\n.*?Amit.*?\n\N+\s+END\s+\N+\n}ms;'

MadeInGermany · August 14, 2015, 6:59am

Another awk solution that does not need the /END/ expression twice:

awk '/BEGIN/ {block=1} block {str=str sep $0; sep=RS} /END/ {block=0; if (str~/Amit/) {print str} str=sep=""}' file
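
To search for a different name without editing the script, the search string can be passed in with -v. This is only a sketch along the lines of the command above: the function name print_block and the awk variable pat are illustrative names that do not appear in the thread, and pat is treated as a regular expression.

```shell
# print_block PATTERN FILE
# Print each BEGIN..END block of FILE that contains PATTERN (an awk regex).
# print_block and pat are illustrative names, not part of the thread.
print_block() {
    awk -v pat="$1" '
        /BEGIN/ { block = 1 }                      # a block starts here
        block   { str = str sep $0; sep = "\n" }   # collect lines while inside a block
        /END/   {
            if (block && str ~ pat) print str      # print only if the block matches
            block = 0; str = sep = ""              # reset for the next block
        }
    ' "$2"
}
```

With the sample input, print_block Amit file prints the first block, print_block Rahul file prints the second, and print_block XYZ file prints nothing because the XYZ line lies outside every block.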

bakunin · August 14, 2015, 7:42am

Your command (sed -n '/BEGIN/,/END/p') prints everything between a line containing "BEGIN" and one containing "END", regardless of what is in between.

Your problem, if I have understood correctly, is to print these lines only if some condition (a line containing a third word, like "Amit") is met. Here is how you solve this kind of problem with sed:

First, you have to store the text in question somewhere until you can decide whether to print it. For this there is the "hold space": a text buffer you can manipulate separately from the "pattern space", and which keeps its content across the processing of lines. See the man page of sed for the commands "g", "h", "x", "G" and "H". The principle is: when you encounter a line with "BEGIN" you start a new cycle and put that line in the hold space. From then on you append every line to the hold space, accumulating the text until you encounter a line with "END" in it. Once the "END" line has been handled, you clean out the hold space and start over.

Second, you have to decide whether the text should be printed. A line with "END" ends the cycle: move all the text accumulated in the hold space back to the pattern space, search it for your search string "Amit", and either print the whole text or discard it.

sed -n '/BEGIN/,/END/H        # from "BEGIN" to "END" execute "H" (append to hold space)
        /END/ {               # for every line containing "END" do:
                 g            # replace the pattern space with the content of the hold space
                              # (that makes the accumulated text available again)
                 /Amit/p      # if in this text is "Amit" somewhere, print it
                 s/.*//       # delete the text
                 x            # exchange pattern- and hold space
                              # (so hold space will be empty for the next cycle)
              }' /path/to/your/file

Or, the same in one line:

sed -n '/BEGIN/,/END/ H;/END/ {;g;/Amit/p;s/.*//;x;}' /path/to/your/file

I hope this helps.

bakunin

PS: Only now did I see that anbu23 has already posted a sed solution which works the same way. My script and his second solution are quite similar, but his way of cleaning the hold space is better than mine, so I suggest you use his.
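
The difference between the two ways of cleaning the hold space shows up in the output. H always puts a newline in front of the line it appends, so if the hold space is empty when the BEGIN line arrives, the collected block starts with an empty line. Overwriting the hold space with h on the BEGIN line avoids that. A small sketch for comparison, assuming a sed that accepts ;-separated commands; the function names are illustrative, and the search word is pasted into the sed script as-is, so it must not contain a slash:

```shell
# Both functions take: SEARCHWORD FILE
# block_append and block_overwrite are illustrative names, not from the thread.

# Always append with H, then empty the hold space after each END line.
# The first H of every block appends to an empty hold space,
# so the collected text begins with a newline.
block_append() {
    sed -n '/BEGIN/,/END/H;/END/{g;/'"$1"'/p;s/.*//;x;}' "$2"
}

# Overwrite the hold space with h on the BEGIN line and append every
# other line with H; at END swap the block into the pattern space.
block_overwrite() {
    sed -n '/BEGIN/h;/BEGIN/!H;/END/{x;/'"$1"'/p;}' "$2"
}
```

Run against the sample input, block_append Amit file prints an empty line before the BEGIN line, while block_overwrite Amit file prints exactly the four lines of the block.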

drl · August 14, 2015, 9:27am

Hi.

There are grep-like utilities that are designed to handle these kinds of situations. Here is one called cgrep:

#!/usr/bin/env bash

# @(#) s1	Demonstrate text block extraction with enclosed pattern, cgrep.
# For cgrep source see:
# http://sourceforge.net/projects/cgrep/
# Verified existence Fri Aug 14 08:08:53 CDT 2015

# Support data and functions.
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C cgrep

NAME=${1?" Need a name"}
shift
FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results for searching for \"$NAME\":"
cgrep -D -F -w "BEGIN" +w "END" "$NAME" $FILE

exit 0

producing:

./s1 Amit

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
cgrep ATT cgrep 8.15

-----
 Input data file data1:
ABC
******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****
***** BEGIN *****
My name is Rahul.
***** END *****
XYZ

-----
 Results for searching for "Amit":
******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****

and

./s1 Rahul
...
-----
 Results for searching for "Rahul":
***** BEGIN *****
My name is Rahul.
***** END *****

and

./s1
...
./s1: line 16: 1:  Need a name

The source for cgrep can be found at the site mentioned in the comments in the demonstration script above.

Best wishes ... cheers, drl

protocomm · August 14, 2015, 10:48am

anbu23 wrote:

$ awk ' /BEGIN/,/END/ { str = str ?  str "\n" $0 : $0 } /END/ { if( str ~ /Amit/ ) { print str }; str = "" } ' file
******** BEGIN *****
My name is Amit.
I am learning unix.
***** END *****

Could you explain the part

str = str ?  str "\n" $0

of your command line, please? It is confusing to me.
I understand that for each /BEGIN/,/END/ block you run { str = str ? str "\n" $0 : $0 }. I know the result, but it is not clear to me how it works.
Thanks.

Amit_Joshi · August 14, 2015, 11:48am

Thanks, bakunin, for the superb explanation. It really helped me to understand each option used in the command.

Thanks, anbu23 and all, for your input. It was really helpful.

MadeInGermany · August 14, 2015, 1:35pm

protocomm wrote:

Could you explain the part
str = str ?  str "\n" $0
of your command line, please? It is confusing to me.

The conditional expression condition ? value_if_true : value_if_false exists in awk just as in C.
str is assigned the concatenation str "\n" $0 if str is true, that is, not undefined, empty, or 0; otherwise it is assigned $0.
$0 is the entire line.
BTW, it will misbehave if the line is 0; { str = (str != "") ? (str "\n" $0) : $0 } is better but still has a problem with empty lines.
That's why I prefer the { str = str sep $0; sep="\n" } method.
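
To see the difference outside the BEGIN/END context, here is a minimal sketch that only joins the lines of its input with newlines. The function names join_ternary and join_sep are illustrative, not from the thread.

```shell
# Both functions read lines from stdin and print them joined with newlines.
# join_ternary and join_sep are illustrative names.

# Uses the truth value of str to decide whether a separator is needed.
# A str that is empty, or that holds a numeric-looking 0 taken from the
# input, tests false, so the text collected so far is replaced by $0.
join_ternary() {
    awk '{ str = str ? str "\n" $0 : $0 } END { print str }'
}

# Uses a separate separator variable that is empty only before the
# first line, so the content of the lines does not matter.
join_sep() {
    awk '{ str = str sep $0; sep = "\n" } END { print str }'
}
```

For example, printf '0\nfoo\n' | join_sep prints both lines, and printf '\nfoo\n' | join_sep keeps the leading empty line. Feeding the same inputs to join_ternary is a quick way to check how a particular awk handles the cases described above.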