Grep lines between two specific words after matching pattern

sagar_1986 · February 19, 2020, 10:57am

grep specific number of lines from file after matching pattern

I want to grep all the lines from the keyword 'start' to the keyword 'end' around a line that matches a pattern, for example the number 12345.

Is it possible?
Thanks in advance

jim_mcnamara · February 19, 2020, 11:22am

Assuming Linux:

a=3
b=3
grep -A$a -B$b  '12345'  somefile

This prints the 3 lines before and the 3 lines after each line that contains '12345': 7 lines in total, including the matching line.
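
As a quick check of the context options, here is a minimal sketch. The temporary file and its 20 numbered lines are made up for illustration, and `seq` is assumed to be available (it is on a typical Linux system). Note that this only helps when the distance to the surrounding lines is fixed.

```shell
# Hypothetical sample: 20 numbered lines, with the keyword on line 10.
tmpfile=$(mktemp)
for i in $(seq 1 20); do
  if [ "$i" -eq 10 ]; then echo "line $i 12345"; else echo "line $i"; fi
done > "$tmpfile"

a=3   # lines of context after the match
b=3   # lines of context before the match
context=$(grep -A"$a" -B"$b" '12345' "$tmpfile")
echo "$context"   # lines 7 through 13 of the sample
rm -f "$tmpfile"
```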

RavinderSingh13 · February 19, 2020, 12:10pm

Hello sagar_1986,

Could you please share the efforts you have made to solve this problem yourself?
We encourage users to learn coding on this forum, so please do share them.

Thanks,
R. Singh

sagar_1986 · February 20, 2020, 1:36am

Hello RavinderSingh13 ,

Actually, I know how to get a fixed number 'n' of lines before and after the matching pattern, but the issue here is that the positions of 'start' and 'end' are not fixed.
I want to grep from the first occurrence of 'start' before the matching pattern to the first occurrence of 'end' after it, and I don't have any idea how to do this. Could you please help?

RavinderSingh13 · February 20, 2020, 1:54am

Hello sagar_1986,

Again, you are missing the point: the request is to add your efforts in the form of code. Kindly do so and let us know.

Thanks,
R. Singh

sagar_1986 · February 20, 2020, 3:13am

awk '/start/{flag=1} flag; /end/{flag=0}' sample.txt

awk '/start/,/123456/,/end/'  sample.txt 
sed '/start.*123456.*end/!d' sample.txt
sed -n '/start.*123456.*end/p' sample.txt

sed -e '/./{H;$!d;}' -e 'x;/123456/!d' sample.txt

sed -e '/./{H;$!d;}' -e 'x;/start/!d;/123456/!d;/end/!d' sample.txt

So how can I get the lines that contain all three matching patterns?

vbe · February 20, 2020, 3:25am

You can see one way of doing it in post #2.

rbatte1 · February 20, 2020, 3:37am

If it's not a big file, a simple to understand but clunky way is to use the output of grep -n "start" $filename and grep -n "end" $filename to get the line numbers to search between, and then perhaps sed -n "$start_line,$end_line"p $filename to print that range.

This would be slow with a very large file though because you would read it all three times.
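
The line-number idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample file is hypothetical, it contains a single start/end pair, and the sketch does not yet look at the 12345 pattern at all.

```shell
# Hypothetical sample file with a single start/end pair.
filename=$(mktemp)
printf '%s\n' 'header' 'start' 'a 12345' 'b' 'end' 'trailer' > "$filename"

# grep -n prefixes each hit with "LINE:"; head -1 keeps the first hit
# and cut keeps only the line number.
start_line=$(grep -n 'start' "$filename" | head -1 | cut -d: -f1)
end_line=$(grep -n 'end' "$filename" | head -1 | cut -d: -f1)

# Double quotes let the shell expand the variables before sed sees them.
block=$(sed -n "${start_line},${end_line}p" "$filename")
echo "$block"
rm -f "$filename"
```

Writing the range as "${start_line},${end_line}p" keeps the whole sed script in one quoted word, which avoids most quoting surprises.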

Does this help, or is your file big enough to warrant a solution that just reads it once?

Kind regards,
Robin

RudiC · February 20, 2020, 5:38am

Do you want the End pattern excluded? Try

awk '/Start/,/End/ {if (/12345/) P = 1; if (/End/) P = 0} P' file
12345
.
.

EDIT: included?

awk '/Start/,/End/ {if (/12345/) P = 1; if (P) print; if (/End/) P = 0}' file
12345
.
.
End

If there are more pattern pairs in the file (which is neither specified nor shown in the sample input), we need to rethink.

sagar_1986 · February 20, 2020, 6:31am

Dear rbatte1,

Dear RudiC,

As per your suggestion, I have tried the solution on a sample file, and the output is like this:

grep -n "start" $filename

grep -n "stop" $filename

There will be multiple matching patterns, hence I need to grep with 3 matching patterns.

rdrtx1 · February 20, 2020, 8:35am

grep -zoP "(?s)Start.*12345.*End" file

Scrutinizer · February 20, 2020, 10:23am

awk '/Start/{p=$0; next} p{p=p ORS $0} /End/{if(p~/12345/)print p; p=x}' file

Or, using vertical real estate:

awk '
  /Start/ {
    buffer=$0
    next
  } 
  buffer {
    buffer=buffer ORS $0
  } 
  /End/ {
    if(buffer~/12345/) print buffer
    buffer=""
  }
' file

MadeInGermany · February 21, 2020, 5:07am

The previous one did not work for me,
perhaps because testing the buffer itself makes assumptions about its contents that can go wrong.
It is better to have a separate state variable (here: buffer_on).

awk '
  buffer_on {
    buffer=buffer ORS $0
  }
  ! buffer_on && /Start/ {
    buffer_on=1
    buffer=$0
  }
  buffer_on && /End/ {
    if (buffer~/12345/) print buffer
    buffer_on=0
  }
' file

--- Post updated at 12:07 ---

Introducing a separator variable (ors):
one can include or exclude the Start and/or End line simply by changing the order of the 3 code blocks.

# Start line excluded, End line included
awk '
  buffer_on {
    buffer=buffer ors $0
    ors=ORS
  }
  ! buffer_on && /Start/ {
    buffer_on=1
    buffer=ors=""
  }
  buffer_on && /End/ {
    if (buffer~/12345/) print buffer
    buffer_on=0
  }
' file

# Start line included, End line included
awk '
  ! buffer_on && /Start/ {
    buffer_on=1
    buffer=ors=""
  }
  buffer_on {
    buffer=buffer ors $0
    ors=ORS
  }
  buffer_on && /End/ {
    if (buffer~/12345/) print buffer
    buffer_on=0
  }
' file

# Start line included, End line excluded
awk '
  ! buffer_on && /Start/ {
    buffer_on=1
    buffer=ors=""
  }
   buffer_on && /End/ {
    if (buffer~/12345/) print buffer
    buffer_on=0
  }
  buffer_on {
    buffer=buffer ors $0
    ors=ORS
  }
' file

# Start line excluded, End line excluded
awk '
   buffer_on && /End/ {
    if (buffer~/12345/) print buffer
    buffer_on=0
  }
  buffer_on {
    buffer=buffer ors $0
    ors=ORS
  }
  ! buffer_on && /Start/ {
    buffer_on=1
    buffer=ors=""
  }
' file
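
To try the block orderings, a small test file helps. The sketch below uses a made-up input with two Start/End blocks, of which only the second contains 12345, and runs the ordering Start, append, End, which keeps both marker lines. The sample contents are assumptions for illustration only.

```shell
# Hypothetical input: two Start/End blocks, only the second contains 12345.
sample=$(mktemp)
printf '%s\n' 'Start A' 'foo' 'End A' 'noise' \
              'Start B' 'id 12345' 'End B' > "$sample"

# Order: Start block, append block, End block, so both markers are kept.
result=$(awk '
  ! buffer_on && /Start/ {
    buffer_on=1
    buffer=ors=""
  }
  buffer_on {
    buffer=buffer ors $0
    ors=ORS
  }
  buffer_on && /End/ {
    if (buffer~/12345/) print buffer
    buffer_on=0
  }
' "$sample")
echo "$result"
rm -f "$sample"
```

Only the second block is printed, because the first buffer never matches 12345 and is discarded when its End line is reached.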

sagar_1986 · February 22, 2020, 2:44am

Dear Scrutinizer and MadeInGermany ,

Both solutions are working fine for me, thank you so much.

Thanks rbatte1,

I have tried your approach as well. It works, but the issue is that sed -n "$start_line,$end_line"p $filename is not working.
Using variables inside sed does not work: sed -n "3,15"p $filename works fine, but sed -n "$start_line,$end_line"p does not. Is there an alternative solution?
As per your suggestion, I have tried this:


# In sample.gmf, the start marker is "DOCSTART_2 |" (without the quotes)
# and the end marker is "DOCEND |" (without the quotes).
# The third matching pattern is "343".
# After grep -n, the output looks like "1:DOCSTART_2 |" or "520:DOCEND |".


grep -n "DOCSTART_2" /home/testing/sagar/sample.GMF | awk -F ":" '{print $1}'  >  cat /home/testing/sagar/DOCSTART_2    ## start line numbers for entire file 

grep -n "DOCEND" /home/testing/sagar/sample.GMF | awk -F ":" '{print $1}'  >  cat /home/testing/sagar/DOCEND   ## end line numbers for entire file

input=`grep -n "12345" /home/testing/sagar/sample.GMF | awk -F ":" '{print $1}'`     # matching pattern (343)
  
> /home/clarity/sagar/less_DOCSTART_2
> /home/clarity/sagar/great_DOCEND

for file in `cat /home/testing/sagar/DOCSTART_2`
do
a=`echo $file`
if [ $a -lt $input ]
then
echo $a >> /home/clarity/sagar/less_DOCSTART_2
else
echo hi >> /dev/null
fi
done
 DOCSTART=`sort -n  /home/clarity/sagar/less_DOCSTART_2 | tail -1`  ## greatest start number


 for file1 in `cat /home/clarity/sagar/DOCEND`
do
b=`echo $file1 | awk -F ":" '{print $1}'`
if [ $b -gt $input ]
then
echo $a >> /home/clarity/sagar/great_DOCEND
else
echo hi >> /dev/null
fi
done
DOCEND=`sort -n  /home/clarity/sagar/great_DOCEND | head -1`   ## lowest end number
cat /home/testing/sagar/sample.GMF | sed -n "$DOCSTART,$DOCEND"'p  > /home/testing/sagar/sample.GMF_new   ##### not working

Any suggestions, or any changes in approach?
Thanks in advance.
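
A few things in the script above look like probable causes, though this is a reading of the posted code rather than a tested diagnosis: the final sed line has an unbalanced single quote ("$DOCSTART,$DOCEND"'p), the redirections written as > cat /path send the output to a file named cat instead of /path, the second loop echoes $a where $b seems intended, and the files are written under /home/testing but read from /home/clarity. A compact sketch of the same idea without temporary list files is shown below; the sample data is hypothetical and stands in for sample.GMF.

```shell
# Hypothetical stand-in for sample.GMF: two documents, the second holds the pattern.
gmf=$(mktemp)
printf '%s\n' 'DOCSTART_2 |' 'x' 'DOCEND |' \
              'DOCSTART_2 |' 'id 12345' 'y' 'DOCEND |' > "$gmf"

# Line number of the first line containing the pattern.
input=$(grep -n '12345' "$gmf" | head -1 | cut -d: -f1)

# Greatest DOCSTART_2 line number below the match.
DOCSTART=$(grep -n 'DOCSTART_2' "$gmf" | cut -d: -f1 |
           awk -v n="$input" '$1 < n' | tail -1)
# Lowest DOCEND line number above the match.
DOCEND=$(grep -n 'DOCEND' "$gmf" | cut -d: -f1 |
         awk -v n="$input" '$1 > n' | head -1)

# The whole sed script sits inside one pair of double quotes.
extracted=$(sed -n "${DOCSTART},${DOCEND}p" "$gmf")
echo "$extracted"
rm -f "$gmf"
```

If either variable ends up empty, sed receives a malformed range such as ",7p" and fails, so printing $DOCSTART and $DOCEND before the sed call is a cheap sanity check.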