Grep lines before a pattern having some other pattern

dips_ag · February 5, 2015, 7:40am

Hi All,

I am trying to fetch lines before a pattern. I learned about grep's -B flag, but it needs a number: how many lines to print before the line matching some pattern (say X). What if I instead want the line(s) matching some other pattern (say Y) that appear before the X line? How do I go about it? Please help.

Input: test.xml

<EXTENSION NAME ="File Writer" SESSNAME ="NATIONAL_ARCHIVES" SUBTYPE ="File Writer" STAGETYPE ="Target Definition" TYPE ="WRITER">
<CONNREF CNXREFNAME ="CONN" CONNNAME ="" CONNNUMBER ="1" CONNSUBTYPE ="" CONNTYPE ="" VARIABLE =""/>
<ELEMENT NAME ="Merge Type" VALUE ="No Merge"/>
<ELEMENT NAME ="Merge File Directory" VALUE ="$DestFileDir"/>
<ELEMENT NAME ="Merge File Name" VALUE ="national_archives.out"/>
<ELEMENT NAME ="Append if Exists" VALUE ="NO"/>
<ELEMENT NAME ="Create Target Directory" VALUE ="NO"/>
<ELEMENT NAME ="Header Options" VALUE ="No Header"/>
<ELEMENT NAME ="Header Command" VALUE =""/>
<ELEMENT NAME ="Footer Command" VALUE =""/>
<ELEMENT NAME ="Output Type" VALUE ="File"/>
<ELEMENT NAME ="Merge Command" VALUE =""/>
</SESSNAME>
<SESSNAME NAME ="Relational Lookup" SINSTANCENAME ="LKP_CUSTOMER_EQUIPMENT_DTL_FACT" SUBTYPE ="Relational Lookup" STAGETYPE ="Lookup Procedure" TYPE ="LOOKUPEXTENSION">
<CONNREF CNXREFNAME ="DB CONN" CONNNAME ="XXX@YYY" CONNNUMBER ="1" CONNSUBTYPE ="Oracle" CONNTYPE ="Relational" VARIABLE =""/>
</SESSNAME>
<SESSNAME NAME ="Relational Reader" SINSTANCENAME ="SQ_EMPLOYEE" SUBTYPE ="Relational Reader" STAGETYPE ="Source Qualifier" TYPE ="READER">
<CONNREF CNXREFNAME ="DB CONN" CONNNAME ="YYY@XXX" CONNNUMBER ="1" CONNSUBTYPE ="Oracle" CONNTYPE ="Relational" VARIABLE =""/>
</SESSNAME>
<SESSNAME DDQINSTNAME ="SQ_EMPLOYEE" DSQINSTTYPE ="Source Qualifier" NAME ="Relational Reader" SINSTANCENAME ="EMPLOYEE" SUBTYPE ="Relational Reader" STAGETYPE ="Source Definition" TYPE ="READER"/>
<ELEMENT NAME ="General Options" VALUE =""/>
<ELEMENT NAME ="Write Backward Compatible Session Log File" VALUE ="NO"/>
<ELEMENT NAME ="Session Log File Name" VALUE ="employee.log"/>
<ELEMENT NAME ="Session Log File directory" VALUE ="$SessLogDir"/>
</SESSION>
<SESSION DESCRIPTION ="This session will load data into EMPLOYEE table." ISVALID ="YES" MAPNAME ="employee_job" NAME ="sess_employee_job" REUSABLE ="YES" SORTORDER ="Binary" VERSIONNUMBER ="8">
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="NO" PIPELINE ="1" SINSTANCENAME ="LKP_DEPARTMENT" STAGE ="2" TRANSFORMATIONNAME ="LKP_DEPARTMENT" STAGETYPE ="Lookup Procedure">
<PARTITION DESCRIPTION ="" NAME ="Partition #1"/>
<ELEMENT NAME ="CONN Information" VALUE ="DATA@DB"/>
</SESSTRANSFORMATIONINST>
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="NO" PIPELINE ="1" SINSTANCENAME ="FLT_DEPT_NAMES" STAGE ="2" TRANSFORMATIONNAME ="FLT_DEPT_NAMES" STAGETYPE ="Filter">
<PARTITION DESCRIPTION ="" NAME ="Partition #1"/>
</SESSTRANSFORMATIONINST>
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="NO" PIPELINE ="0" SINSTANCENAME ="DEPARTMENT_DIM" STAGE ="0" TRANSFORMATIONNAME ="DEPARTMENT_DIM" STAGETYPE ="Source Definition"/>
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="YES" PARTITIONTYPE ="PASS THROUGH" PIPELINE ="1" SINSTANCENAME ="SQ_STATE_DIM" STAGE ="2" TRANSFORMATIONNAME ="SQ_STATE_DIM" STAGETYPE ="Source Qualifier"/>
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="YES" PARTITIONTYPE ="PASS THROUGH" PIPELINE ="1" SINSTANCENAME ="TEST_file" STAGE ="3" TRANSFORMATIONNAME ="TEST_file" STAGETYPE ="Target Definition">
<FLATFILE CODEPAGE ="MS1252" CONSECDELIMITERSASONE ="NO" DELIMITED ="YES" DELIMITERS ="," ESCAPE_CHARACTER ="" KEEPESCAPECHAR ="NO" LINESEQUENTIAL ="NO" MULTIDELIMITERSASAND ="NO" NULLCHARTYPE ="ASCII" NULL_CHARACTER ="*" PADBYTES ="1" QUOTE_CHARACTER ="NONE" REPEATABLE ="NO" ROWDELIMITER ="0" SKIPROWS ="0" STRIPTRAILINGBLANKS ="NO"/>
<ELEMENT NAME ="Thread Record" VALUE ="RMS"/>
</SESSTRANSFORMATIONINST>
<SESSION DESCRIPTION ="" ISVALID ="YES" MAPNAME ="test_job" NAME ="sess_test_job" REUSABLE ="NO" SORTORDER ="Binary" VERSIONNUMBER ="2">
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="YES" PARTITIONTYPE ="PASS THROUGH" PIPELINE ="2" SINSTANCENAME ="TEST1" STAGE ="4" TRANSFORMATIONNAME ="TEST1" STAGETYPE ="Target Definition">
<ELEMENT NAME ="Target Table Name" VALUE ="TEST2"/>
</SESSTRANSFORMATIONINST>
<SESSTRANSFORMATIONINST ISREPARTITIONPOINT ="NO" PIPELINE ="0" SINSTANCENAME ="TEST" STAGE ="0" TRANSFORMATIONNAME ="AGG_TEST" STAGETYPE ="Source Definition">
<ELEMENT NAME ="Source Table Name" VALUE ="AGG_TEST2"/>
</SESSTRANSFORMATIONINST>

grep -B10 "Thread" test.xml | grep "SESSION DESCRIPTION"

results in

<SESSION DESCRIPTION ="This session will load data into EMPLOYEE table." ISVALID ="YES" MAPNAME ="employee_job" NAME ="sess_employee_job" REUSABLE ="YES" SORTORDER ="Binary" VERSIONNUMBER ="8">
<SESSION DESCRIPTION ="" ISVALID ="YES" MAPNAME ="test_job" NAME ="sess_test_job" REUSABLE ="NO" SORTORDER ="Binary" VERSIONNUMBER ="2">

But the problem I am facing is that I don't know where "SESSION DESCRIPTION" will be: will it be 15 lines or 10 lines above the "Thread" line? Also, when multiple "SESSION DESCRIPTION" lines are present, I have to capture the first one. So how do I decide the NUM for -B, or is there a way to capture the first line matching the pattern that is encountered before the grepped line?

I hope I've explained this properly!

-dips

RudiC · February 5, 2015, 8:28am

No. Not clear.

I don't think your grep pipe will produce what you posted, as SESSION DESCRIPTION is on line 26 while Thread is on line 38.

I'd suggest looking into awk to get close to what you need.
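A minimal awk sketch of that idea, assuming the goal is the nearest SESSION DESCRIPTION line above each Thread line: awk reads the file once, remembers the most recent line matching the secondary pattern, and prints it when a line matching the main pattern arrives, so no -B count is needed at all. The function name `preceding_match` is hypothetical, made up for this example.

```shell
#!/bin/sh
# preceding_match is a hypothetical helper name used only for this sketch.
# Usage: preceding_match SECONDARY MAIN [file...]
# For each line matching MAIN, print the nearest earlier line matching
# SECONDARY. Both arguments are treated as awk regular expressions.
preceding_match() {
    sec=$1; main=$2
    shift 2
    awk -v sec="$sec" -v main="$main" '
        $0 ~ sec { last = $0; next }   # remember the newest SECONDARY line
        $0 ~ main && last != "" {      # MAIN line with a SECONDARY line above it
            print last
            last = ""                  # report each SECONDARY line only once
        }
    ' "$@"
}
```

For example, `preceding_match 'SESSION DESCRIPTION' 'Thread' test.xml` should print only the EMPLOYEE session line from the sample, because the second SESSION DESCRIPTION line comes after the only Thread line. Clearing `last` after printing means several Thread lines under one session report that session once; remove that line to get one output line per Thread line. Since the patterns are regular expressions, characters such as `.` or `*` in them would need escaping.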

Walter_Misar · February 5, 2015, 12:14pm

Just doing something like

grep "\(Thread\|SESSION DESCRIPTION\)" test.xml

may help; you can then work with a simpler stream containing only those lines.
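One way to finish that approach, sketched under the assumption that a grep with -B is available (GNU and BSD grep have it; POSIX grep does not): once the file is reduced to the two kinds of lines, the nearest SESSION DESCRIPTION line above a Thread line is simply the line right before it, so a fixed -B1 works. Note that `\|` inside a basic regular expression is a GNU grep extension; `grep -E` with a plain `|` is the portable spelling. The function name is hypothetical.

```shell
#!/bin/sh
# session_before_thread is a hypothetical name used only for this sketch.
# Requires a grep that supports -B (GNU and BSD grep do; POSIX grep does not).
session_before_thread() {
    # 1. Keep only the lines matching either pattern (portable ERE alternation).
    # 2. In that reduced stream, print each Thread line plus the line before it;
    #    grep may also emit "--" separator lines between non-adjacent groups.
    # 3. Keep only the SESSION DESCRIPTION lines (this also drops any "--").
    grep -E 'Thread|SESSION DESCRIPTION' "$@" |
        grep -B1 'Thread' |
        grep 'SESSION DESCRIPTION'
}
```

Usage: `session_before_thread test.xml`. With more than one file, grep prefixes each line with its file name, so this sketch is meant for one file at a time.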

dips_ag · February 6, 2015, 2:18am

RudiC & Walter, thanks for your time!

I was afraid that my explanation might seem like blabbering!

But I think I can take Walter's suggestion and work from there.

-dips

Don_Cragun · February 6, 2015, 2:45am

If I understand what you're trying to do, I think you want something like:

awk '/SESSION DESCRIPTION/,/Thread/' test.xml

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

Note, however, that this will print all lines from a line containing SESSION DESCRIPTION up through the next line containing Thread, even if no line containing Thread follows a line containing SESSION DESCRIPTION (in that case everything through the end of the file is printed). It isn't clear to me whether you want that. If you only want sets of lines that contain both strings, the code is slightly more complicated.
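A sketch of that slightly more complicated variant, under two assumptions: only sections that actually end in a Thread line should be printed, and a new SESSION DESCRIPTION line seen before any Thread line restarts the section, so each section starts at the nearest SESSION DESCRIPTION above its Thread line. Lines are held in a buffer and printed only when the Thread line arrives; a section still open at end of file is dropped. The function name is hypothetical.

```shell
#!/bin/sh
# complete_sections is a hypothetical name used only for this sketch.
# Prints SESSION DESCRIPTION ... Thread sections only when the Thread line exists.
complete_sections() {
    awk '
        /SESSION DESCRIPTION/ {        # start (or restart) a section
            buf = $0; inside = 1; next
        }
        inside {                       # collect lines of the open section
            buf = buf "\n" $0
            if (/Thread/) {            # section complete: print it and close it
                print buf
                inside = 0; buf = ""
            }
        }
    ' "$@"
}
```

Called as `complete_sections test.xml` on the sample, it should print the EMPLOYEE session line through the Thread Record line and nothing for the later sess_test_job session, whereas the range form above would also print that last session through the end of the file.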

drl · February 7, 2015, 1:31pm

Hi.

I could not tell whether only the 2 lines matching the 2 patterns were wanted, or the entire section between those two lines. So here is a solution for each, using readily available grep-like utilities:

#!/usr/bin/env bash

# @(#) s1	Demonstrate search for previous secondary pattern, peg, cgrep.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C peg cgrep

# Check for -version before $1 is taken as the data file name.
p=$( basename "$0" ) t1='$Revision: 1.11 $' v=${t1//[!0-9.]/}
[[ $# -gt 0 ]] && [[ "$1" =~ -version ]] && { echo "$p (local) $v" ; exit 0 ; }

FILE=${1-data1}

pl " Input data file $FILE, numbered:"
cat -n "$FILE"

pl " Results, secondary pattern and main pattern ONLY:"
peg -z '/header/' 'time' "$FILE"

pl " Results, cgrep, ENTIRE section, secondary pattern THROUGH main pattern:"
cgrep -w 'header' 'time' "$FILE"

exit 0
producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
peg (local) 3.10
cgrep ATT cgrep 8.15

-----
 Input data file data1, numbered:
     1	header 1
     2	a
     3	b
     4	time
     5	  c
     6		header 2
     7	d
     8	time
     9	e
    10	f
    11	 header 3
    12	time

-----
 Results, secondary pattern and main pattern ONLY:
**** (1) header 1
time
**** (6) 	header 2
time
**** (11)  header 3
time

-----
 Results, cgrep, ENTIRE section, secondary pattern THROUGH main pattern:
header 1
a
b
time
========================================
	header 2
d
time
========================================
 header 3
time

peg: http://www.cpan.org/authors/id/A/AD/ADAVIES/peg-3.14

cgrep: the cgrep project on SourceForge.net

Both utilities have many more features than shown here. Because peg is written in Perl, it can be taken to almost any platform that has Perl. cgrep is written in C, which means that it must be compiled. I had no trouble compiling it on 32-bit and 64-bit systems with gcc.
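Where neither peg nor cgrep can be installed, the secondary-pattern lines can also be found by literally searching backwards: reverse the file, print the first secondary-pattern line that follows each main-pattern line in the reversed stream (the nearest one above it in the original order), then reverse the result back. This is a sketch, not a replacement for either utility; it assumes GNU tac is available (on BSD-derived systems `tail -r` is the usual substitute), and the function name is hypothetical.

```shell
#!/bin/sh
# nearest_before is a hypothetical name used only for this sketch.
# Usage: nearest_before SECONDARY MAIN file
# Assumes GNU tac; BSD-derived systems usually provide tail -r instead.
nearest_before() {
    tac "$3" |
        awk -v sec="$1" -v main="$2" '
            $0 ~ main { want = 1; next }          # MAIN line: look for SECONDARY above it
            want && $0 ~ sec { print; want = 0 }  # nearest SECONDARY line, printed once
        ' |
        tac
}
```

For example, `nearest_before header time data1` should print the three header lines from data1 in file order, keeping their original leading whitespace.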

Best wishes ... cheers, drl