GREP function in ksh which ignores LINE Breaks

Raghav_Garg · September 19, 2014, 4:03pm

Hello

I am using a grep command with two patterns in my ksh script. The file has line breaks in it, and the two patterns are on different lines. Here is the command:

grep -l 'RITE AID.*ST.820' natriter820u.20140914

Pattern1 - RITE AID
Pattern2 - ST*820

I am not getting any results from this, whereas if I replace this file with a different file that has no line breaks, it works. Is there any workaround to deal with line breaks here?

Here is the sample data:-

ISA*00* 00 *ZZ*NATIONSBANK *14*0030020520500 *140918*1200*U*00401*000006436*0*P*^ 
GS*RA*014578892*IGIHCJEEJ*20140918*1200*6442*X*004010
ST*820*000006482 
N1*PR*RITE AID HDQTRS. CORP.*1*014578892

Can someone please help.

Thanks

RudiC · September 19, 2014, 4:21pm

Your grep command will not print the sample's filename as it is looking for "RITE AID" coming in front of "ST.820" which will not be the case even with newlines removed.

However, grep does not match patterns across line boundaries, so either remove those (e.g. tr -d '\n' ) or use other methods, e.g. awk to match either pattern and print the FILENAME when both are found.
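
Both points can be checked on a small copy of the sample data. The sketch below is only illustrative: the scratch file name sample.edi and the variable names are made up here, and the asterisk is escaped so that grep matches it literally.

```shell
#!/bin/sh
# Hypothetical scratch file holding two lines from the sample data.
dir=$(mktemp -d)
printf '%s\n' 'ST*820*000006482' 'N1*PR*RITE AID HDQTRS. CORP.*1*014578892' > "$dir/sample.edi"

# grep tests one line at a time, so a pattern spanning both lines never matches.
across=$(grep -c 'ST\*820.*RITE AID' "$dir/sample.edi" || true)

# With the newlines deleted the file is one long line, and ST*820 now
# precedes RITE AID, so the pattern has to name them in that order.
joined=$(tr -d '\n' < "$dir/sample.edi" | grep -c 'ST\*820.*RITE AID' || true)
reversed=$(tr -d '\n' < "$dir/sample.edi" | grep -c 'RITE AID.*ST\*820' || true)

echo "across lines: $across, joined: $joined, reversed order: $reversed"
rm -rf "$dir"
```

Only the joined input with the patterns in file order gives a non-zero count.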

Corona688 · September 19, 2014, 4:23pm

grep does not work that way. grep matches lines containing patterns, it does no logic like 'if this line and this line do this thing or this other thing' etc. It's not a programming language.

awk is a programming language, and can.

$ awk -v P1="pattern1" -v P2="pattern2" '
# set A if P1 found, set B if P2 found
$0~P1{A=1} $0~P2{B=1} 
# If filename changes, and A set, and B set, print filename.  Reset A and B.
(L != ARGIND) { L++; if(A && B) print ARGV[L];  A=B=0 }
# Check A and B for the last filename and print.
END { if(A&&B) print ARGV[L] }' filename1 filename2 filename3 filename4
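
ARGIND is a GNU awk extension, so the program above may behave differently under other awks. A possible alternative that relies only on the POSIX variables FNR and FILENAME is sketched below. The names both_patterns, report and prev are illustrative and not from the thread, and the sketch still assumes an awk that accepts -v and user-defined functions.

```shell
#!/bin/sh
# Sketch: list every file that contains both patterns, on any lines.
# The two patterns are regular expressions, passed with -v as above.
both_patterns() {
    p1=$1; p2=$2; shift 2
    awk -v P1="$p1" -v P2="$p2" '
    # report(): print the file just finished if both patterns were seen.
    function report() { if (A && B) print prev }
    # First line of each file: report on the previous file, then reset.
    FNR == 1 { if (prev != "") report(); A = B = 0; prev = FILENAME }
    $0 ~ P1 { A = 1 }
    $0 ~ P2 { B = 1 }
    # The last file has no following file to trigger the report.
    END { if (prev != "") report() }' "$@"
}
```

A call might look like: both_patterns 'ST[*]820' 'RITE AID' natriter820u.20140914. The asterisk is written as [*] because awk processes backslash escapes in -v values, which makes a backslash-escaped asterisk unreliable there.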

Raghav_Garg · September 19, 2014, 4:24pm

Thanks RudiC, I will correct the first mistake. Also, I am very new to Unix; would it be possible for you to provide the right command that you think should work with tr -d '\n'?

RudiC · September 19, 2014, 4:30pm

tr -d '\n' <file | grep 'ST\*820.*RITE AID'

Raghav_Garg · September 19, 2014, 4:37pm

corona688:

Thanks, I will try and see if it works.. my company is using a pretty old version so I have a limited number of possibilities

Thanks a lot, I tried

 tr -d '\n' <natriter820u.20140914 | grep 'ST.820.*RITE AID'

and it gave me the content of the file, but I need to get the file name, so I tried

tr -d '\n' <natriter820u.20140914 | grep -l 'ST.820.*RITE AID'

It is giving a weird output of <stdin>

Corona688 · September 19, 2014, 4:41pm

These days, even a wireless router probably has awk. If you're running anything with Linux or UNIX in its name you should have it.

grep prints '<stdin>' because you didn't give it a file name, it was reading from tr instead, through a pipe, also called 'standard input'. So that's not quite what you want.

grep usually has limits on how long a line it will process, and tr -d '\n' turns it into one giant line, so that's not a good solution anyway.

There are ways with grep, involving calling grep multiple times, and possibly sorting and merging its output. I think the awk way is the closest to what you asked for.
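
One way to read the multiple-grep idea is to let a first grep -l list the files containing one string and have a second grep -l check only those files for the other. The following is a sketch under assumptions: the function name files_with_both is invented, and the file names are assumed to contain no whitespace so that they survive xargs.

```shell
#!/bin/sh
# Sketch: names of files containing both fixed strings, on any lines.
files_with_both() {
    first=$1; second=$2; shift 2
    # grep -l prints each matching file name once; -F takes the pattern
    # literally, so the * in ST*820 needs no escaping.
    # /dev/null keeps grep from waiting on stdin if no file names arrive.
    grep -l -F -e "$first" "$@" /dev/null |
        xargs grep -l -F -e "$second" /dev/null
}
```

For example: files_with_both 'ST*820' 'RITE AID' natriter820u.20140914. Because grep is handed real file names here, it prints them rather than a placeholder for standard input. On systems whose grep lacks -F, fgrep is the traditional equivalent.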

drl · September 20, 2014, 10:06am

Hi.

A versatile member of the grep family, the non-standard cgrep, allows matches across newlines (among many other extended features, such as extracting windows of lines around matches). For example:

#!/usr/bin/env bash

# @(#) s1	Demonstrate match across lines, cgrep.
# For cgrep source, see:
# http://sourceforge.net/projects/cgrep/

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C cgrep

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results, cgrep for ST*82 to RITE AID across lines:"
cgrep -l -a 'ST\*82.*\n.*RITE AID' $FILE

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
cgrep ATT cgrep 8.15

-----
 Input data file data1:
ISA*00* 00 *ZZ*NATIONSBANK *14*0030020520500 *140918*1200*U*00401*000006436*0*P*^ 
GS*RA*014578892*IGIHCJEEJ*20140918*1200*6442*X*004010
ST*820*000006482 
N1*PR*RITE AID HDQTRS. CORP.*1*014578892

-----
 Results, cgrep for ST*82 to RITE AID across lines:
data1

The cgrep code usually needs to be obtained and compiled. I have done so on 32-bit and 64-bit systems without trouble. See the script comment for the source URL.

Best wishes ... cheers, drl

Scrutinizer · September 20, 2014, 10:23am

Try:

awk 'FNR==1{p=1} /ST\*820/{p=0} !p && /RITE AID/{print FILENAME}' file(s)

--
On Solaris use /usr/xpg4/bin/awk rather than awk
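
The Solaris note matters because /usr/bin/awk on older Solaris releases is the original, pre-POSIX awk. A script can choose the interpreter at run time; the sketch below is one hedged way to do that. The variable AWK and the wrapper name list_matches are illustrative, not from the thread.

```shell
#!/bin/sh
# Sketch: prefer the XPG4 awk where it exists (Solaris), else plain awk.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# Hypothetical wrapper around the program above, run with the chosen awk.
list_matches() {
    "$AWK" 'FNR==1{p=1} /ST\*820/{p=0} !p && /RITE AID/{print FILENAME}' "$@"
}
```

It would be called with the files to check, for example: list_matches /edi/editst/archive/sterling/in/*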

drl · September 20, 2014, 4:00pm

Hi.

For portability (but not performance), there is a perl version of grep, peg, that has many features, including the ability to use perl expressions and functions. The near functions allow one to look backwards from a line which matched a pattern, effectively matching across lines:

#!/usr/bin/env bash

# @(#) s2	Demonstrate match across lines, peg.
# For peg source, see:
# http://www.cpan.org/authors/id/A/AD/ADAVIES/peg-3.10

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C peg

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results, peg for RITE AID back to ST*82 across lines:"
peg -l '/RITE AID/ and near(sub{/ST\*82/},-1)' $FILE

exit 0

producing:

$ ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
peg (local) 3.10

-----
 Input data file data1:
ISA*00* 00 *ZZ*NATIONSBANK *14*0030020520500 *140918*1200*U*00401*000006436*0*P*^ 
GS*RA*014578892*IGIHCJEEJ*20140918*1200*6442*X*004010
ST*820*000006482 
N1*PR*RITE AID HDQTRS. CORP.*1*014578892

-----
 Results, peg for RITE AID back to ST*82 across lines:
data1

The peg code has many of the same basic options as GNU grep, in addition to a number of extensions. For basic work on systems that do not have GNU grep it seems to be an acceptable substitute in many situations. The comment in the shell script points to the code source.

Best wishes ... cheers, drl

Raghav_Garg · September 22, 2014, 11:12am

I am using this command:

awk 'FNR==1{p=1} /ST\*820/{p=0} !p && /RITE AID/{print FILENAME}' natriter820u.20140914

but getting an error:

awk: syntax error near line 1
awk: bailing out near line 1

I am very new to this stuff, so if you can please suggest what I am doing wrong here, it will be very helpful.

Corona688 · September 22, 2014, 11:16am

I don't suppose you tried my code too?

Raghav_Garg · September 22, 2014, 11:21am

I did, and got the same syntax error. I think either I am missing something basic here or it is a limitation of my system.
Here is what I tried:

awk -v P1="ST.820" -v P2="RITE AID"
# set A if P1 found, set B if P2 found
0~P1{A=1} $0~P2{B=1} 
# If filename changes, and A set, and B set, print filename.  Reset A and B.
(L != ARGIND) { L++; if(A && B) print ARGV[L];  A=B=0 }
# Check A and B for the last filename and print.
END { if(A&&B) print ARGV[L] }' /edi/editst/archive/sterling/in/*

and I got the same syntax error

Corona688 · September 22, 2014, 11:26am

You are leaving out the single quotes, which totally changes the meaning of anything you do.

Put the single quotes back in and try again.
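
Why the quotes matter: the awk program has to reach awk as a single, unmodified argument. Without the opening quote, the shell ends the awk command at the first newline and treats the following lines as shell commands; inside double quotes, the shell would rewrite $0 itself before awk ever saw it. A small sketch with made-up variable names:

```shell
#!/bin/sh
line='ST*820*000006482'

# Single quotes: the shell passes the program untouched, so awk sees $0.
quoted=$(printf '%s\n' "$line" | awk '$0 ~ /ST.820/ { print "match" }')

# Double quotes: the shell expands $0 (the shell or script name) before awk
# runs. echo is used here only to show the program text awk would receive.
expanded=$(echo "$0 ~ /ST.820/")

echo "single-quoted result: $quoted"
echo "double-quoted program text: $expanded"
```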

Raghav_Garg · September 22, 2014, 12:02pm

I did that as well but same result.

#!/bin/ksh

awk -v P1="ST.820" -v P2="RITE AID" '
# set A if P1 found, set B if P2 found
0~P1{A=1} $0~P2{B=1} 
# If filename changes, and A set, and B set, print filename.  Reset A and B.
(L != ARGIND) { L++; if(A && B) print ARGV[L];  A=B=0 }
# Check A and B for the last filename and print.
END { if(A&&B) print ARGV[L] }' /edi/editst/archive/sterling/in/*

drl · September 22, 2014, 12:47pm

Hi.

Given the characteristics of this problem, a 2-step grep can be used, but the grep needs to be able to collect 1 line before and after matched lines. For example, with GNU grep:

#!/usr/bin/env bash

# @(#) s3	Demonstrate match across lines, grep.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C grep

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results, grep for RITE AID back to ST*82 across lines:"
if grep -F -B1 'RITE AID' $FILE |
grep -q -F -A1 'ST*82'
then
  printf '%s\n' "$FILE"
fi

FILE=data3
pl " Input data file $FILE:"
cat $FILE

pl " Results, grep for RITE AID back to ST*82 across lines:"
if grep -F -B1 'RITE AID' $FILE |
grep -q -F -A1 'ST*82'
then
  printf '%s\n' "$FILE"
fi

exit 0

producing:

$ ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
grep GNU grep 2.5.3

-----
 Input data file data1:
ISA*00* 00 *ZZ*NATIONSBANK *14*0030020520500 *140918*1200*U*00401*000006436*0*P*^ 
GS*RA*014578892*IGIHCJEEJ*20140918*1200*6442*X*004010
ST*820*000006482 
N1*PR*RITE AID HDQTRS. CORP.*1*014578892

-----
 Results, grep for RITE AID back to ST*82 across lines:
data1

-----
 Input data file data3:
ISA*00* 00 *ZZ*NATIONSBANK *14*0030020520500 *140918*1200*U*00401*000006436*0*P*^ 
GS*RA*014578892*IGIHCJEEJ*20140918*1200*6442*X*004010
ST*820*000006482 
not a matching line
N1*PR*RITE AID HDQTRS. CORP.*1*014578892

-----
 Results, grep for RITE AID back to ST*82 across lines:

The way this works is that for each line that matches the second pattern, we capture it and the preceding line. Those pairs of lines are piped into a second grep, which lists the pairs only if the first pattern occurs.

Note that the second data file, data3, has an intervening line between the two patterns of interest, which should disqualify that file from being listed, and it does. I'm sure it's possible to construct pathological cases where this fails, but then the sample provided would not have been representative.

I have not tested this extensively, but it seems to make sense for the problem at hand.

Best wishes ... cheers, drl
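
The -B and -A context options are not part of POSIX grep, so on a system without them the same adjacent-line test might be written in awk by remembering the previous line. This is a sketch, not something tested on the original poster's system; the function name adjacent_match and the variables prev and done are illustrative.

```shell
#!/bin/sh
# Sketch: print a file name when a line containing the second string
# immediately follows a line containing the first string.
adjacent_match() {
    first=$1; second=$2; shift 2
    awk -v S1="$first" -v S2="$second" '
    # New file: forget the previous line and the "already printed" flag.
    FNR == 1 { prev = ""; done = 0 }
    # index() does a literal substring search, much like grep -F.
    !done && index(prev, S1) && index($0, S2) { print FILENAME; done = 1 }
    { prev = $0 }' "$@"
}
```

Usage would be along the lines of: adjacent_match 'ST*82' 'RITE AID' data1 data3. Each qualifying file is printed once, and a file with a line between the two strings is skipped.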

Corona688 · September 22, 2014, 1:15pm

Considering the result last time should have been various syntax errors, I have my doubts.

Try nawk if you're on Solaris.

Raghav_Garg · September 22, 2014, 1:48pm

This runs without any error but produces no output. I need the file names from this step so that I can use them in the next set of processes.

#!/bin/ksh


nawk -v P1="ST.820" -v P2="RITE AID" '
# set A if P1 found, set B if P2 found
0~P1{A=1} $0~P2{B=1} 
# If filename changes, and A set, and B set, print filename.  Reset A and B.
(L != ARGIND) { L++; if(A && B) print ARGV[L];  A=B=0 }
# Check A and B for the last filename and print.
END { if(A&&B) print ARGV[L] }' /edi/editst/archive/sterling/in/*

Corona688 · September 22, 2014, 2:09pm

You also left a $ out of the program which totally changed its meaning.

It works fine when I use the entire unmodified program:

$ awk -v P1="ST.820" -v P2="RITE AID" '
# set A if P1 found, set B if P2 found
$0~P1{A=1} $0~P2{B=1}
# If filename changes, and A set, and B set, print filename.  Reset A and B.
(L != ARGIND) { L++; if(A && B) print ARGV[L];  A=B=0 }
# Check A and B for the last filename and print.
END { if(A&&B) print ARGV[L] }' data1

data1

$

Also, please use code tags for code, not icode.
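
To illustrate the missing $ mentioned above: 0~P1 asks whether the constant 0 matches the pattern, which is never true for ST.820, so A is never set and nothing is printed. A minimal comparison, with made-up variable names:

```shell
#!/bin/sh
line='ST*820*000006482'

# 0 ~ P1 compares the number 0, converted to the string "0", with the pattern.
without_dollar=$(printf '%s\n' "$line" | awk -v P1='ST.820' '0 ~ P1 { print "match" }')

# $0 ~ P1 compares the current input line with the pattern.
with_dollar=$(printf '%s\n' "$line" | awk -v P1='ST.820' '$0 ~ P1 { print "match" }')

echo "without \$: '$without_dollar'  with \$: '$with_dollar'"
```

Only the second form prints match for the sample line.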

Raghav_Garg · September 22, 2014, 2:36pm

corona688:

I am running what you just suggested, and also its nawk version, but I am getting an error: -v: not found

I know that I am making this tough for all the people helping me here.
Many thanks to everyone who has contributed to this thread.
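
A message of the form "-v: not found" comes from the shell rather than from awk: it means the shell tried to run -v as a command. The exact script is not shown, so the cause here is a guess. One way to get this message is a $ glued to the command name (for example $awk), which the shell expands as an unset variable; a line break between the command name and -v would have a similar effect. The sketch below reproduces the message under the first assumption, with awk_cmd as a hypothetical variable name.

```shell
#!/bin/sh
# Assumption: awk_cmd is unset, as $awk or $nawk would be in a script
# that never assigned them.
unset awk_cmd

# The empty expansion disappears, leaving -v as the command name.
msg=$( ($awk_cmd -v P1="ST.820" 'BEGIN { exit }') 2>&1 || true )
echo "$msg"
```

If that is what happened, the listing needs to be typed without the leading $, which is only the shell prompt in the posted example.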