Two or more occurrences with grep

cokedude · October 7, 2024, 10:26pm

Is there a way to do this with grep, but only for two or more occurrences? If not with grep, is there a way to do it with awk, perl, or sed?

grep -i "EXEC .* CURSOR for" *

edit

This is what I am seeing. I have such a high volume of data that I only want to see two or more occurrences.

grep -i "EXEC .* CURSOR for" *
file1:EXEC SQL DECLARE name CURSOR FOR
file2:EXEC SQL DECLARE name CURSOR FOR
file3:   EXEC SQL DECLARE name1 CURSOR FOR State;
file3:   EXEC SQL DECLARE name2 CURSOR FOR State;

Because of the volume I am seeing, I only care about cases with two or more occurrences in the same file, like this:

file3:   EXEC SQL DECLARE name1 CURSOR FOR State;
file3:   EXEC SQL DECLARE name2 CURSOR FOR State;

munkeHoller · October 7, 2024, 10:29pm

@cokedude, it's not really clear what you are attempting, so to avoid/minimise confusion/misinterpretation:

Show a representative example of the data that you are grepping in
Show the expected results from your greps (these may need to be manually typed)

tks

NB: please learn to use triple backticks to separate code from prose, or use the menu option to do that for you; I've edited your post to include them.

Paul_Pedant · October 7, 2024, 11:04pm

Do you mean "List all files that contain two or more lines that match a pattern, ignoring differences in upper-lower case" ?

Awk can count. Something like:

awk '
toupper ($0) ~ ".*EXEC .* CURSOR FOR.*" { ++n[FILENAME]; }
END { for (f in n) 
    if (n[f] >= 2) printf ("File %s has %d matches\n", f, n[f]); }
' *

munkeHoller · October 8, 2024, 12:09am

Not sure if that's what they want, as they've not responded (yet).
Anyway, here's a stab at it using 2 greps:

grep -Hci "EXEC .* CURSOR for" *|grep -vE ':(0|1)$'
example.test:4
ex.test:2
gotOne.test:2
gotSome.test:4
x.test:12

munkeHoller · October 8, 2024, 6:26pm

@cokedude, you can edit your post now, though it's (probably) preferable that you simply respond with an updated post on the same thread; that way context is maintained.

cokedude · October 8, 2024, 8:38pm

Adding the requested clarification.

This is what I am seeing. I have such a high volume of data that I only want to see two or more occurrences.

grep -i "EXEC .* CURSOR for" *
file1:EXEC SQL DECLARE name CURSOR FOR
file2:EXEC SQL DECLARE name CURSOR FOR
file3:   EXEC SQL DECLARE name1 CURSOR FOR State;
file3:   EXEC SQL DECLARE name2 CURSOR FOR State;

Because of the volume I am seeing, I only care about cases with two or more occurrences in the same file, like this:

file3:   EXEC SQL DECLARE name1 CURSOR FOR State;
file3:   EXEC SQL DECLARE name2 CURSOR FOR State;

munkeHoller · October 8, 2024, 9:14pm

@cokedude, which criteria does my suggestion not meet?
What happens when 2 or more files have 2 or more matching entries?

MadeInGermany · October 9, 2024, 6:35am

The previous suggestions print the match count.
The following prints the filename then the matching lines.

awk '
FILENAME != pFN { if (n >= 2) printf "%s:\n%s", pFN, s; n=0; s=""; pFN=FILENAME }
toupper ($0) ~ /EXEC .* CURSOR FOR/ { ++n; s=(s $0 "\n") }
END { if (n >= 2) printf "%s:\n%s", pFN, s }
' *

With the filename in front of each matching line, as grep does it:

awk '
FILENAME != pFN { if (n >= 2) printf "%s", s; n=0; s=""; pFN=FILENAME }
toupper ($0) ~ /EXEC .* CURSOR FOR/ { ++n; s=(s FILENAME ":" $0 RS) }
END { if (n >= 2) printf "%s", s }
' *

Matt-Kita · October 9, 2024, 7:04am

Please note that you're searching for a matching pattern across multiple files. The files will be searched in a particular order (the filename sort order may depend on your locale settings), so do you want to get all files with two or more matches, or only the first matching file (if more than one file contains 2 or more matches)?

#!/usr/bin/env bash
for filename in ./*; do
  # skip anything that is not a regular file
  [[ ! -f "$filename" ]] && continue
  if (( $(grep -i -c 'EXEC .* CURSOR for' "$filename") >= 2 )); then
    grep -iHm 2 'EXEC .* CURSOR for' "$filename"
    # this will stop the execution after the first matching file
    break
  fi
done

Also, is it the first two occurrences you're interested in?
grep -iHm 2 'EXEC .* CURSOR for' "$filename"

or the last two occurrences?
grep -iH 'EXEC .* CURSOR for' "$filename" | tail -n 2

or two randomly selected occurrences (if there are more than two)?
grep -iH 'EXEC .* CURSOR for' "$filename" | shuf -n 2

or do you want to see all of the occurrences (when there are 2 or more)?
grep -iH 'EXEC .* CURSOR for' "$filename"

???

Paul_Pedant · October 9, 2024, 7:25am

It would be helpful to know which flavour of awk you have available. GNU awk (gawk) has BEGINFILE and ENDFILE patterns, which would considerably simplify a solution here.
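For readers who do have GNU awk, here is a minimal sketch of what the BEGINFILE/ENDFILE approach might look like. It assumes gawk 4.0 or later is installed as `gawk`; the wrapper name `show_multi_matches_gawk` is hypothetical, and this will not run with oawk or nawk on Solaris. As in the other awk scripts in this thread, the search pattern must be upper case because each line is upper-cased before matching.

```shell
#!/bin/sh
# Hypothetical wrapper: show_multi_matches_gawk PATTERN FILE...
# Requires GNU awk (gawk 4.0 or later) for BEGINFILE/ENDFILE;
# it will NOT run with oawk or nawk on Solaris.
# PATTERN must be upper case: each line is upper-cased before matching.
show_multi_matches_gawk() {
    pattern=$1
    shift
    gawk -v usearch="$pattern" '
    # reset the counter and the buffer at the start of every file
    BEGINFILE { n = 0; s = "" }
    # count and buffer the matching lines (original case is kept)
    toupper($0) ~ usearch { ++n; s = s $0 "\n" }
    # at the end of every file, print it only if it had 2 or more matches
    ENDFILE { if (n >= 2) printf "%s:\n%s", FILENAME, s }
    ' "$@"
}
```

Usage would be something like `show_multi_matches_gawk 'EXEC .* CURSOR FOR' *`. No previous-filename bookkeeping is needed, because gawk itself signals the start and end of each file.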

cokedude · October 9, 2024, 5:53pm

Sorry, I forgot to specify that I am using an old-school SunOS. Is there a way I can add that to my signature?

It does not work because I am using an old SunOS:

 grep -Hci "EXEC .* CURSOR for" *|grep -vE ':(0|1)$'
grep: illegal option -- E
Usage: grep [-c|-l|-q] -bhinsvw pattern file . . .
grep: illegal option -- H
Usage: grep [-c|-l|-q] -bhinsvw pattern file . . .
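A sketch of the same two-grep idea restricted to the options listed in that Solaris usage line (`-c`, `-i`, `-v`) plus a basic regex, so neither `-E` nor `-H` is needed. It assumes more than one file is searched: with a single file, `grep -c` prints only the count, without the `file:` prefix. The function name is hypothetical, and the sketch is untested on SunOS.

```shell
#!/bin/sh
# Hypothetical helper: files_with_two_or_more PATTERN FILE...
# Prints "file:count" for every file with 2 or more (case-insensitive) matches.
# Uses only grep -c, -i and -v and a basic regex (no -E, no -H).
# Caveat: with a single FILE argument, grep -c omits the "file:" prefix.
files_with_two_or_more() {
    pattern=$1
    shift
    # First grep: one "file:count" line per file.
    # Second grep: drop lines whose count is exactly 0 or 1
    # (":10", ":11" etc. are kept, because the colon must precede the digit).
    grep -ci "$pattern" "$@" | grep -v ':[01]$'
}
```

Usage would be something like `files_with_two_or_more 'EXEC .* CURSOR for' *`.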

cokedude · October 9, 2024, 6:03pm

This works :). awk is one of the few things that usually works on SunOS.

awk '
toupper ($0) ~ ".*EXEC .* CURSOR FOR.*" { ++n[FILENAME]; }
END { for (f in n)
    if (n[f] >= 2) printf ("File %s has %d matches\n", f, n[f]); }
' *
File file1 has 2 matches
File file2 has 3 matches
File file3 has 3 matches

This even works with find if you want the full path :).

find /export/home/user5 -type f -exec awk '
toupper ($0) ~ ".*EXEC .* CURSOR FOR.*" { ++n[FILENAME]; }
END { for (f in n)
    if (n[f] >= 2) printf ("File %s has %d matches\n", f, n[f]); }
' {} + 2>/dev/null

cokedude · October 9, 2024, 6:58pm

MadeInGermany:

awk '
FILENAME != pFN { if (n >= 2) printf "%s:\n%s", pFN, s; n=0; s=""; pFN=FILENAME }
toupper ($0) ~ /EXEC .* CURSOR FOR/ { ++n; s=(s $0 "\n") }
END { if (n >= 2) printf "%s:\n%s", pFN, s }
' *

I really like the first one :). It's really easy to read. Since what I am looking at is code, it is indented with tabs, so it's easy to see.

How do you come up with this? When grep does not work, I am not sure where to go next. I always have trouble deciding between awk, sed, and perl, though awk does usually work best on SunOS. Then I have no idea how to get from what I am trying to do to what I need to do to accomplish my goal. Did this take you long to come up with?

cokedude · October 9, 2024, 7:06pm

Yes, I would like all files with two or more matches.

Sorry, I forgot to specify that I am using an old-school SunOS.

: two_occurrences.sh
grep: illegal option -- H
grep: illegal option -- m
Usage: grep [-c|-l|-q] -bhinsvw pattern file . . .

cokedude · October 9, 2024, 7:13pm

I do not have GNU awk. I'm not sure how to check the version. I am using some old-school SunOS version. The man pages mention both oawk and nawk:

HISTORY
       The  oawk  command  was installed as /usr/bin/awk in all Sun and Oracle
       releases of Solaris up through  Oracle  Solaris  11.4.32.  Starting  in
       11.4.33,  that  version  of awk was moved to /usr/bin/oawk, and the awk
       pkg(7)  mediator  was  created  to  allow  sites  to   choose   whether
       /usr/bin/awk links to oawk or nawk.

Oracle Solaris 11.4               29 Nov 2022                           awk(1)

awk --version
awk: unknown option --version ignored
awk: no program given

awk -v
awk: no program given

MadeInGermany · October 9, 2024, 9:15pm

Good: after decades, Oracle finally allows you to switch /usr/bin/awk to nawk.

(Other Unix vendors switched with their next major OS release.)
Doing so carries no risk, but brings a big advantage.

Of course, you could also run the POSIX versions in /usr/xpg4/bin/, not only of awk but also of grep and sed.

Reading each file only once with awk is efficient: it takes only half the I/O of running two greps over every file.

How did I come up with the awk code? First I realized that awk reads the input file by file, so it is sufficient to count and store the matches in simple variables, and at the end of each file print the store s if the count n is >= 2. Lacking ENDFILE (GNU awk only), I use two conditions instead: when FILENAME has changed, and at the very END (after the last file).
When FILENAME has changed, the code also resets the count and the store, and updates pFN (the previous FILENAME), so that the next change of FILENAME can be detected.
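The same control-break idea can also be written with `FNR == 1` as the start-of-file test, since FNR restarts at 1 for each input file in nawk and POSIX awk. A minimal sketch, assuming nawk or a POSIX awk (for example /usr/xpg4/bin/awk); the wrapper name `show_multi_matches` is hypothetical, and the pattern must be upper case. One small difference: unlike the FILENAME comparison, `FNR == 1` also notices the same file being named twice in a row.

```shell
#!/bin/sh
# Hypothetical wrapper: show_multi_matches PATTERN FILE...
# Assumes nawk or a POSIX awk. PATTERN must be upper case,
# because each line is upper-cased before matching.
show_multi_matches() {
    pattern=$1
    shift
    awk -v usearch="$pattern" '
    # FNR restarts at 1 for every input file, so FNR == 1 marks the
    # start of a new file: flush the previous file, then reset.
    FNR == 1 { if (n >= 2) printf "%s:\n%s", pFN, s; n = 0; s = ""; pFN = FILENAME }
    # Count and buffer the matching lines of the current file.
    toupper($0) ~ usearch { ++n; s = s $0 "\n" }
    # Flush the last file.
    END { if (n >= 2) printf "%s:\n%s", pFN, s }
    ' "$@"
}
```

Usage would be something like `show_multi_matches 'EXEC .* CURSOR FOR' *`, printing the filename followed by its matching lines, as the first script above does.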

Paul_Pedant · October 10, 2024, 11:39am

To be fair, nawk was available on SunOS/Solaris much earlier, somewhere around 1995, I think. Certainly before Oracle acquired Solaris. It was annoying that you needed to explicitly invoke nawk (or set up your own links), but it worked.

The only bug I remember was that it would crash with a segmentation violation if you passed an untyped variable as an argument and then treated it as an array. My code had a lot of split ("", X, ""); calls to work around this.
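A small sketch of that idiom, assuming nawk or a POSIX awk (oawk lacks the function keyword): `split("", X, "")` turns X into an empty array before it is handed to a function that fills it. The function `fill` and the wrapper `array_param_demo` are hypothetical; since the source string is empty, the separator argument makes no practical difference here.

```shell
#!/bin/sh
# Hypothetical demo of the split("", X, "") idiom.
array_param_demo() {
    awk '
    # fill() treats its parameter as an array; arrays are passed by reference
    function fill(arr) { arr["a"] = 1; arr["b"] = 2 }
    BEGIN {
        split("", X, "")    # X is now an empty array, not an untyped variable
        fill(X)
        count = 0
        for (k in X) count++
        print count         # number of elements fill() added to X
    }'
}
```

Calling `array_param_demo` should print `2`.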

Versioning was not available. I think the acid test to find out if you had awk or nawk was to have a tiny pre-test that invoked asort() on a single array entry. If that worked, you had nawk. If it wrote to stderr, you only had old awk.

MadeInGermany · October 10, 2024, 12:46pm

Isn't asort() only in GNU awk?

AFAIR oawk does not understand a bare boolean expression as a pattern:
oawk '1'
gives an error, while a POSIX awk or nawk sees a "true" pattern and performs the implicit { print }.
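Building on that boolean test, a hedged sketch of a check that could be run against each candidate awk (for example `awk`, `oawk`, `nawk`, `/usr/xpg4/bin/awk`). It feeds one line to the program `1` and checks whether that line is echoed back. The helper name `awk_flavour` is hypothetical, and the oawk behaviour is assumed from the post above.

```shell
#!/bin/sh
# Hypothetical helper: awk_flavour [AWK_COMMAND]   (default: awk)
# A bare "1" pattern is accepted by nawk and POSIX awk (implicit { print });
# old oawk is assumed to reject it with an error.
awk_flavour() {
    cmd=${1:-awk}
    command -v "$cmd" >/dev/null 2>&1 || { echo "$cmd: not found"; return 1; }
    # If the input line comes back unchanged, the bare pattern was accepted.
    if [ "$(echo x | "$cmd" '1' 2>/dev/null)" = x ]; then
        echo "$cmd: accepts bare patterns (nawk/POSIX-style)"
    else
        echo "$cmd: rejects bare patterns (probably oawk)"
    fi
}
```

On Solaris one might run `awk_flavour awk; awk_flavour oawk; awk_flavour nawk` to see which one /usr/bin/awk currently behaves like.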

Paul_Pedant · October 10, 2024, 4:01pm

I like the boolean test. My bad: there was something that I used, but it was not asort(). It might have been the $99 limit on field numbering. Possibly even the function keyword: it seems to me that if you wanted a control break on a change of FILENAME and on END, you had to repeat (and maintain) all of the associated code.

I have a lot of archives, but nothing from before about 2000. I wrote a HeapSort for small internal sorts in nawk, and an external sort command plus readback with getline for larger sorts.

MadeInGermany · October 10, 2024, 7:00pm

According to

the Solaris awk (= oawk) does not have sub(), gsub(), toupper(), tolower(), or close(), and doesn't know the function keyword for user-defined functions.
BTW, here is my script with a function that avoids the repeated printing code, plus a parameter for the search string (which must be upper case, since it is matched against toupper($0)):

awk -v usearch="EXEC .* CURSOR FOR" '
function prt(){  if (n >= 2) printf "%s:\n%s", pFN, s }
FILENAME != pFN { prt(); n=0; s=""; pFN=FILENAME }
toupper($0) ~ usearch { ++n; s=(s $0 "\n") }
END { prt() }
' *
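Since each awk invocation flushes its last file in END, this function-based script should also work with `find ... -exec ... {} +` (as in the earlier find example), even when find splits the file list across several awk runs. A sketch, assuming `awk` is nawk or a POSIX awk; the helper name `find_multi_matches` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: find_multi_matches DIR PATTERN
# Prints the full path and the matching lines of every regular file under DIR
# with 2 or more matches. PATTERN must be upper case (lines are upper-cased).
find_multi_matches() {
    dir=$1
    pattern=$2
    # Each awk run started by "{} +" flushes its own last file in END,
    # so no file is lost when find splits the list into several batches.
    find "$dir" -type f -exec awk -v usearch="$pattern" '
    function prt() { if (n >= 2) printf "%s:\n%s", pFN, s }
    FILENAME != pFN { prt(); n = 0; s = ""; pFN = FILENAME }
    toupper($0) ~ usearch { ++n; s = s $0 "\n" }
    END { prt() }
    ' {} +
}
```

Usage would be something like `find_multi_matches /export/home/user5 'EXEC .* CURSOR FOR' 2>/dev/null`.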