GREP -p: first TWO lines

Hello all. I have a flat text file, separated into paragraphs. I need to grep for all paragraphs containing a specific term (Flash, in this case) and, for each such paragraph, get the first line containing that term along with the line immediately preceding that first occurrence.

Example paragraph:

Some text, that could contain anything
Flash 1
Flash 2
Flash 3

I need to get:

Some text, that could contain anything
Flash 1

Is there a set of flags for 'grep -p', or other commands that I can use? The file has almost ten thousand paragraphs, and only 3,000 contain 'Flash'. I REALLY don't want to search by hand.

How are we supposed to identify the start of a new paragraph?

what about

grep -p -B 1 Flash | head -2

If your paragraphs are separated by blank lines then you could do something like:

awk '/Flash/ {
   for (i=1;i<=NF;i++) {
      if ($i ~ /Flash/) {
         if (i>1) {
            printf $(i-1) OFS
         }
         print $i;
         break
      }
   }
}' FS="\n" OFS="\n" RS="" ORS="\n\n" file

AIX's grep has the (non-standard, I believe) option "-p", which searches for "paragraphs": groups of lines separated by blank lines, unless a different "paragraph separator" is specified.

AIX man page of grep

bakunin
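For anyone reading this who is not on AIX: a rough equivalent of the paragraph search can be sketched with awk's paragraph mode. This is only a sketch under assumptions: the function name grep_paragraphs is made up for illustration, paragraphs must be separated by truly empty lines, and the pattern is treated as an awk regular expression rather than a grep one.

```shell
# Hypothetical helper: print every paragraph of file "$2" that matches
# the awk regular expression "$1", keeping a blank line between paragraphs.
grep_paragraphs() {
    awk -v pat="$1" '
    BEGIN { RS = ""; ORS = "\n\n" }   # RS="" makes each paragraph one record
    $0 ~ pat                          # default action: print the whole record
    ' "$2"
}
```

Called as grep_paragraphs Flash CodeSnippets, it prints whole matching paragraphs, which is the starting point that the rest of this thread trims down to two lines.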

Since awk can separate on paragraphs by setting RS to "", how about this:

awk '$1 ~ R { print $1; print $2 }' FS="\n" RS="" R="mysearchstring" inputfile
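One thing worth checking with this approach: with FS="\n" and RS="", $1 is only the first line of a paragraph, while $0 is the whole paragraph. A quick way to see how much that matters on a given file is to count both. The sketch below is only illustrative; the function name count_matches is made up, and it assumes paragraphs are separated by empty lines.

```shell
# Hypothetical helper: for regex "$1" and file "$2", print three numbers:
#   total paragraphs, paragraphs whose FIRST line matches,
#   paragraphs where ANY line matches.
count_matches() {
    awk -v R="$1" '
    BEGIN { FS = "\n"; RS = "" }
    $1 ~ R { first++ }    # first line only
    $0 ~ R { any++ }      # anywhere in the paragraph
    END { printf "%d %d %d\n", NR, first, any }
    ' "$2"
}
```

Running count_matches Flash CodeSnippets gives a sanity check against the expected number of matching paragraphs; if the second number is much smaller than the third, a test on $1 alone is skipping paragraphs.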

My apologies for not getting back to you guys. I've had back surgery, so won't be able to try this for a few days.

sed 's/BadBack/GoodBack/g' :)

My paragraphs are separated by blank lines. My apologies for not making that clear. The name of my file is "CodeSnippets", so should my code look like this?

awk '/Flash/ {
   for (i=1;i<=NF;i++) {
      if ($i ~ /Flash/) {
         if (i>1) {
            printf $(i-1) OFS
         }
         print $i;
         break
      }
   }
}' FS="\n" OFS="\n" RS="" ORS="\n\n" CodeSnippets

If only it had been that simple. ;)

This returned the first two lines of all of the paragraphs, instead of just the ones I was looking for.

Suppose that CodeSnippets contained:

Some text, that could contain anything
Flash 1
Flash 2
Flash 3

2nd paragraph with key
word not seen until
the fourth line
Flash 4

Flash on line 1 and line 4
other text on subsequent lines
Do we get what we want
Flash again
after 2nd flash

Reason not to use input %s %d
as 1st arg to printf Flash

With this input, the loop-based awk script above (the one ending in ORS="\n\n" CodeSnippets) produces something like:

Some text, that could contain anything
Flash 1

the fourth line
Flash 4

Flash on line 1 and line 4

awk: not enough args in printf(Reason not to use input %s %d
)
 input record number 4, file CodeSnippets
 source line number 5
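The failure on the last paragraph happens because printf treats its first argument as a format string, so the %s and %d in the data are read as conversions with no arguments to fill them. A minimal sketch of the same loop with a fixed format string follows. The function name first_match_with_context is made up for illustration, and paragraphs are assumed to be separated by empty lines.

```shell
# Hypothetical helper: for each paragraph of file "$2" that matches regex "$1",
# print the line before the first matching line (if there is one) and the
# matching line itself, followed by a blank line.
first_match_with_context() {
    awk -v pat="$1" '
    BEGIN { FS = "\n"; RS = ""; ORS = "\n\n" }
    $0 ~ pat {
        for (i = 1; i <= NF; i++) {
            if ($i ~ pat) {
                if (i > 1)
                    printf("%s\n", $(i - 1))   # data is an argument, never the format
                print $i                       # print appends ORS, i.e. a blank line
                break
            }
        }
    }' "$2"
}
```

With first_match_with_context Flash CodeSnippets, the fourth paragraph should then come out as its two lines unchanged instead of stopping awk with an error.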

Assuming that the output you want from the above sample input is:

Some text, that could contain anything
Flash 1

2nd paragraph with key
word not seen until

Flash on line 1 and line 4
other text on subsequent lines

Reason not to use input %s %d
as 1st arg to printf Flash

you might want to try something like:

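awk '
BEGIN { FS = "\n"   # each line of a paragraph is one field
        RS = ""     # paragraph mode: records are separated by blank lines
}
# For every paragraph containing Flash, print its first two lines,
# with a blank line before every output paragraph except the first.
/Flash/ {
        printf("%s%s\n%s\n", first++ ? "\n" : "", $1, $2)
}' CodeSnippets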
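One edge case to keep in mind: if a matching paragraph has only one line, $2 is empty and the printf above still prints an empty second line. If that matters for your file, a variant along these lines might help. It is only a sketch; the function name first_two_lines is made up, and the search term is passed in as an awk regular expression.

```shell
# Hypothetical helper: first two lines of each paragraph of file "$2"
# matching regex "$1"; one-line paragraphs print just that one line.
first_two_lines() {
    awk -v pat="$1" '
    BEGIN { FS = "\n"; RS = "" }
    $0 ~ pat {
        if (seen++) print ""     # blank line between output paragraphs
        print $1
        if (NF > 1) print $2     # skip the second line when there is none
    }' "$2"
}
```

Usage would be first_two_lines Flash CodeSnippets; for paragraphs of two or more lines the output should be the same as the script above.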