extract data from file

apalex · April 25, 2002, 4:01pm

I'm working in ksh. My file consists of messages of varying length (number of lines), each starting with a header line.
I would like to search this file for a string and print out the whole message that contains it, including its header.

My plan of attack is to search for the strings, print the header above each match, and then print the whole message below that header.

I can't figure out the right awk command. I am building this as
part of my own data-extraction tool. Any ideas are
greatly appreciated.

$infile:

REPT header aaa 111
data1 data2 string1 string2
REPT header bbb 222
string1 data1
string2 data2 data3 data4
REPT header ccc 333
data1 data2 data3 data4
REPT header aaa 111
data1 data2 data3 data4
data5 data6
REPT header ddd 444
string1 string2
data1
data2

Look for: string1 and string2

$outfile:

REPT header aaa 111
data1 data2 string1 string2

REPT header bbb 222
string1 data1
string2 data2 data3 data4

REPT header ddd 444
string1 string2
data1
data2

Kelam_Magnus · April 26, 2002, 3:23pm

I think that this link will be very enlightening.

I would suggest using this, then running grep on all the new files it produces to get what you want. That works as a stopgap until you find a script that does the whole job.
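The link in the post is not preserved, so the following is only a guess at that workaround, assuming the linked approach splits the input into one file per REPT message. It is a minimal sketch in the thread's own sh/awk, wrapped in a hypothetical shell function split_and_grep; the piece names msg.0001, msg.0002 and so on are also chosen for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the split-then-grep workaround.
# ASSUMPTION: the input is split into one file per REPT message, named
# msg.0001, msg.0002, ... in the current directory (names are illustrative).
split_and_grep() {
    # Start a new numbered file at every REPT header line. Zero-padding keeps
    # the shell glob msg.* in message order.
    awk '$1 == "REPT" { if (out != "") close(out); out = sprintf("msg.%04d", ++n) }
         out != "" { print > out }' "$1"

    # grep -l lists the pieces containing string1; keep those that also
    # contain string2 (plain substring matches, like the awk answer below).
    for f in $(grep -l string1 msg.* 2>/dev/null); do
        if grep string2 "$f" > /dev/null; then
            cat "$f"
            echo    # blank line after each printed message
        fi
    done
}
```

Usage would be something like split_and_grep "$infile" > "$outfile", run in an otherwise empty directory, because any msg.* files already there would be searched too. The pieces are left behind afterwards, which also lets you grep them again for other strings.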

Hope this helps...

system · April 27, 2002, 1:24pm

The following awk script watches for string1 and string2 as it stores each line of a message in an array, and at the end of each message it outputs the array if both strings were found. I took the lazy way and just searched the entire line, which means it also finds string1 or string2 when they are embedded in a longer word (for example "xstring1" or "string10"). If you want to match string1 and string2 only as whole words, a slight change would be needed.

Since the last message has no following REPT line to trigger end-of-message processing, I have to call printmsg at END, and that is why I put that code in a function. Some older awk versions do not support "function"; you might have to use /usr/xpg4/bin/awk instead. Alternatively, we can drop the function and put the printmsg logic in both places. The script prints a blank line after each message, including the last one.
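For an awk that rejects "function", the flush logic can simply be written out twice, once in the REPT rule and once in END, as described above. The sketch below is a function-free variant of the script that follows, wrapped in a hypothetical shell function extract_nofunc that reads the file named by its first argument and writes to standard output. It also uses /string1/ patterns instead of match(); whether a particular old awk accepts every construct here has not been verified.

```shell
#!/bin/sh
# Hypothetical function-free variant: the printmsg logic is inlined twice.
extract_nofunc() {
    awk '
    $1 == "REPT" {
        # Flush the previous message if both strings were seen.
        if (flag1 == 1 && flag2 == 1) {
            for (l = 1; l <= lcnt; l++)
                print lines[l]
            print ""
        }
        split("", lines)    # empty the array
        lines[1] = $0; lcnt = 1; flag1 = 0; flag2 = 0
        next                # header lines are never tested for the strings
    }
    { lines[++lcnt] = $0 }
    /string1/ { flag1 = 1 }
    /string2/ { flag2 = 1 }
    END {
        # Same flush, repeated for the last message.
        if (flag1 == 1 && flag2 == 1) {
            for (l = 1; l <= lcnt; l++)
                print lines[l]
            print ""
        }
    }' "$1"
}
```

Usage: extract_nofunc "$infile" > "$outfile". The duplication is the price of not having functions; if the flush logic changes, both copies must be kept in sync.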

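#!/bin/sh
# Assumes infile and outfile are passed as the first two arguments.
# If the default awk rejects "function", try /usr/xpg4/bin/awk.
infile=$1
outfile=$2

awk '
# Print the buffered message, then a blank line, only if both strings were seen.
function printmsg() {
    if (flag1 == 1 && flag2 == 1) {
        for (l = 1; l <= lcnt; l++)
            print lines[l]
        print ""
    }
}
{
    # A REPT line ends the previous message and starts a new one.
    if ($1 == "REPT") {
        printmsg()
        split("", lines)    # portable way to empty the array
        lines[1] = $0
        lcnt = 1
        flag1 = 0
        flag2 = 0
        next
    }
    lines[++lcnt] = $0
    # Substring match, so embedded occurrences count too.
    if (match($0, "string1"))
        flag1 = 1
    if (match($0, "string2"))
        flag2 = 1
}
# The last message has no following REPT line, so flush it here.
END { printmsg() }' "$infile" > "$outfile"
exit 0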
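The "slight change" for whole-word matching mentioned above could look like the following sketch: instead of searching the whole line, each whitespace-separated field is compared to string1 and string2 exactly. Here "whole word" is assumed to mean a whole field, so something like "string1," with trailing punctuation would not count. The shell function name extract_whole_words is hypothetical; it reads the file named by its first argument and writes to standard output.

```shell
#!/bin/sh
# Hypothetical whole-word variant: string1/string2 must be entire fields,
# so "string10" or "xstring2" no longer set the flags.
extract_whole_words() {
    awk '
    function printmsg() {
        if (flag1 == 1 && flag2 == 1) {
            for (l = 1; l <= lcnt; l++)
                print lines[l]
            print ""
        }
    }
    $1 == "REPT" {
        printmsg()
        split("", lines)
        lines[1] = $0; lcnt = 1; flag1 = 0; flag2 = 0
        next
    }
    {
        lines[++lcnt] = $0
        # Whole-word test: compare each field instead of searching the line.
        for (i = 1; i <= NF; i++) {
            if ($i == "string1") flag1 = 1
            if ($i == "string2") flag2 = 1
        }
    }
    END { printmsg() }' "$1"
}
```

Usage: extract_whole_words "$infile" > "$outfile". On the sample input it gives the same three messages as the original script, but a message whose only hits are "string10 xstring2" would now be skipped.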