grep for a string and print the following lines until another string is found

Hello Everyone,

I just started scripting this week. I have no background in programming or scripting.

I'm working on a script to grep for a variable in a log file.

Here's what the log file looks like. The x's are all random clutter.

xxxxxxxxxxxxxxxxxxxxx START: xxxxxxxxxxxx ACTNUMBER=1234xxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxx FINISH:

I declared ACTNUMBER as a variable $ACTNUMBER

Basically the user will open a file and grep for the ACTNUMBER.
I've gotten as far as getting the file open with exec.

What I want the script to do is to grep for the ACTNUMBER and only print the line (with echo) if the line contains a START and the ACTNUMBER. I also want the script to echo all the data in the lines after until it reaches the word FINISH

I'm looking for the results in a numbered format output to display, with a carriage return between the results.

There's a twist though. Sometimes there's a nested START and FINISH within a START and FINISH, and I need the script to handle that as well. So if it finds one START, it should look for another START; if it finds a second START, it should look for two FINISHes.

I know grep only works on one line at a time,
so if I do a grep -n START | grep -n ACTNUMBER=$ACTNUMBER
it will output the lines with both START and ACTNUMBER, but I need it to print all the lines after that until it finds the line that reads FINISH.

So what I need it to do is look for a START and the ACTNUMBER. If it finds them, echo that particular line. Then move to the next line and print it whether or not it matches, then the next, and so on. When it reaches a FINISH, print that line, then start the loop over.

I have a feeling some counters might be involved here but I'm really not sure how to implement this.

Thanks so much for any input and ideas on this.

For your sample data this should work:

awk -v act="$ACTNUMBER" '$0~"START.*ACTNUMBER="act,/FINISH/' logfile

As for the nested START/FINISH, please post example of such case and the desired output.
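To make the one-liner above easier to try, here is a minimal sketch that runs it against a small made-up log. The file contents and the function name extract_blocks are assumptions for illustration only. It relies on awk's range pattern (begin-pattern, end-pattern): printing starts on a line matching START followed by ACTNUMBER=<value> and stops after the next line matching FINISH.

```shell
# Build a small sample log (contents are invented for illustration).
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
noise before
aaa START: bbb ACTNUMBER=1234 ccc
payload line 1
ddd FINISH:
noise between
aaa START: bbb ACTNUMBER=4778 ccc
other payload
ddd FINISH:
EOF

ACTNUMBER=1234

# Range pattern: print from the START...ACTNUMBER=<act> line
# through the next line that contains FINISH.
extract_blocks() {
    awk -v act="$ACTNUMBER" '$0 ~ ("START.*ACTNUMBER=" act), /FINISH/' "$1"
}

extract_blocks "$logfile"
```

On this sample only the three lines of the 1234 block are printed. Note that the pattern as written would also match a longer number that begins with the same digits (for example ACTNUMBER=12345).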

grep is not really a good tool to use for this kind of thing. I suggest you look at either awk, perl or sed.

For example, here is one way of printing all lines between START/ACTNUMBER and FINISH using sed:

sed -n '/START:\(.*\)ACTNUMBER/,/FINISH:/p' file

It does not handle the nested case. For that you need to use a tool like awk or perl.
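For the nested case, one possible approach in awk is a depth counter: add the number of STARTs on each line, subtract the number of FINISHes, and stop printing when the counter returns to zero. The sketch below is only that, a sketch. It assumes the ACTNUMBER appears on the same line as the outermost START (as in the samples in this thread), and the function name extract_nested is hypothetical.

```shell
# Print every START..FINISH block for one ACTNUMBER, tracking nesting depth.
# Usage: extract_nested ACTNUMBER FILE   (function name is hypothetical)
extract_nested() {
    awk -v act="$1" '
    {
        # Count START and FINISH occurrences on this line. gsub returns the
        # number of substitutions; it runs on a copy so $0 is left untouched.
        copy = $0; starts = gsub(/START/, "", copy)
        copy = $0; finishes = gsub(/FINISH/, "", copy)
    }
    # Open a block only when outside one and the line has START followed by
    # the requested ACTNUMBER (not followed by another digit).
    !inblock && $0 ~ ("START.*ACTNUMBER=" act "([^0-9]|$)") {
        inblock = 1
        depth = 0
        if (blocks++ > 0) print ""      # blank line between results
    }
    inblock {
        print
        depth += starts - finishes
        if (depth <= 0) inblock = 0     # every START has been closed
    }
    ' "$2"
}
```

Called as extract_nested "$ACTNUMBER" "$FILENAME", this should cover both layouts shown in the thread: two FINISHes on one line, or one FINISH on each of two lines.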

1 Like

Thanks..
There are multiple instances of START to FINISH with the ACTNUMBER within. This script needs to display all of those instances.

However some have another START and FINISH within the START and FINISH.

So sometimes it will look like this:
xxxxxx START xxxxxxxxxxxxxxxx ACTNUMBER=1234 xxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxx FINISH xxxxxxxxxxxxxxxxxxxxxxxxxx

and sometimes it will look like this:

xxxxxx START xxxxxxxxx START xxxxxxxx ACTNUMBER=1234 xxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx FINISHxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx FINISH xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

This is a long log file, so there will be multiple instances of these log snippets. I need the script to display the output in a numbered format, with a carriage return between each instance.

Someone mentioned grep and using a counter to look for the first and second instance of START and the first and second instance of FINISH if there is one.

Basically what I'm looking for is:
Find START. If there's a START, look for the ACTNUMBER.
Print or echo that particular line.
Look for the same on the next line. If this is false, still print the line.
Look for the same on the next line. If this is false, still print the line.
Look for FINISH. If this is true, print the line, then start over and repeat for the rest of the file.
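The steps above can be sketched as a small awk state machine with a single flag instead of a grep pipeline. This version covers the non-nested case only. It assumes "numbered format" means each matching block gets a sequence number on its own line, which is a guess at the desired layout; the function name numbered_blocks is hypothetical.

```shell
# Hypothetical helper: number each non-nested START..FINISH block.
# Usage: numbered_blocks ACTNUMBER FILE
numbered_blocks() {
    awk -v act="$1" '
    # Step 1: a line with START followed by the wanted ACTNUMBER opens a block.
    !printing && $0 ~ ("START.*ACTNUMBER=" act "([^0-9]|$)") {
        if (count > 0) print ""          # blank line between results
        count++
        printf "%d)\n", count
        printing = 1
    }
    # Steps 2-3: while inside a block, print every line, matching or not.
    printing { print }
    # Step 4: FINISH closes the block; the search for the next START resumes.
    printing && /FINISH/ { printing = 0 }
    ' "$2"
}
```

The flag plays the role of the counter mentioned earlier in the thread: it is set on the START line and cleared on the FINISH line, so lines outside a block are skipped.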

For the past week I've been in the grep state of mind. I guess I was way off lol

grep -n START | grep -n ACTNUMBER=$ACTNUMBER gets me the START and ACTNUMBER but I just couldn't figure out how to get those lines to print until FINISH

---------- Post updated at 10:59 AM ---------- Previous update was at 10:57 AM ----------

Thanks.. what does the \(.*\) mean? Does that mean everything between START and ACTNUMBER?
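Roughly, yes. In sed's basic regular expressions, . matches any single character, * means "zero or more of the preceding item", and \( \) groups whatever is inside. In the sed address above the group is never referenced, so it should match the same lines with or without it. A small sketch to check this, using a made-up sample line:

```shell
line='xxx START: yyy ACTNUMBER=1234zzz'

# With the group: \( \) only groups the .* and does not change what matches.
with_group=$(printf '%s\n' "$line" | sed -n '/START:\(.*\)ACTNUMBER/p')

# Without the group: the same line is selected.
without_group=$(printf '%s\n' "$line" | sed -n '/START:.*ACTNUMBER/p')

# A group becomes useful once it is referenced, e.g. as \1 in a substitution,
# which here keeps only the text between START: and ACTNUMBER.
between=$(printf '%s\n' "$line" | sed -n 's/.*START:\(.*\)ACTNUMBER.*/\1/p')

printf '%s\n' "$with_group" "$without_group" "[$between]"
```

The first two commands print the whole line; the third captures just the text between the two markers.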

You still didn't post sample data with nested START/FINISH and the corresponding output.

The data within the nested START/FINISH is the ACTNUMBER=1234 and a bunch of random log information

the output if this script works should look like this....

START: xxxxxxxxxxxx START xxxxxxx ACTNUMBER=1234xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx FINISH xxxxxxxxxxxxxxxxxxxxxxxxxx FINISH
-----------------------------line space----------------------------
START: xxxxxxxxxxxx START xxxxxxx ACTNUMBER=1234xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx FINISH xxxxxxxxxxxxxxxxxxxxxxxxxx FINISH
------------------------line space---------------------------------------------
START: xxxxxxxxxxxx START xxxxxxx ACTNUMBER=1234xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxx FINISH xxxxxxxxxxxxxxxxxxxxxxxxxx FINISH

Sometimes ACTNUMBER 1234 may appear in a single START and FINISH block, and sometimes it may be embedded within a nested START/FINISH.

So this script basically needs to filter out all the gibberish before the START and after the FINISH.

This log has many different ACTNUMBERs. Some may be 1234, another may be 4778, and so on.

The sample data within the nested START/FINISH will be the ACTNUMBER.

Thanks so much for your help so far. I hope this helped a little.

---------- Post updated at 01:07 PM ---------- Previous update was at 01:04 PM ----------

And if it's not nested, the output should look like this:

START xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ACTNUMBER=1234 xxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxx FINISH
-------------------line space --------------------------------------------------
START xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ACTNUMBER=1234 xxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxx FINISH

FYI, the beginning of my code looks like this:

echo "Please enter a filename: "
read $FILENAME
echo "Please enter an act number: "
read $ACTNUMBER

Those are my two variables
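One thing worth checking in the snippet above: read expects the name of a variable, so read $FILENAME expands the (still empty) variable instead of assigning to it, and both values stay unset. That alone could lead to file-not-found or redirect errors later in a script. A minimal corrected sketch follows; the function wrapper prompt_inputs is just for illustration and is not part of the original script.

```shell
# read takes the NAME of the variable, so there is no leading $.
# -r keeps backslashes in the input literal.
prompt_inputs() {
    printf 'Please enter a filename: '
    read -r FILENAME
    printf 'Please enter an act number: '
    read -r ACTNUMBER
}

# Typical use (interactive), quoting the variables when they are used:
#   prompt_inputs
#   awk -v act="$ACTNUMBER" '$0 ~ ("START.*ACTNUMBER=" act), /FINISH/' "$FILENAME"
```

Quoting "$FILENAME" where it is used also guards against names that contain spaces.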

For this sample data (with START/START and FINISH/FINISH being on the same line), code that was already posted will work:

awk -v act="$ACTNUMBER" '$0~"START.*ACTNUMBER="act,/FINISH/' "$FILENAME"

It would be more useful if you tried that code on your real data, checked which parts are not processed correctly, and then posted examples of those parts here.

Ok I'm on an iPhone right now responding..
I received a syntax error near line 1
Bailing out line 1
Then a bunch of errors like this:
String too long near line 259
New line in string near line 259
And so on for line 260, 261, 314, 315, 316, 385, 386,
417, 418, 466, 467

---------- Post updated at 05:35 PM ---------- Previous update was at 05:34 PM ----------

Also got a no such file or directory error

---------- Post updated at 05:49 PM ---------- Previous update was at 05:35 PM ----------

Ok, I put a space before and after the ~ and am now getting a bunch of wc: cannot open errors as well as an ambiguous redirect error.