Grab contents between two matched patterns

piynik · February 4, 2013, 9:53am

I want to fetch the contents of a table within a file.

The table begins with a header line like

    N   Batch    Mn(I)   RMSdev  I/rms  Rmerge    Number  Nrej Cm%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes  $$ $$
. #columns of data
.
.
.
.
.
$$

I tried the command

awk '/    N   Batch    Mn(I)   RMSdev  I/rms  Rmerge    Number  Nrej Cm%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes/, /\$\$/', file

but I am getting a syntax error. What have I done wrong?

Skrynesaver · February 4, 2013, 10:14am

perl -ne 'print if (/N\s+Batch\s+Mn\(I\)\s+RMSdev\s+I\/rms\s+Rmerge/ .. /^\$\$/)' file

joeyg · February 4, 2013, 10:15am

What are you trying to read?
Perhaps show a sample line or two of the input file, and your desired output.

RudiC · February 4, 2013, 10:16am

I think you used something like sed's address range, which awk doesn't recognize. Try

awk '/1.pattern/ {on=1}
     /2.pattern/ {on=0}
     on
    ' file

piynik · February 4, 2013, 10:27am

Here is part of what the table looks like:

    N   Batch    Mn(I)   RMSdev  I/rms  Rmerge    Number  Nrej Cm%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes  $$ $$
    1       1    148.4     32.3   4.59   0.147       168     0     0.3      -     3.5   0.00    0.066     3.17
    2       2    526.8     80.3   6.56   0.062       808     0     2.1      -     3.0   0.02    0.066     3.17
    3       3    478.7     57.5   8.32   0.064      1020     0     4.1     0.0    3.2   0.04    0.066     3.17
$$

Basically, I want to fetch all those numbers from the table in the file and perform further awk operations downstream.

Corona688 · February 4, 2013, 11:39am

awk '$1=="N" { P=1; next } $1="$$" { P=0 } P' inputfile

piynik · February 5, 2013, 3:44am

Hi RudiC and Corona,

Your commands return nothing to the screen.

RudiC · February 5, 2013, 4:22am

Did you replace "1.pattern" with a reasonable string to identify the start of your table, and "2.pattern" with something to recognize its end? Post the command you applied!

piynik · February 5, 2013, 4:46am

The table looks like this. I want to grab the data and perform a further awk operation (if column 6 is greater than a certain value, then output column 2).

    N   Batch    Mn(I)   RMSdev  I/rms  Rmerge    Number  Nrej Cm%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes  $$ $$
    1       1    148.4     32.3   4.59   0.147       168     0     0.3      -     3.5   0.00    0.066     3.17
    2       2    526.8     80.3   6.56   0.062       808     0     2.1      -     3.0   0.02    0.066     3.17
    3       3    478.7     57.5   8.32   0.064      1020     0     4.1     0.0    3.2   0.04    0.066     3.17
$$
    N   Batch    Mn(I)   RMSdev  I/rms  Rmerge    Number  Nrej Cm%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRe

I tried the command

awk '/N   Batch    Mn\(I\)   RMSdev  I\/rms  Rmerge    Number  Nrej Cm\%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes  \$\$ \$\$/ {on=1} /N   Batch    Mn\(I\)   RMSdev  I\/rms  Rmerge    Number  Nrej Cm\%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes/ {on=0} on' myfile

It returned nothing.

When I tried

awk '/N   Batch    Mn\(I\)   RMSdev  I\/rms  Rmerge    Number  Nrej Cm\%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes  \$\$ \$\$/, /N   Batch    Mn\(I\)   RMSdev  I\/rms  Rmerge    Number  Nrej Cm\%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes/' myfile

it returned a single line but no data:

N   Batch    Mn(I)   RMSdev  I/rms  Rmerge    Number  Nrej Cm%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes  $$ $$

I just have no clue why it wouldn't work

RudiC · February 5, 2013, 5:52am

Your awk statement does exactly what you tell it to do:
If "pattern $$ $$" is matched, switch "on" to TRUE (= 1).
If "pattern" is matched, switch "on" to FALSE (= 0).
If "on" == TRUE, print the actual line.

The problem you encounter is that you switch "on" on and off in adjacent actions on the same input line, so the script has no chance to print anything. The task therefore comes down to scrutinizing the data for patterns that cannot occur on the same line, one to switch printing on and one to switch it off, AND carefully formulating regexes that uniquely identify each.
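
To make the same-line effect concrete, here is a minimal sketch. The three-line sample file and the shortened patterns are made up for illustration; the real header has many more columns.

```shell
# Hypothetical three-line sample; the real header has many more columns.
sample=$(mktemp)
cat > "$sample" <<'EOF'
    N   Batch  Rmerge  $$ $$
    1       1   0.147
$$
EOF

# Both patterns match the header line, so "on" is set to 1 and reset to 0
# before the bare pattern "on" is tested. No later line sets it again.
same_line=$(awk '/N +Batch.*\$\$ \$\$/ {on=1} /N +Batch/ {on=0} on' "$sample")
printf 'printed: [%s]\n' "$same_line"

rm -f "$sample"
```

The brackets come out empty: the header switches printing on and straight off again, and the data rows match neither pattern.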

BTW - my earlier statement that awk does not recognize sed-like address ranges was wrong - apologies for that. So your original approach was OK in principle (the syntax error was probably due to the comma after the awk program) but suffered from the problem above - finding the right patterns. Try this:

$ awk '/N   Batch  --- looooong pattern ---  \$\$ \$\$/, /^\$\$/' file

You might want to abbreviate the long pattern by replacing the non-relevant parts with wildcards like .* .
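
A small sketch of the difference the ^ makes, using a cut-down, made-up version of the table (fewer columns than the real file) and an abbreviated start pattern:

```shell
# Hypothetical sample file with fewer columns than the real table.
sample=$(mktemp)
cat > "$sample" <<'EOF'
some text before the table
    N   Batch    Mn(I)  Rmerge  $$ $$
    1       1    148.4   0.147
    2       2    526.8   0.062
$$
some text after the table
EOF

# Unanchored end pattern: "$$" also occurs on the header line, so the range
# opens and closes on that same line.
unanchored=$(awk '/N +Batch.*\$\$ \$\$/, /\$\$/' "$sample")

# Anchored end pattern: only a line that starts with "$$" closes the range.
anchored=$(awk '/N +Batch.*\$\$ \$\$/, /^\$\$/' "$sample")

printf 'unanchored end:\n%s\n\n' "$unanchored"
printf 'anchored end:\n%s\n' "$anchored"

rm -f "$sample"
```

The first command prints only the header line; the second prints the header, both data rows, and the closing $$ line.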

piynik · February 5, 2013, 6:13am

Your code worked! Many thanks for that. Perhaps I do not truly understand how the syntax

awk '/pattern1/, /pattern2/' file

works. Could you explain why adding the ^ character made the difference? Are the {on=1} and {on=0} implicit in that command line?

Also, would it be possible to not print the N Batch...... line in the output?

RudiC · February 5, 2013, 6:29am

1) T'was the same error you encountered before: $$ was present at the end of the starting pattern and thus closed the address range immediately, printing exactly that one line. Adding the "^" told awk to look for $$ as the first chars in the line, making it ignore those at the end and thus extending the range.

2) I would not phrase it like so, but yes, you could imagine implicit ons and offs in that command.

3) Yes, by carefully rearranging the sequence of commands in the awk program:

...
/pattern2/ {on=0}
on
/pattern1/ {on=1}
...

piynik · February 5, 2013, 6:46am

Why does that work? (what does it do?)

and why doesn't it work when you type it in one single line?

RudiC · February 5, 2013, 7:05am

You want to leave out the switching line - so switch off before print and switch on after print.
I don't see a reason why it should not work on a single line, except that awk might have difficulties and need some help to separate the pattern {action} pairs.
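
One possible explanation, offered as an assumption and not as a diagnosis of the exact command that failed: written on one line, on /pattern1/ {on=1} can be read by awk as the start of a division expression, because nothing separates the bare pattern on from the following slash. A semicolon after on acts as the separator. The sample file and the shortened patterns below are made up for illustration.

```shell
# Hypothetical sample file with fewer columns than the real table.
sample=$(mktemp)
cat > "$sample" <<'EOF'
some text before the table
    N   Batch    Mn(I)  Rmerge  $$ $$
    1       1    148.4   0.147
    2       2    526.8   0.062
$$
some text after the table
EOF

# Switch off, print, switch on: all on one line, with ";" ending the
# bare "on" pattern so the next "/" starts a regex, not a division.
inner=$(awk '/^\$\$/ {on=0} on; /N +Batch/ {on=1}' "$sample")
printf '%s\n' "$inner"

rm -f "$sample"
```

This prints only the two data rows, without the header and without the closing $$ line.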

Scrutinizer · February 5, 2013, 7:23am

That was probably because Corona's suggestion needed another =-sign:

awk '$1=="N" { P=1; next } $1=="$$" { P=0 } P' file
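
A quick side-by-side sketch of the two versions, run on a made-up sample file with fewer columns than the real table. With a single =, the second pattern is an assignment whose value is the non-empty string "$$", so it is true on every line and P is reset each time.

```shell
# Hypothetical sample file with fewer columns than the real table.
sample=$(mktemp)
cat > "$sample" <<'EOF'
some text before the table
    N   Batch    Mn(I)  Rmerge  $$ $$
    1       1    148.4   0.147
    2       2    526.8   0.062
$$
some text after the table
EOF

# Single "=": assigns "$$" to $1 on every line, always true, P is always 0.
buggy=$(awk '$1=="N" { P=1; next } $1="$$" { P=0 } P' "$sample")

# Double "==": compares $1 with "$$", so P is cleared only on the end line.
fixed=$(awk '$1=="N" { P=1; next } $1=="$$" { P=0 } P' "$sample")

printf 'with =  : [%s]\n' "$buggy"
printf 'with == :\n%s\n' "$fixed"

rm -f "$sample"
```

The first version prints nothing; the second prints the two data rows.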

piynik · February 5, 2013, 9:20am

The only way I could make sense of why that works is that awk prints whatever lines are between the two patterns, without much directionality.

/pattern2/ {on=0} on

When pattern2 is found, store the lines but do not print; is that correct?

 /pattern1/ {on=1}

When pattern1 is found, turn on printing. But pattern1 occurs before pattern2 in the table. So what is it printing? I think awk works its way down the table as it executes the command, so that code doesn't make much logical sense to me. Or does it not work like that?

RudiC · February 6, 2013, 4:36am

piynik:

The only way I could make sense of why that works is that awk prints whatever lines are between the two patterns, without much directionality.
/pattern2/ {on=0} on
When pattern2 is found, store the lines but do not print; is that correct?

No. See below.

Yes.

Please try diving deeper into awk by reading the man pages or other literature. In very general terms, awk does the following: execute the BEGIN action (if it exists), then read the input line by line, and apply the given program steps sequentially to each line. Program steps consist of pattern {action} pairs, in principle. Whenever a pattern evaluates to TRUE, the action is executed. The default action is {print}. So in your case

...                # read line
/pattern2/ {on=0}  # does closing pattern occur in the line --> store THE FACT (implies not printing the closing line)
on                 # default print if on (on == TRUE), don't if FALSE
/pattern1/ {on=1}  # does opening pattern occur in the line --> store THE FACT (implies not having printed the opening line)
...

So, as you can see, by carefully arranging the steps you can tailor the output to your needs.

BTW - you could, of course, append the down-the-line awk processing that you mentioned before into the above awk program...
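
A rough sketch of what that could look like for the task mentioned earlier (print column 2 where column 6 exceeds some value). The rows are the ones posted above; the threshold of 0.063 and the variable name limit are made up for illustration.

```shell
# Sample built from the rows posted earlier in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
    N   Batch    Mn(I)   RMSdev  I/rms  Rmerge    Number  Nrej Cm%poss  AnoCmp MaxRes CMlplc SmRmerge SmMaxRes  $$ $$
    1       1    148.4     32.3   4.59   0.147       168     0     0.3      -     3.5   0.00    0.066     3.17
    2       2    526.8     80.3   6.56   0.062       808     0     2.1      -     3.0   0.02    0.066     3.17
    3       3    478.7     57.5   8.32   0.064      1020     0     4.1     0.0    3.2   0.04    0.066     3.17
$$
EOF

# "limit" is a hypothetical threshold for column 6 (Rmerge).
batches=$(awk -v limit=0.063 '
    /^\$\$/          { on = 0 }      # end of table: stop before testing rows
    on && $6 > limit { print $2 }    # inside the table: filter on column 6
    /N +Batch/       { on = 1 }      # header: start with the NEXT line
' "$sample")
printf '%s\n' "$batches"

rm -f "$sample"
```

With this threshold the sketch prints batches 1 and 3, whose Rmerge values are 0.147 and 0.064.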

piynik · February 6, 2013, 5:49am

After much deliberation I can begin to make sense of the code

My final question: as awk reads in the first line of input and executes the following commands for that line

...
/pattern2/ {on=0}
on
/pattern1/ {on=1}
...

what is the default value of on, as it has not been initialised? When awk reads the first line of input, pattern2 is not matched, so {on=0} is not executed. How then does awk evaluate on (no initialised value?) before it goes down to /pattern1/ {on=1}, which will be true?

I did try to dive into manuals and online resources, but I read a little here and a little there, and it is difficult to get a complete picture.

But many thanks for your help.

RudiC · February 6, 2013, 5:59am

For awk, uninitialized variables are 0 or "". This was implicitly taken into account by my proposal.
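
This can be checked without any input file. A minimal sketch, using a BEGIN block and the same variable name on:

```shell
# "on" is never assigned here, so all three tests see the uninitialized value.
result=$(awk 'BEGIN {
    if (!on)       print "on is false in a boolean test"
    if (on == 0)   print "on compares equal to 0"
    if (on == "")  print "on compares equal to the empty string"
}')
printf '%s\n' "$result"
```

All three messages are printed, which is why the bare pattern on prints nothing until some action has set it to 1.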