Extracting sections of a data file that meet a condition

saeed.soltani · August 25, 2012, 5:20pm

I have a data file like below:
[input]

 
2011 0701 2015 21.2 L 37.692 46.202 18.0 Teh 4 0.3 2.1 LTeh 1
GAP=233 E
Iranian Seismological Center, Institute of Geophysics, University of Tehran 6
STAT SP IPHASW D HRMM SECON CODA AMPLIT PERI AZIMU VELO SNR AR TRES W DIS CAZ7
TBZ SN EPg 0 2015 31.19 -0.3 60.0 355
BST SZ EPg 0 2015 31.30 -0.3 61.0 89

2011 0702 0624 39.4 L 38.067 46.391 13.9 Teh 5 0.1 1.7 LTeh 1
GAP=157 E
Iranian Seismological Center, Institute of Geophysics, University of Tehran 6
STAT SP IPHASW D HRMM SECON CODA AMPLIT PERI AZIMU VELO SNR AR TRES W DIS CAZ7
SHB SZ EPg 0 0624 51.37 0.0 72.0 290
MRD SZ EPg 0 0624 54.83 0.1 94.0 320

I want to keep only the sections that satisfy this constraint:
Sections are separated by a blank line, and the first line of each section starts with "2011". If $12 > 3 on that first line, print the whole section. So I wrote this awk command:

 
awk '/^201?/ {if ($12 > 3) {RS="";FS="\n"} print $0}' in > out

But the output is no different from the input!

agama · August 25, 2012, 6:42pm

I think this does what you might be looking for:

awk '
    $1 == "#"  { if( snarf ) print " "; snarf = 0; next; }   # turn off section capture, write a trailing blank line
    snarf || (/^201?/ && $12+0 > 3.0) { snarf = 1; print; }  # print a record from the section
    ' input >output

You said "blank line", but the separator seems to be a line with a lone hash (comment symbol) at the start. I assumed you wanted every line printed from the 2011 line (when field 12 is greater than three) up to the next 'blank' line.

Don_Cragun · August 26, 2012, 1:05am

agama said he found a # by itself on a line as the section separator. When I copied the sample input and fed it through od -cb, I found that the separator line contained the octal byte values 343, 200, and 200 (the UTF-8 encoding of U+3000, the ideographic space), terminated by the <newline> character.
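
As a rough sketch of how such separator lines could be normalized before any further processing: the helper below replaces every line that contains no ASCII letter or digit with an empty line. The function name normalize_separators is hypothetical, and the rule "no alphanumeric means separator" is an assumption that fits the sample input but may not fit other files.

```shell
# normalize_separators (hypothetical helper): reads stdin, writes stdout.
# Any line without an ASCII letter or digit is assumed to be a section
# separator and is replaced by an empty line; all other lines pass through.
normalize_separators() {
    awk '!/[0-9a-zA-Z]/ { print ""; next } { print }'
}
```

It would be used as a filter, for example normalize_separators < input > cleaned, after which tools that expect empty lines between sections can read the file.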

I believe the following meets the criteria specified, but nothing will be printed given the sample input because no section header in the sample input has $12 > 3.

awk 'BEGIN {line1 = 1} # The first non-"blank" line is a section header.
!/[0-9a-zA-Z]/ { # Found what is assumed to be a blank line.
        # The sample input had three bytes with octal values 343, 200, and 200
        #   followed by a <newline> as the separator between sections.
        #   The submitter described this as a "blank line".
        # This script will use empty lines as section separators no matter what
        #   section separator lines are found in input files.
        copy = 0 # Turn off copy mode.
        line1 = 1 # The next non-"blank" line is a section header.
        next
}
copy    {print;next} # Copy any lines found before the next "blank" line.
line1   {if(($1 ~ /^2011/) && ($12 > 3)) { 
                # The text in the first post in this thread said sections were
                #   to be printed only for the year 2011 and $12 is > 3.
                # The script in the first post used /^201?/, presumably meaning
                #   years 2010-2019 (as a regular expression it matches any line
                #   starting with "20").
                # All entries in the sample input were for 2011, but no entries
                #   had $12 > 3 (the only entries had $12 set to 2.1 and 1.7),
                #   so no entries match the criteria.
                copy=1 # Turn on copy mode for the rest of the section.
                # Add an empty line as a section separator, except before the 1st
                #   section to be printed.
                if(found++ > 0) print ""
                print # Print the 1st line of the section.
        } 
        # Whether a match was found or not, don't look for another section
        #   header until we find another separator line.
        line1 = 0
}' input
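
For input whose sections really are separated by empty lines, awk's paragraph mode (which the command in the first post seems to have been reaching for with RS="") allows a much shorter alternative. The following is only a minimal sketch under that assumption. The function name filter_sections is hypothetical. RS must be set in BEGIN, before the first record is read, because changing it inside an action only affects records read afterwards.

```shell
# filter_sections (hypothetical helper): reads stdin, writes stdout.
# Assumes sections are separated by truly empty lines.
filter_sections() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" }   # one record per section
         # In paragraph mode a newline also separates fields, so $1 and $12
         # come from the first line of the section (it has 14 fields).
         $1 ~ /^2011/ && $12 > 3'
}
```

It would be run as filter_sections < input > output. If the separator lines are not truly empty, as in the sample posted here, they would have to be converted to empty lines first.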

pravin27 · August 26, 2012, 1:18am

How about this?

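#!/usr/bin/perl
use strict;
use warnings;

# Read one section at a time; sections are separated by an empty line.
$/ = "\n\n";

while (<DATA>) {
    s/\n+\z//;    # Strip the trailing separator (or final newline).
    # Field 12 (index 11) is on the first line of each section.
    if ( ((split))[11] > 3 ) {
        print "$_\n";
    }
}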



__DATA__
2011 0701 2015 21.2 L 37.692 46.202 18.0 Teh 4 0.3 2.1 LTeh 1
GAP=233 E
Iranian Seismological Center, Institute of Geophysics, University of Tehran 6
STAT SP IPHASW D HRMM SECON CODA AMPLIT PERI AZIMU VELO SNR AR TRES W DIS CAZ7
TBZ SN EPg 0 2015 31.19 -0.3 60.0 355
BST SZ EPg 0 2015 31.30 -0.3 61.0 89

2011 0702 0624 39.4 L 38.067 46.391 13.9 Teh 5 0.1 3.7 LTeh 1
GAP=157 E
Iranian Seismological Center, Institute of Geophysics, University of Tehran 6
STAT SP IPHASW D HRMM SECON CODA AMPLIT PERI AZIMU VELO SNR AR TRES W DIS CAZ7
SHB SZ EPg 0 0624 51.37 0.0 72.0 290
MRD SZ EPg 0 0624 54.83 0.1 94.0 320