Finding when a file switches direction using awk

bflinchum · August 1, 2011, 11:31pm

Hi guys, I have a file that is filled with x,y values:

0.000000 0.00129578
0.000191 0.00272187
0.000381 0.0125676
0.000572 0.0120014
0.000763 0.00203461
0.000954 0.00682248
0.001144 0.00202773
0.001335 0.000840523
0.001526 0.00451419

....5MB of that

I wanted to know if there is a way to modify this awk statement to find out when column two switches direction. What I mean is that the number in column two will get bigger and then start to get smaller. When this happens, I want to print out the value of column one.

I used this to find the maximum value:

#awk ' { if ( $2 >= .9998 && $2 <= 1.0001 ) print $1 }' fileName.xy

I want to do something like this:

#awk ' { if ( [1 Row above $2]  >= $2 && [1 Row below $2] <= $2 ) print $1 }' fileName.xy

Any ideas how to actually code/specify [1 Row above $2] and [1 Row below $2]?

agama · August 1, 2011, 11:51pm

This will print out the change point rows:

awk '
    NR == 1 { last = $2; last_row = $0; next; }
    {
        if( $2 < last )
        {
            printf( "%s\n", last_row );
            printf( "%s\n", $0 );
            exit( 0 );
        }

        last = $2;
        last_row = $0;
    }
' inputfile

If the input file is:

The output from the programme will be:

4 10
5 8

I think this is what you had in mind. It will handle floating point numbers as well, my simple test was just that: simple.
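
A possible variation, sketched here rather than taken from the thread: instead of exiting at the first drop, remember whether column two was last rising or falling and report every row where that direction flips. The names sample_data and find_turns and the peak/trough labels are made up for illustration; the sample rows are the nine values from the original question.

```shell
# Hypothetical helper: the nine sample rows from the first post.
sample_data() {
    printf '%s\n' \
        '0.000000 0.00129578' \
        '0.000191 0.00272187' \
        '0.000381 0.0125676' \
        '0.000572 0.0120014' \
        '0.000763 0.00203461' \
        '0.000954 0.00682248' \
        '0.001144 0.00202773' \
        '0.001335 0.000840523' \
        '0.001526 0.00451419'
}

# Hypothetical helper: print column one wherever column two changes direction.
find_turns() {
    awk '
    BEGIN { last_dir = 0 }                       # 0 = no direction seen yet
    NR == 1 { prev_x = $1; prev_y = $2 + 0; next }
    {
        y = $2 + 0                               # force a numeric comparison
        dir = last_dir                           # equal values keep the old direction
        if (y > prev_y) dir = 1
        else if (y < prev_y) dir = -1
        if (last_dir != 0 && dir != last_dir)    # direction flipped at the previous row
            print prev_x, (last_dir > 0 ? "peak" : "trough")
        last_dir = dir
        prev_x = $1; prev_y = y
    }' "$@"
}

sample_data | find_turns
```

On the nine sample rows this should report 0.000381 and 0.000954 as peaks and 0.000763 and 0.001335 as troughs. On noisy data every small wiggle counts as a turn, so some smoothing or a minimum step size may be needed first.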

rdcwayx · August 2, 2011, 1:10am

awk 'NR==1{last=$2;last_row=$1;getline;cur=$2;cur_row=$1;next} 
     {if (last>=cur&&$2<=cur) print cur_row; 
      last=cur;last_row=cur_row;cur=$2;cur_row=$1}' infile

bflinchum · August 2, 2011, 3:12pm

Neither of them worked; the problem was that they seemed to pick random points, not the peaks (I plotted them using GMT).

Instead of accepting defeat, can I trouble you for a little explanation? If I understood the logic a little better, I think I could tweak it. I am still new at awk, but my professor loves it, and the more I use it the more I realize how powerful it actually is.

awk 'NR==1{last=$2;last_row=$1;getline;cur=$2;cur_row=$1;next}

This is the part I don't quite understand. What exactly does NR == 1 accomplish? Are you setting the number of rows equal to one? Does a ';' separate commands? So you set last equal to column 2 and last_row equal to column one.
Why do you run a getline? I looked it up and it is defined as:
getline - returns 1 if it finds a record, and 0 if the end of the file is encountered

Anyway, I get the feeling that $1 and $2 are now rows above and below, not columns?

  {if (last>=cur&&$2<=cur) print cur_row; 
      last=cur;last_row=cur_row;cur=$2;cur_row=$1}' infile

This part logically makes sense, and this is what I am trying to accomplish.

alister · August 2, 2011, 3:36pm

Some special variables that AWK sets for each record (usually a line) that it reads:

NR = record number (by default the record separator, RS, is a newline, so NR is often the current line number). NR==1 {...} means to execute the commands in braces if this is the first record that AWK is processing.

NF = after field splitting, the number of fields (columns) in the current record.

$1 = the value of the first field

$2 = the value of the second field

$NF = the value of the last field.
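
To see these variables in action, here is a small sketch. The function name show_fields and the two input lines are made up for illustration.

```shell
# Hypothetical helper: label NR, NF and a few fields for every input line.
show_fields() {
    awk '{ print "NR=" NR, "NF=" NF, "first=" $1, "second=" $2, "last=" $NF }' "$@"
}

printf '1 2 3\n12 4 5 0\n' | show_fields
# NR=1 NF=3 first=1 second=2 last=3
# NR=2 NF=4 first=12 second=4 last=0
```

Note that $1 and $2 always refer to fields of the record awk is currently on; there is no built-in way to address a field of a different record.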

Regards,
Alister

bflinchum · August 2, 2011, 4:20pm

So that makes sense. I played around with it and have a data set that looks like:

 1	2	3	4	5	6	7
12	4	5	0	8	5	4
12	98	7	675	98	89	98

if I:

awk 'NR==3 {print $2}' file1.txt

I get 98. If I make NR == 2, I get 4.

So is there a way to run an if statement with NR-1?

I want to compare the value with the ones directly above and below it.

awk ' { if ( [NR-1, $2]  <= $2 && [NR+1, $2] <= $2 ) print $1 }' fileName.xy

I need to specify that location, if that makes sense: there is a numeric value one row above (NR-1) in field 2 (which I abbreviated as [NR-1,$2]) that I want to compare to the current row NR and field 2 (abbreviated $2).

Is there a way to specify that in awk?

Once again so grateful for the responses!

rdcwayx · August 2, 2011, 8:56pm

For your request, you need three lines to compare, so you need to record the previous two lines each time.

NR==1{last=$2;last_row=$1;getline;cur=$2;cur_row=$1;next} reads the first two lines. getline moves on to the next line, which lets it all fit in one rule.

If I write it this way, it is easier to understand:

NR==1{last=$2;last_row=$1;next}
NR==2{cur=$2;cur_row=$1;next}

The rest of the code starts from line 3, which you already understand.
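
A small sketch of what getline does inside the NR==1 rule. The function name pair_lines and the three input lines are made up for illustration.

```shell
# Hypothetical helper: show how a plain getline advances to the next record.
pair_lines() {
    awk '
    NR == 1 {
        first = $2      # column two of line 1
        getline         # read line 2 now: $0, the fields, NF and NR all change
        print "after getline: NR=" NR, "first=" first, "second=" $2
        next            # skip the main block for these two lines
    }
    { print "main block: NR=" NR, "value=" $2 }' "$@"
}

printf 'a 10\nb 20\nc 30\n' | pair_lines
# after getline: NR=2 first=10 second=20
# main block: NR=3 value=30
```

So after getline, $1 and $2 are still columns, just of the second line instead of the first.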

Tested against your sample data, my code works fine. If there are any problems, you need to paste some real data for testing.
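
For reference, a minimal sketch of the three-line window written out in full, assuming a peak means a row whose column two is strictly larger than the rows directly above and below it (the [NR-1] and [NR+1] idea from the question, with < on both sides). The names sample_data, find_peaks, prev, cur and nxt are made up for illustration; the sample rows are the nine values from the first post.

```shell
# Hypothetical helper: the nine sample rows from the first post.
sample_data() {
    printf '%s\n' \
        '0.000000 0.00129578' \
        '0.000191 0.00272187' \
        '0.000381 0.0125676' \
        '0.000572 0.0120014' \
        '0.000763 0.00203461' \
        '0.000954 0.00682248' \
        '0.001144 0.00202773' \
        '0.001335 0.000840523' \
        '0.001526 0.00451419'
}

# Hypothetical helper: print column one of every local maximum in column two.
find_peaks() {
    awk '
    NR == 1 { prev = $2 + 0; next }                 # row above the candidate
    NR == 2 { cur = $2 + 0; cur_x = $1; next }      # first candidate row
    {
        nxt = $2 + 0                                # row below the candidate
        if (prev < cur && nxt < cur) print cur_x    # rose, then fell
        prev = cur; cur = nxt; cur_x = $1           # slide the window down one row
    }' "$@"
}

sample_data | find_peaks
# 0.000381
# 0.000954
```

The first and last rows are never reported because they have only one neighbour. Adding 0 to $2 forces a numeric rather than string comparison.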

bflinchum · August 2, 2011, 9:24pm

OK, I tinkered with it and it makes sense now! Thank you so much. I think I will use that trick quite a bit.