Reducing file lines in awk

vasanth.vadalur · November 18, 2011, 10:26pm

Hi,

I have to compare $3 and $4 of the first record with $1 and $2 of the second record, respectively. If they match, I then check whether $2 of the first record equals $4 of the second record. If it does, the two records should be merged into a single record, as in the desired output.

Input_file

desired output file:

1 1 4 1 
3 1 3 2

agama · November 18, 2011, 11:38pm

I think this does what you are looking for:

awk '
    {
        if( NR > 1 )
        {
            split( $0, b, " " );
            if( b[1] == a[3] && b[2] == a[4]  && a[2] == b[4] )
            {
                b[1] = a[1];
                b[2] = a[2];
            }
            else
                printf( "%s %s %s %s\n", a[1], a[2], a[3], a[4] );

            for( i = 1; i < 5; i++ )
                a[i] = b[i];    # copy element by element; awk cannot assign a whole array
        }
        else
            split( $0, a, " " );
    }

    END {
        printf( "%s %s %s %s\n", a[1], a[2], a[3], a[4] );
    }
' input-file

Might be possible to refine it, but off the top of my head the output from your sample matches what you posted as desired.
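
One way the script above might be refined, as a sketch only: keep the pending record in four plain variables instead of arrays, which avoids the split() calls and the copy loop. The shell function name merge_adjacent and the variables p1 to p4 are made up for this example; the matching rule is the same one used above.

```shell
# Hypothetical wrapper: merges a record into the pending one when it
# starts where the pending record ends (same rule as the script above).
merge_adjacent() {
    awk '
    NR > 1 {
        if( $1 == p3 && $2 == p4 && $4 == p2 )
        {
            # current record continues the pending one: extend its end point
            p3 = $3; p4 = $4
            next
        }
        # no match: the pending record is complete, so print it
        print p1, p2, p3, p4
    }
    # first record, or a record that did not match: it becomes the pending one
    { p1 = $1; p2 = $2; p3 = $3; p4 = $4 }
    END { if( NR > 0 ) print p1, p2, p3, p4 }
    ' "$@"
}
```

It can be called as merge_adjacent input-file, or with the data piped in on standard input. Like the original, it only ever compares a record with the one directly before it.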

vasanth.vadalur · November 19, 2011, 5:09am

Hi,

Thanks.

With the same logic, for the input file below:

 
1 2 3 4       
1.275 3 1.325 3 
1.275 3 1.225 3.025 
1.325 3 1.375 3
1.375 3 1.425 3 
1.425 3 1.475 3 
1.475 3 1.525 3
1.525 3 1.575 3
1.625 3 1.575 3 
1.625 3 1.675 3 
1.675 3 1.725 3 
1.725 3 1.775 3 
1.775 3 1.825 3 
1.825 3 1.875 3 
1.875 3 1.925 3

Expected output

1 2 3 4    
1.275 3 1.925 3 
1.275 3 1.225 3.025

But the output I got is

1 2 3 4
1.275 3 1.325 3
1.275 3 1.225 3.025
1.325 3 1.575 3
1.625 3 1.575 3
1.625 3 1.925 3

There are still repeats.

Where did it go wrong?

---------- Post updated at 02:09 AM ---------- Previous update was at 12:36 AM ----------

Hi,

Since the algorithm is becoming confusing, I have changed it to the following:

If $2 == $4, add an extra column $5 that holds the value of $2 (the common value).

Find the minimum of $1 and the maximum of $3.

The final output will be:

min($1) common-value max($3) common-value

agama · November 19, 2011, 11:50am

vasanth.vadalur:

Expected output
1 2 3 4    
1.275 3 1.925 3 
1.275 3 1.225 3.025 
But the output I got is
1 2 3 4
1.275 3 1.325 3       
1.275 3 1.225 3.025
1.325 3 1.575 3       
1.625 3 1.575 3
1.625 3 1.925 3
 
There are still repeats.

Where did it go wrong?

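Well, actually it didn't go wrong. Your original post indicated that only sequential lines in the file need to be tested, and I inferred that when two lines matched, the merged line was then to be compared against the next line in the file. The programme is doing exactly this, and the output you see is expected given those parameters: for example, the record 1.625 3 1.575 3 runs in the opposite direction to its neighbours, so it matches neither the line before it nor the line after it, and the chain breaks there.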

I'm thinking about the minimum/maximum redefinition of the problem.

---------- Post updated at 11:50 ---------- Previous update was at 11:30 ----------

I'm not as confident in this as I don't know what combinations fields 2 and 4 might take. I've made an assumption based on your example and this does work for it, but there might be other unexpected results. Have a go with this and see how it does:

awk '
    {
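        idx = $2 "," $4;                    # group records by fields 2 and 4
        if( min[idx] == ""  ||  min[idx] > $1+0 )
             min[idx] = $1+0;               # smallest field 1 seen for this group
        if( max[idx] == "" || max[idx] < $3+0 )
             max[idx] = $3+0;               # largest field 3 seen for this group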
    }

    END {
        for( x in min )                     # note: awk does not guarantee the order of this loop
        {
            split( x, a, "," );
            printf( "%.3f %.3f %.3f %.3f\n", min[x], a[1], max[x], a[2] );
        }
    }
'  input-file
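
If the output needs to keep the original number formatting (rather than %.3f) and the order in which each field 2 / field 4 combination first appears, something like the following might do. It is a sketch under the same assumption as above, namely that records are grouped on fields 2 and 4; the function name minmax_by_key and the helper arrays order, f2 and f4 are invented for illustration.

```shell
# Hypothetical wrapper: per (field 2, field 4) group, report the smallest
# field 1 and the largest field 3, in first-seen order of the groups.
minmax_by_key() {
    awk '
    {
        idx = $2 SUBSEP $4
        if( !(idx in min) )
        {
            order[++n] = idx            # remember first-seen order of each group
            min[idx] = $1; max[idx] = $3
            f2[idx] = $2;  f4[idx] = $4 # keep the original text of the key fields
            next
        }
        # compare numerically, but store the field text as it appeared in the input
        if( $1+0 < min[idx]+0 ) min[idx] = $1
        if( $3+0 > max[idx]+0 ) max[idx] = $3
    }
    END {
        for( i = 1; i <= n; i++ )
        {
            idx = order[i]
            print min[idx], f2[idx], max[idx], f4[idx]
        }
    }
    ' "$@"
}
```

Run as minmax_by_key input-file. For the 15-line sample posted earlier in the thread it prints the three lines given as the expected output: 1 2 3 4, then 1.275 3 1.925 3, then 1.275 3 1.225 3.025. Note that the grouping key is the literal text of fields 2 and 4, so 3 and 3.0 would end up in different groups.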