Delete records within a file upon a condition

jacobs.smith · February 28, 2013, 4:47pm

Hi Friends,

I have the following file,

cat input

chr1 1000 2000
chr1 600 699
chr1 701 1000
chr1 600 1710
chr2 900 1800

Now, I would like to see the difference of

Record1.Col2 - Record2.Col2
Record1.Col2 - Record2.Col3
Record1.Col3 - Record2.Col2
Record1.Col3 - Record2.Col3

So each record's col2 and col3 values are subtracted from the col2 and col3 values of every other record in the file that has the same first column.

If any of these differences is less than 300 (ignoring the sign), remove the matched records.

My expected output is

cat output
chr1 600 699
chr2 900 1800

Script Flowchart:

Comparing each record against every other record gives pairs like these (shown here for the first record):

chr1 1000 2000 chr1 600 699
chr1 1000 2000 chr1 701 1000
chr1 1000 2000 chr1 600 1710
chr1 1000 2000 chr2 900 1800

Now, after matching on column1 and column4, we do the subtractions

$2-$5, $2-$6, $3-$5 and $3-$6

chr1 1000 2000 chr1 600(400&1400) 699(301&1301) - This one will qualify, because all the values are greater than 300.
chr1 1000 2000 chr1 701(299&1299) 1000(0&1000) - This record should be deleted, because some of the values (299 and 0) are less than 300.
chr1 1000 2000 chr1 600(400&1400) 1710(-710&290) - Same as above, because 290 is less than 300. You can ignore the negative sign while calculating.
chr1 1000 2000 chr2 900 1800 - This one will remain, because col1 and col4 don't match.

If two records match, I would like to delete both records, not just one.
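
The arithmetic in the walk-through above can be double-checked with a small awk sketch. It is only a diagnostic, not a solution: it compares the first record of the file against each later record and reports the smallest absolute difference. The function name `show_diffs`, the `limit` variable, and the choice to treat a difference of exactly 300 as "deleted" are assumptions made for illustration; the post only says "greater than" and "less than" 300.

```shell
# Hypothetical helper: compare the first record of a file against every
# later record and report the smallest of the four absolute differences.
show_diffs() {
    awk -v limit=300 '
    function abs(x) { return x < 0 ? -x : x }
    NR == 1 { chr = $1; s = $2; e = $3; next }      # reference record
    $1 != chr { print $0, "other chromosome, kept"; next }
    {
        m = abs(s - $2)
        if (abs(s - $3) < m) m = abs(s - $3)
        if (abs(e - $2) < m) m = abs(e - $2)
        if (abs(e - $3) < m) m = abs(e - $3)
        # Assumption: a difference of exactly 300 counts as too close.
        verdict = (m > limit) ? "kept" : "deleted"
        print $0, "min=" m, verdict
    }' "$1"
}
```

Called as `show_diffs input`, it prints one line per record after the first, with the minimum difference and whether that record would survive.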

Yoda · February 28, 2013, 5:12pm

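awk '
# First record of the file: store it as the reference. It is never printed.
NR == 1 {
                c1 = $2
                c2 = $3
                p  = $1
                next
}
# Same chromosome as the reference: take the four differences.
p == $1 {
                d1 = c1 - $2
                d2 = c1 - $3

                d3 = c2 - $2
                d4 = c2 - $3

                # Ignore the sign (absolute value).
                d1 = (d1 < 0) ? -d1 : d1
                d2 = (d2 < 0) ? -d2 : d2
                d3 = (d3 < 0) ? -d3 : d3
                d4 = (d4 < 0) ? -d4 : d4

                # Keep the record only if every difference exceeds 300.
                if ( d1 > 300 && d2 > 300 && d3 > 300 && d4 > 300 )
                        print
}
# New chromosome: this record becomes the reference and is printed as-is.
p != $1 {
                c1 = $2
                c2 = $3
                p  = $1
                print
} ' input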
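
The script above always drops the very first record and always prints the first record of every later chromosome. If the reference record should be removed only when some later record actually comes within 300 of it, one possible variation is to read the file twice. This is a sketch under assumptions, not a definitive answer: the name `filter_close`, the `limit` variable, and the rule that the first record of each chromosome is the only reference are all choices made here, and a difference of exactly 300 is treated as a match.

```shell
# Hypothetical two-pass variant: the same file is passed to awk twice.
filter_close() {
    awk -v limit=300 '
    function abs(x) { return x < 0 ? -x : x }
    # Pass 1 (NR == FNR): remember the first record of each chromosome and
    # flag every later record that comes within limit of it, together with
    # the reference record itself.
    NR == FNR {
        if (!($1 in s)) { s[$1] = $2; e[$1] = $3; first[$1] = FNR; next }
        if (abs(s[$1] - $2) <= limit || abs(s[$1] - $3) <= limit ||
            abs(e[$1] - $2) <= limit || abs(e[$1] - $3) <= limit) {
            drop[FNR] = 1
            drop[first[$1]] = 1
        }
        next
    }
    # Pass 2: print only the records that were never flagged.
    !(FNR in drop)
    ' "$1" "$1"
}
```

Running `filter_close input` on the sample file should leave only the `chr1 600 699` and `chr2 900 1800` records. Records are still compared only against the first record of their chromosome, not against every other record.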