Extract data from a data matrix with filter criteria

ssshen · April 15, 2009, 8:10pm

Here is what the old matrix looks like:

IDs X1 X2 Y1 Y2
10914061 -0.364613333 -0.362922333 0.001691 -0.450094667
10855062 0.845956333 0.860396667 0.014440333 1.483899333
10844119 -0.256424667 -0.397806 -0.141381333 -0.275729667
10857231 -0.048169667 -0.117945667 -0.069776 0.034550333
10918191 -0.020050333 -0.013577 0.006473333 -0.096175667
10909733 0.162614333 0.234994333 0.07238 0.182145
10808085 0.184618667 0.626846333 0.442227667 0.266913667
10896632 -0.10846 -0.074073333 0.034386667 -0.212201667
10842200 -0.355892 -0.080373333 0.275518667 -0.658731667
10820400 -0.039500333 0.142172333 0.181672667 -0.048522667
10790305 0.060801667 0.296146 0.235344333 0.352136333
10850793 0.192093667 0.016334333 -0.175759333 0.018589
10898454 0.266829667 0.251706333 -0.015123333 0.358631
10900279 -0.135841333 -0.208101 -0.072259667 -0.08459
10915745 -0.426376 -0.090368 0.336008 -0.50354
10879483 0.171624333 -0.022833667 -0.194458 0.414899333

I want to extract the rows where |X1|, |X2|, |Y1| or |Y2| is >= 0.5 and put them into a new matrix.
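A minimal sketch of this filter in awk, assuming whitespace-separated columns with the ID in column 1 and X1, X2, Y1, Y2 in columns 2-5, and assuming line 1 is the header. Standard awk has no built-in absolute-value function, so the sketch defines one; `absfilter` is just a hypothetical name for the wrapper.

```shell
# absfilter: hypothetical wrapper name. Prints line 1 (assumed to be the
# header) plus every row where |X1|, |X2|, |Y1| or |Y2| is >= 0.5.
# Assumes whitespace-separated columns: ID, X1, X2, Y1, Y2.
absfilter() {
    awk '
        # awk has no built-in absolute-value function, so define one
        function abs(x) { return x < 0 ? -x : x }
        NR == 1 || abs($2) >= 0.5 || abs($3) >= 0.5 || abs($4) >= 0.5 || abs($5) >= 0.5
    ' "$@"
}
```

For example, `absfilter matrix.txt > new_matrix.txt` writes the header plus the matching rows to a new file (the file names are placeholders). On the sample above it should keep rows 10855062, 10808085, 10842200 and 10915745.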

Thanks in advance!

jeffm · April 15, 2009, 9:23pm

Fairly easy in Tcl. I'm not sure how you'd go about doing it in a "standard" shell script like Ksh, Bash, etc.

Tcl is pretty good for this because all variables are both strings and "lists" (sorta like arrays).

#! /usr/bin/tclsh

# open the data file for reading.
set fid [open /path/to/matrix.data r]

# loop through all lines in the file; gets returns -1 at end of file,
# so no empty "extra" line is processed after the last row.
while { [gets $fid row] >= 0 } {

        # Columns are: ID, X1, X2, Y1, Y2.
        # We don't care if the ID column is higher than
        # .5, so don't check it.
        set data [lrange $row 1 end]

        # data = X1, X2, Y1, Y2,
        # so foreach d $data checks the individual elements of the list.
        # Note: this is a signed comparison, and non-numeric fields
        # (e.g. a header line) are compared as strings.
        set newdata ""
        foreach d $data {

                if { $d >= 0.5 } {
                        set newdata $row
                        break
                }
        }

        # if we have a match from above, then
        # spit out the entire row.
        if { $newdata ne "" } {
                puts stdout $newdata
        }
}
# close the file.
close $fid

So using your data above, the script produces:

[jeffm@stalin:~] tclsh script.tcl
 X1 X2 Y1 Y2
10855062 0.845956333 0.860396667 0.014440333 1.483899333
10808085 0.184618667 0.626846333 0.442227667 0.266913667

I'm sure the Perl people have a better way of doing it.

ghostdog74 · April 15, 2009, 10:05pm

Here's a partial solution in awk:

awk '$2>=0.5 || $3 >=0.5' file

This only checks X1 and X2; I'll leave Y1 and Y2 to you.

ssshen · April 15, 2009, 10:15pm

But what I meant is absolute value >= 0.5, thanks!

ssshen · April 16, 2009, 12:00am

The filter is based on absolute value, i.e. X1 >= 0.5 or X1 <= -0.5.
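Since |x| >= 0.5 is the same as "x >= 0.5 or x <= -0.5", the signed one-liners in this thread can be extended with a second comparison per column. A sketch, assuming columns 2-5 hold X1, X2, Y1, Y2; `absfilter_cmp` is a hypothetical wrapper name.

```shell
# absfilter_cmp: hypothetical wrapper name. |x| >= 0.5 means
# "x >= 0.5 OR x <= -0.5", so each column (2-5 = X1, X2, Y1, Y2)
# gets two signed comparisons joined by ||.
# Note: on a header line the fields are not numeric, so awk compares
# them as strings ("X1" >= "0.5" is true) and the header is printed too.
absfilter_cmp() {
    awk '$2 >= 0.5 || $2 <= -0.5 ||
         $3 >= 0.5 || $3 <= -0.5 ||
         $4 >= 0.5 || $4 <= -0.5 ||
         $5 >= 0.5 || $5 <= -0.5' "$@"
}
```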

Thanks

summer_cherry · April 16, 2009, 5:26am

awk '$2 >= 0.5 || $3 >= 0.5 || $4 >= 0.5 || $5 >= 0.5' filename

ssshen · April 16, 2009, 9:06am

Thank you folks so much! But neither the awk nor the Tcl version worked for me. First, I want to filter on absolute value. Second, I use Mac OS X. Third, the file is much bigger than the sample.
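A more general sketch for a large file, assuming the first line is a header and every column after the ID is a value to test (so it does not depend on there being exactly four). awk reads the input one line at a time, and the script sticks to POSIX awk features, so it should also run with the awk that comes with Mac OS X, though that is untested here. The threshold is passed as an argument; `filter_matrix` is a hypothetical name.

```shell
# filter_matrix: hypothetical helper name. Usage:
#   filter_matrix THRESHOLD INPUT_FILE > NEW_MATRIX_FILE
# Assumptions: the first line is a header, column 1 is the ID, and every
# later column holds a value to test (so extra columns are handled too).
# awk streams the file one line at a time, so a large input is fine, and
# only POSIX awk features are used (nothing gawk-specific).
filter_matrix() {
    awk -v t="$1" '
        BEGIN   { t += 0 }              # make sure the threshold is numeric
        NR == 1 { print; next }         # copy the header line unchanged
        NF == 0 { next }                # ignore blank lines
        {
            for (i = 2; i <= NF; i++) {
                v = $i + 0              # numeric value of the field
                if (v < 0) v = -v       # absolute value
                if (v >= t) { print; next }   # keep the row, stop checking
            }
        }
    ' "$2"
}
```

For example, `filter_matrix 0.5 matrix.txt > new_matrix.txt` (placeholder file names) writes the new matrix to a separate file.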

radoulov · April 16, 2009, 10:38am

perl -lane'
  print if grep abs($_) >= 0.5, @F[1..$#F]
  ' infile