Delete lines

kekaes · May 13, 2010, 6:59am

Hello.

I have a file with 17 columns that I need to filter. For example, if the value in column 2 is smaller than 4, or is a word, the program must delete the line.

Input:

abc            523              2              5       .....
dfghy         0                 54             26     .......
poir            aca             12             -5    ......... 
locid           52             158             -23    .........

Output:

abc            523              2              5       .....
locid           52             158             -23    .........

Thanks

panyam · May 13, 2010, 7:07am

 
awk '$2 <4 || $2 ~/[a-z]/ { next } 1'  input_file

kekaes · May 13, 2010, 7:21am

Ok. Thanks, it is just what I needed. Can you explain to me the meaning of "{ next } 1"?

panyam · May 13, 2010, 7:33am

 
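If the data in column 2 is smaller than 4, or it's a word, go to the "next" line (ignore the current line). The trailing 1 is a pattern that is always true; since it has no action attached, awk applies its default action and prints every line that was not skipped.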
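
To make the role of the trailing 1 concrete, here is a minimal sketch using the sample rows from the question (column padding collapsed to single spaces; the variable names data, short and long are made up for illustration). The second command spells out the print that the bare 1 performs implicitly, so both should keep the same lines.

```shell
# Sample rows from the question, with the padding reduced to single spaces.
data=$(printf '%s\n' 'abc 523 2 5' 'dfghy 0 54 26' 'poir aca 12 -5' 'locid 52 158 -23')

# Original form: skip the unwanted lines, then the bare pattern 1 prints the rest.
short=$(printf '%s\n' "$data" | awk '$2 < 4 || $2 ~ /[a-z]/ { next } 1')

# Same logic with the default action written out explicitly.
long=$(printf '%s\n' "$data" | awk '$2 < 4 || $2 ~ /[a-z]/ { next } { print }')

printf '%s\n' "$short"
```

Any pattern that evaluates to true would behave the same way; 1 is simply the shortest one to type.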

danmero · May 13, 2010, 7:50am

Shorter

awk '$2>3 && $2==int($2)' file

durden_tyler · May 13, 2010, 8:46am

Or using Perl -

$
$ cat f5
abc            523            2              5
dfghy          0              54             26
poir           aca            12             -5
locid          52             158            -23
xyzw           pq9s           82             95
$
$
$ perl -ane '$F[1]>=4 && print' f5
abc            523            2              5
locid          52             158            -23
$

tyler_durden

alister · May 13, 2010, 12:03pm

The following may be irrelevant, but I point it out as a heads up, just in case.

panyam's awk expression will let through non-numeric data that contains no lowercase letters. If that is undesirable, I would recommend either using a complemented decimal digit class, [^0-9], or going with danmero's approach.

$ cat input
abc            523              2              5       .....
dfghy         0                 54             26     .......
poir            4abc             12             -5    ......... 
foo            4&^%$#@             12             -5    ......... 
locid           52             158             -23    .........
$ awk '$2 <4 || $2 ~/[a-z]/ { next } 1' input
abc            523              2              5       .....
foo            4&^%$#@             12             -5    ......... 
locid           52             158             -23    .........

Also, durden_tyler's perl solution is a bit more accepting:

$ perl -ane '$F[1]>=4 && print' input
abc            523              2              5       .....
poir            4abc             12             -5    ......... 
foo            4&^%$#@             12             -5    ......... 
locid           52             158             -23    .........
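
The complemented digit class suggested above could look like the following. This is only a sketch: the variable names are invented, the sample rows are the ones from this post with the padding collapsed, and note that [^0-9] also rejects values such as 4.5 or -7, which may or may not be what is wanted.

```shell
# Rows from the example above; the single quotes keep the shell away from $ and #.
data=$(printf '%s\n' 'abc 523 2 5' 'dfghy 0 54 26' 'poir 4abc 12 -5' 'foo 4&^%$#@ 12 -5' 'locid 52 158 -23')

# Drop the line if column 2 is below 4 or contains any character that is not a digit.
kept=$(printf '%s\n' "$data" | awk '$2 < 4 || $2 ~ /[^0-9]/ { next } 1')

printf '%s\n' "$kept"
```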

Regards,
Alister

---------- Post updated at 12:03 PM ---------- Previous update was at 11:49 AM ----------

I do not recommend using the following in production, as it's not immediately obvious what it does, but I share it in the spirit of AWK golf.

Dedicated to danmero ;):

awk '($2==$2+0)*$2>3' file

For AWK novices, the parenthetical tests whether or not the value in $2 is a non-numeric string. $2+0 forces a conversion to a numeric value. When the result is compared to $2 with ==, if $2 is a numeric value, then two identical numbers are compared and the result of the comparison is 1. Multiplying $2 by 1 doesn't alter the result of the comparison with 3.

However, if $2 is a non-numeric string, the result of $2+0 (a number) will not be identical to $2 (a string), because the conversion process will have discarded the non-numeric portions of the string. When AWK compares the two with ==, it will not be comparing two numbers. When one value in a comparison is a non-numeric string, the other is converted (if necessary) to a string before the comparison. In this case, the result is a comparison between two different strings, which yields the number 0. Multiplying $2 by 0 gives 0, 0>3 is false, and the line is skipped.

This approach allows floating point values in $2, while the int() approach does not. I only mention this for the sake of being thorough in this description, not to tout it as an advantage.
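
A small sketch of the difference between the two approaches, using made-up rows (the labels in column 1 and the variable names are hypothetical). The golf expression should keep both 52 and 4.5, while the int() version should keep only 52; the non-numeric values are dropped by both.

```shell
# Hypothetical rows: an integer, a float, a value that is too small, and two strings.
data=$(printf '%s\n' 'a 52' 'b 4.5' 'c 3' 'd 4abc' 'e aca')

# Golf version: a non-numeric column 2 makes the parenthetical 0, so the product is 0.
golf=$(printf '%s\n' "$data" | awk '($2==$2+0)*$2>3')

# int() version: additionally requires column 2 to equal its integer part.
strict=$(printf '%s\n' "$data" | awk '$2>3 && $2==int($2)')

printf 'golf:\n%s\nint():\n%s\n' "$golf" "$strict"
```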

If nothing else, that AWK snippet makes for a good lesson in AWK type conversion.

Cheers,
Alister

kekaes · May 14, 2010, 4:13am

Ok. Thank you very much. Just one more question.

If I do the following:

awk '$1 !~/[a-z]/ { next } 1' file_input | awk '$2>90 || $2 !~/^[0-9]/ { next } 1' | awk '$3 <-360 || $3>360 || $3 !~/^[0-9]/ { next } 1' ... > file_output

Is there any way to write the lines that have been eliminated to another file?

panyam · May 14, 2010, 4:39am

Something like this?

 
diff input_file output_file | grep "^<" | sed 's/^< //'
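
As an alternative to comparing the files afterwards, awk can write the eliminated lines itself while it filters. The following is a rough sketch, not a drop-in replacement for the pipeline in the question: it uses only the first filter from this thread, the sample rows from the original post, a scratch directory, and a hypothetical file name rejected_file.

```shell
workdir=$(mktemp -d)   # scratch directory so the sketch does not touch real files
printf '%s\n' 'abc 523 2 5' 'dfghy 0 54 26' 'poir aca 12 -5' 'locid 52 158 -23' > "$workdir/input_file"

# Eliminated lines are redirected to the file named in the awk variable "rejected";
# every other line goes to standard output, which the shell sends to output_file.
awk -v rejected="$workdir/rejected_file" '
  $2 < 4 || $2 ~ /[a-z]/ { print > rejected; next }
  1
' "$workdir/input_file" > "$workdir/output_file"

kept=$(cat "$workdir/output_file")
dropped=$(cat "$workdir/rejected_file")
rm -rf "$workdir"

printf 'kept:\n%s\ndropped:\n%s\n' "$kept" "$dropped"
```

With several conditions, each one could get its own "print > rejected; next" rule inside the same awk program, so that one pass over the input produces both files.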