awk to print range of fields

krishnix · December 6, 2011, 11:34am

Hi

file.in and file.out are in CSV format. The code I have now is:

cat file.in | awk -F"," '!($1$2$3$4$5$6$7$8 in a){a[$1$2$3$4$5$6$7$8];print $0}' > file.out

Here I am printing the entire line using $0. However, I want to print only $1 to $150, and the output should still be in CSV format. cut -d does not perform well, and a for loop inside awk is not producing the output I expect.

file.in

hi,there,how,are,you,123
hi,there,how,are,you,124
hi,there,how,are,you,125
hi,there,how,are,you,126

file.out (please note I have reduced the number of columns for convenience; I need 150 columns)

hi,there,how,are
hi,there,how,are
hi,there,how,are
hi,there,how,are

Thanks in advance
K

Corona688 · December 6, 2011, 11:50am

That's a useless use of cat. If you're worried about performance, that should be the first thing to go.

cut's performance ought to be better than awk's, since it is the more specialized tool. This is exactly the sort of problem cut was made for. If you use it in a silly way, like running it once per line, it will of course run slowly, but so would awk.

cut -d "," -f 1-150 < input > output
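
To make the single-pass usage concrete, here is a minimal sketch that runs the same kind of cut command over the sample rows from the question. It uses 4 columns instead of 150, and the scratch directory and the variable name workdir are assumptions made only for this illustration.

```shell
# Scratch directory so the example does not overwrite real files (assumption).
workdir=$(mktemp -d)

# Two of the sample rows from the question (6 columns instead of 150+).
printf '%s\n' \
  'hi,there,how,are,you,123' \
  'hi,there,how,are,you,124' > "$workdir/file.in"

# One cut process reads the whole file once and keeps fields 1 through 4.
cut -d ',' -f 1-4 < "$workdir/file.in" > "$workdir/file.out"

cat "$workdir/file.out"
```

For the real data the range would be -f 1-150. The point is that cut is started once for the whole file rather than once per line.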

krishnix · December 6, 2011, 11:58am

Hi Corona,

cat file.in | awk -F"," '!($1$2$3$4$5$6$7$8 in a){a[$1$2$3$4$5$6$7$8];print $0}' | cut -d, -f1-150  > file.out

I am using the code above, i.e., piping awk's output into cut. Do you think that is an effective way to do it?

Thanks, K.

Corona688 · December 6, 2011, 12:03pm

You're still doing that useless use of cat.

Getting rid of that cat will probably save you about enough CPU power to pay for that entire cut.

The code otherwise looks good. I don't think there's an efficient way to specify entire ranges of fields in awk itself.

<file.in awk -F"," '!($1$2$3$4$5$6$7$8 in a){a[$1$2$3$4$5$6$7$8];print $0}' | cut -d, -f1-150  >file.out
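
For comparison, the range can also be produced inside awk with an explicit loop, which is probably what the original for-loop attempt was aiming at. The following is only a sketch under assumptions: the variable n, the array name seen, the sample rows, and the scratch directory are made up for illustration, and it is not claimed to be faster than the awk-plus-cut pipeline, so measure on real data. It also joins the key fields with commas inside the subscript (awk's SUBSEP) instead of concatenating them directly, which is a deliberate change from the original key to avoid accidental collisions.

```shell
# Scratch directory and sample rows are assumptions for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'a,b,c,d,e,f,g,h,1,2' \
  'a,b,c,d,e,f,g,h,3,4' \
  'a,b,c,d,e,f,g,x,5,6' > "$workdir/file.in"

# n is the last field to keep (150 for the real data, 9 here).
awk -F',' -v n=9 '
# Process a line only if its first 8 fields have not been seen before.
!(($1,$2,$3,$4,$5,$6,$7,$8) in seen) {
    # SUBSEP-joined key: "ab","c" and "a","bc" stay distinct.
    seen[$1,$2,$3,$4,$5,$6,$7,$8]
    # Do not run past the end of short lines.
    last = (NF < n) ? NF : n
    # Rebuild the line from fields 1..last, separated by commas.
    line = $1
    for (i = 2; i <= last; i++)
        line = line FS $i
    print line
}' "$workdir/file.in" > "$workdir/file.out"

cat "$workdir/file.out"
```

The second sample row is dropped because its first 8 fields repeat those of the first row, and each surviving row is cut down to 9 fields.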