awk changes to make it faster

mirwasim · May 16, 2013, 2:54am

I have a script like the one below, which picks each number from one file, searches for it in another file, and prints the matching lines.

But it is very slow to run on a huge file. Can we modify it to use awk?

#! /bin/ksh
while read line1
do
echo "$line1"
a=`echo $line1`
if [ $a -ge 0 ]
then
echo "$num"
cat file1|nawk -v "c=$line1" '$1 ~ c' >> message.log
fi
done < file2

file2 has the following data:

cat file2
1234
5678
9100
1324

file1 has strings which contain this data.

Jotne · May 16, 2013, 3:05am

Can you post some of file1 and an example of how you would like the output to look?

mirwasim · May 16, 2013, 3:08am

The first four lines should be printed; the last two should be ignored.


1234,0130020036210801,61400900240,144.135.15.1,50501,8,9550,106A,177200093,144.135.15.212,telstra.internet,mnc001.mcc505.gprs,33,10.237.103.20,0,1,0,0,0,2413,36436,20121002232313,115914,
5678,0124300042019104,61432228629,149.135.133.97,50501,8,6550,2120,80847635,144.135.14.68,telstra.internet,mnc001.mcc505.gprs,33,10.195.135.22,0,1,0,0,0,1962,19782,20121002234954,116855,
9100,0131760091070905,61427989363,149.135.131.65,50501,8,3950,4557,83767434,144.135.14.67,telstra.internet,mnc001.mcc505.gprs,33,100.82.223.99,0,1,0,0,0,235,3324,20121002233018,117271,0,
1324,3524240501157114,61427252411,149.135.133.97,50501,8,A050,0D9E,178201226,144.135.15.212,telstra.internet,mnc001.mcc505.gprs,33,10.239.140.179,0,1,0,0,0,2288,48700,20121002231512,1171
2222,0131760091070905,61427989363,149.135.131.65,50501,8,3950,4557,83767434,144.135.14.67,telstra.internet,mnc001.mcc505.gprs,33,100.82.223.99,0,1,0,0,0,235,3324,20121002233018,117271,0,
2154,3524240501157114,61427252411,149.135.133.97,50501,8,A050,0D9E,178201226,144.135.15.212,telstra.internet,mnc001.mcc505.gprs,33,10.239.140.179,0,1,0,0,0,2288,48700,20121002231512,1171

Jotne · May 16, 2013, 3:28am

Try this:

awk -F, 'NR==FNR{a[$0];next} $1 in a' file2 file1

1234,0130020036210801,61400900240,144.135.15.1,50501,8,9550,106A,177200093,144.135.15.212,telstra.internet,mnc001.mcc505.gprs,33,10.237.103.20,0,1,0,0,0,2413,36436,20121002232313,115914,
5678,0124300042019104,61432228629,149.135.133.97,50501,8,6550,2120,80847635,144.135.14.68,telstra.internet,mnc001.mcc505.gprs,33,10.195.135.22,0,1,0,0,0,1962,19782,20121002234954,116855,
9100,0131760091070905,61427989363,149.135.131.65,50501,8,3950,4557,83767434,144.135.14.67,telstra.internet,mnc001.mcc505.gprs,33,100.82.223.99,0,1,0,0,0,235,3324,20121002233018,117271,0,
1324,3524240501157114,61427252411,149.135.133.97,50501,8,A050,0D9E,178201226,144.135.15.212,telstra.internet,mnc001.mcc505.gprs,33,10.239.140.179,0,1,0,0,0,2288,48700,20121002231512,1171

MadeInGermany · May 16, 2013, 4:17am

Jotne assumes the keys are the first comma-separated field in file1.
nawk wants the condition in parentheses: ($1 in a).

Jotne · May 16, 2013, 4:26am

So you are saying that it should be this?

awk -F, 'NR==FNR{a[$0];next} ($1 in a)' file2 file1

It gives the same result.

I understand the task like this:
Print the lines from file1 whose first field is one of the numbers listed in file2.
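
Spelled out with comments, the one-liner can be sketched roughly as below. This is only an illustration: the sample data and the array name keys are made up here, and it assumes the key is always the first comma-separated field. It should be faster than the original loop because each file is read once, instead of starting one nawk per line of file2 and rereading file1 every time.

```shell
#!/bin/sh
# Sketch only: the sample data below is invented, not the real files.
tmpdir=$(mktemp -d)
printf '%s\n' 1234 5678 > "$tmpdir/file2"
printf '%s\n' '1234,aaa,bbb' '2222,ccc,ddd' '5678,eee,fff' > "$tmpdir/file1"

result=$(awk -F, '
    # NR equals FNR only while the first file (file2) is being read:
    # store each key as an array index, then skip to the next line.
    NR == FNR { keys[$0]; next }
    # Second file (file1): print the line if its first field is a stored key.
    ($1 in keys)
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Note that this is an exact match on the first field, whereas the original `$1 ~ c` was a regular expression match, so the two can differ if a number also appears elsewhere in a line.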

mirwasim · May 16, 2013, 4:34am

Thank you so much, it's working great.