awk help to make my work faster

Hi everyone,

I have a file, file1.txt, which contains line numbers:

aa bb cc "12" qw
xx yy zz "23" we
bb qw we "123249" jh

Here 12, 23 and 123249 are the line numbers. According to these line numbers, I have to print the corresponding lines from another file named file2.txt. file2.txt contains more than 600,000 (6 lakh) records.
I have written this code:
while read line; do
x=`echo $line|awk '{print $4}'`
m=`echo "${x}"|sed 's/"//g'`
awk '{if(NR=='$m') {print $0>>"desriredfile"}}' file2.txt
done<file1.txt

This command works, but it is slow, since it reads the whole of file2.txt once for every line number.
Can the code be written so that it reads file2.txt only once?
Thanks for the help :confused:

0.6 M doesn't seem to be a big number (I guess).
Load everything into memory with a map:

line_no => line_no_contents

then do a lookup with line_no, and it's done :slight_smile:

Here is an example of how to do that in Perl; please tweak it according to your needs.

#! /opt/third-party/bin/perl

# map line number and contents in the file

my %fileHash = ();
my $lfh;
open($lfh, "<", "file_1") or die "Unable to open file : file_1 <$!>\n";

while ( <$lfh> ) {
  chomp;
  $fileHash{$.} = $_;
}

close($lfh);

# open file that contains line numbers for which data needs to be extracted from the other file 

open($lfh, "<", "file_2") or die "Unable to open file : file_2 <$!>\n";

while( <$lfh> ) {
  chomp;
  print "Here is the information " , $fileHash{$_} , "\n";
}

close($lfh);

exit(0);

Well, I need a shell script for this... I have no idea about Perl. How much time will it take in Perl? My current script takes more than a day to complete. It's high time for me now... I should find some idea...

0.6 M really is not a big number (I hope so).

Just give it a try and you can find the time taken for this to run.

Maybe try benchmarking with 100 K records: comparing your shell-script method against the above Perl version will help you estimate the overall time this could take (but again, it's an approximation).

Let us know if you get stuck somewhere.

Replace the two commands with

m=`echo $line|awk '{gsub(/"/,"",$4); print $4}'`

and it will save you a few hours :slight_smile: one system call per $line instead of two. (Note that int($4) alone won't work here: $4 still contains the literal quote characters, so awk treats it as a non-numeric string and int() returns 0.)
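A quick check of that single-call extraction, using a made-up sample line from the post above:

```shell
line='bb qw we "123249" jh'
# One awk call both picks the fourth field and strips its quotes,
# replacing the earlier awk + sed pair (one process instead of two).
m=$(echo "$line" | awk '{ gsub(/"/, "", $4); print $4 }')
echo "$m"
```

This prints 123249.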
Post some sample data using [code] tags and maybe we can speed up your script. The output of wc for each file would also be useful.

But can't something be done inside awk, so that it reads the bigger file only once and gives the desired output? Here it is reading the file again and again, once for each line number.

Thanks,
Amit :b:

awk -F'"' 'NR==FNR { a[$2]; next } FNR in a { print > "desired_file" }' file1.txt file2.txt

This code is totally beyond my reach... can you please elaborate? :confused::eek:

I could not get the code.

Here is a thread that covers a similar situation, with explanations.

There are many more examples like this one. Use the search feature.
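For what it's worth, here is an annotated sketch of the same two-file awk idiom, with tiny made-up input files so the mechanics are visible (the real one-liner writes to "desired_file" instead of stdout):

```shell
tmp=$(mktemp -d)
printf 'aa bb cc "2" qw\n' > "$tmp/file1.txt"
printf 'line one\nline two\nline three\n' > "$tmp/file2.txt"

# NR  = record count across ALL files read so far
# FNR = record count within the CURRENT file only
# NR==FNR is therefore true only while the first file is being read.
awk -F'"' '
    NR == FNR { a[$2]; next }  # 1st file: remember field 2 (the quoted number) as an array key
    FNR in a                   # 2nd file: print any line whose line number is a remembered key
' "$tmp/file1.txt" "$tmp/file2.txt"
```

With -F'"' the line aa bb cc "2" qw splits into three fields, so $2 is the bare 2; the sketch prints "line two".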

awk 'NR==FNR{a[$1];next} FNR in a{print $0>>"desired_file"}' file1 file2

where:
file1 consists of the line numbers ONLY, and
file2 is the file from which the desired output has to be made.
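If file1 still carries the numbers inside quotes, as in the original post, a one-off pass can strip them down to bare line numbers first. A small sketch, splitting on the double quote (the file name and path here are made up for illustration):

```shell
tmp=$(mktemp -d)
printf 'aa bb cc "12" qw\nxx yy zz "23" we\n' > "$tmp/with_quotes.txt"

# Split on the double quote: field 2 is then the bare line number.
awk -F'"' '{ print $2 }' "$tmp/with_quotes.txt" > "$tmp/file1"
cat "$tmp/file1"
```

This writes 12 and 23, one per line, ready to feed to the two-file awk command above.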

It now takes hardly a few seconds to do it.

Special thanks to rubin for this.

THIS IS ONE OF THE BEST COMMUNITIES FOR UNIX LEARNERS, WITH SO MANY REPLIES COMING IN WITHIN MINUTES :b:

Regards,
Amit