AWK Help needed - Production Issue

diksha2207 · October 5, 2009, 12:20am

Hi

I have a cross-reference file containing 86,000 records in the format old number:new number. There are hundreds of files in which I need to search for each old number and append the corresponding new number (preceded by @) to the line containing the old number. The files contain millions of records.
Currently I am using the sed command below:

  sed "/$v_s_replace_string/s/$/$v_s_new_string/" $v_s_file_name > tempfile1
  mv tempfile1 $v_s_file_name

v_s_replace_string = contains old number
and v_s_new_string = contains new number

I need to replace this sed command with an awk command, as awk is faster than sed.

Please help.
Thanks

vidyadhar85 · October 5, 2009, 12:27am

It's similar. I don't think it will make a huge difference.

awk -v new="$v_s_new_string" -v old="$v_s_replace_string" '{gsub(old,new)}1' "$v_s_file_name" > tempfile1
mv tempfile1 $v_s_file_name

diksha2207 · October 5, 2009, 12:34am

Hi
Thanks for your solution.

I am trying that.
Is there any other option that you can suggest? I need to speed up the processing as much as possible. It is a production run and my script should not take more than 5 minutes per file.

Thanks

daptal · October 5, 2009, 12:42am

Did you try command-line Perl or a Perl script?

diksha2207 · October 5, 2009, 1:20am

Hi

For the awk command, could you please modify it to append the new number at the end of the line where it finds a match with the old number? The command you gave is adding the new number to the beginning of the line. I am new to awk. Please help.

vidyadhar85 · October 5, 2009, 1:34am

You mean this?

awk -v old="$v_s_replace_string" -v new="$v_s_new_string" '{if ($0 ~ old) {print $0 ":" new} else {print}}' "$v_s_file_name" > tempfile1
mv tempfile1 $v_s_file_name

diksha2207 · October 5, 2009, 1:47am

Hi
Thanks for your help and time, but the code you sent is not helping.
It is appending all the new number strings to one record only.

I need to append the new number only to the record that matches the corresponding old number.

I tried this:

awk -v var=$v_s_new_string  /$v_s_replace_string'/{print $0""var}' $v_s_file_name > tempfile1
mv tempfile1 $v_s_file_name

Please help.

ripat · October 5, 2009, 2:46am

I wouldn't be so sure about that, at least for a simple string replacement.

Awk is a better all-round tool for data manipulation but for a simple string replacement, sed is at least as fast.

diksha2207 · October 5, 2009, 2:56am

I need to do that simple string replacement at least a million times in one file, and I have over 500 files to process. So I need to use awk...

ripat · October 5, 2009, 3:26am

I am not trying to advocate the use of sed, as I think that awk is a better all-round tool, but I think you underestimate sed when it comes to a simple string replacement.

I just did a little test on a string replacement in a big file (more than a million lines). Here are the results:

$ time awk '{gsub("B1058","zzz1058")}1' ventes_all > /dev/null
real	0m3.730s
user	0m3.648s
sys	0m0.040s

$ time sed 's/B1058/zzz1058/g' ventes_all > /dev/null
real	0m2.855s
user	0m2.828s
sys	0m0.028s

$ wc -l ventes_all 
1205794 ventes_all
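
A side note on where the time probably goes: if the command is being run once per cross-reference entry (the thread does not say), each data file gets read 86000 times, and swapping sed for awk will not change that. Below is a minimal sketch of a single-pass alternative that loads the whole cross reference into an awk array first. It assumes the old number appears as a whole whitespace-separated field, which is not confirmed anywhere above. The file names xref.txt, data.txt and data.txt.new and the sample numbers are made up for illustration.

```shell
# Hypothetical sample data: xref.txt holds old:new pairs, data.txt is one data file.
printf '%s\n' '1001:9001' '1002:9002' > xref.txt
printf '%s\n' 'a 1001 x' 'b 5555 y' 'c 1002 z' > data.txt

awk '
    # First file (xref.txt): build a lookup table keyed by old number.
    NR == FNR { split($0, pair, ":"); map[pair[1]] = pair[2]; next }

    # Second file (data.txt): collect @new for every field that is a known
    # old number, then print the line with the collected suffix appended.
    {
        suffix = ""
        for (i = 1; i <= NF; i++)
            if ($i in map) suffix = suffix "@" map[$i]
        print $0 suffix
    }
' xref.txt data.txt > data.txt.new
```

With the sample data, data.txt.new contains `a 1001 x@9001`, `b 5555 y` and `c 1002 z@9002`. Each data file is read once regardless of how many cross-reference entries there are. If the numbers are embedded inside longer fields, the field test would have to be replaced with something else.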