awk to parse huge files

Hello All,

I have a situation as below:

(1) Read a source file (a single file with 1.2 million rows in it).
(2) Read the destination files one by one and replace a few fields in each with the corresponding matching fields from the source file.

I tried as below (please note I am not posting the complete code, just pseudo-code):

awk -F"|" 'NR==FNR { array[$1]=$2;next } {gsub('fields in dest file',array[field positions in dest file]),$0 } 
source_file dest_files*.dat  

The flaw in the above code is that the row gets printed whether or not there is a matching string, and the performance is also poor.

Any suggestions would be appreciated.

Regards,
Ravi

When you need to process a huge amount of data, it is advisable to use what has been designed for exactly that kind of task: a database :wink:

Without the complete code, and without any of the actual data it's working on, we cannot possibly tell you

1) why it's slow
2) why it's not working.

If you post the complete code, and some of the data you're working from, we might be able to

1) speed it up
2) make it work.

...but we can only do wild guessing right now.
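
As one such wild guess: if the layout is roughly what your pseudocode suggests, building the lookup table from the first file and only substituting when the key is actually present avoids both rewriting every row and running gsub() against every record. In this sketch the source file maps field 1 to field 2 and the destination key sits in field 3; those positions are assumptions, so adjust them to your real layout:

awk -F"|" -v OFS="|" '
    NR == FNR { map[$1] = $2; next }   # first file: build key -> value table
    $3 in map { $3 = map[$3] }         # replace only when the key has a match
    { print }                          # print every row; move this print into
                                       # the block above to drop unmatched rows
' source_file dest_files*.dat

Direct field assignment also keeps awk from re-scanning the whole line with a regular expression for every record, which is usually where the time goes with gsub().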

Hi All,

Thanks for the reply.

I got the issue resolved myself and forgot to update here.

The logic I used is:

awk -F"|"  'BEGIN{ read the source file and store in array} { for each record in dest file search and replace it with source data ( stored in array ) and save it to a file }' source_file dest_files*.dat

Please post the actual code for the solution; the pseudocode for either isn't useful, and it would be nice for this thread to have some point for future readers.
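
For anyone who finds this thread later, here is a rough guess at what that outline might look like as runnable awk. The field positions (source field 1 mapped to field 2, destination key in field 3) and the per-file ".out" naming are assumptions, since the actual code was never posted; note that when the source file is read with getline in BEGIN, it is not also passed in the file list:

awk -F"|" -v OFS="|" -v src="source_file" '
    BEGIN {
        # load the whole source file into an array before any dest file is read
        while ((getline line < src) > 0) {
            split(line, f)             # splits on FS ("|") into f[1], f[2], ...
            map[f[1]] = f[2]
        }
        close(src)
    }
    {
        # replace the (assumed) key field when a match exists, then write the
        # row out to a per-destination-file output
        if ($3 in map) $3 = map[$3]
        print > (FILENAME ".out")
    }
' dest_files*.dat

With a large number of destination files the redirected outputs stay open, so you may need to close() each one as FILENAME changes to avoid hitting the open-file limit.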