I've got a number of CSV files (e.g. 1.csv, 2.csv, ...) in a directory to process. I need to read them quickly and use nawk to do regular-expression matching, then write the matches from each file into a corresponding output file (e.g. 1.csv.out, 2.csv.out).
In my code:
BEGIN {
    regex = "abx"
    while ((getline < "/workspace/folder/1.csv") > 0) {
        if ($6 ~ regex)
            print $1 $3 $4 >> "/workspace/folderout/1.csv.out"
    }
}
It works when the file name is hard-coded, but a for loop doesn't seem to work. I had:
for FILE in /workspace/folder*.csv
before the BEGIN block.
If I/O (hard-drive access, for example) is the bottleneck, and all the files are on the same device, then simultaneous instances of the script will contend for the same limited I/O resource, which will likely result in a drop in performance.
Regardless of the bottleneck, using find with -exec ... + involves much less process creation -- which can be expensive when your files are many, and a substantial portion of the total work when they are also small -- thereby speeding things up, perhaps substantially.