This is my first experience writing a Unix script. I've created the following script. It does what I want it to do, but I need it to be a lot faster. Is there any way to speed it up?
cat 'Tax_Provision_Sample.dat' | sort | while read p; do fn=`echo $p|cut -d~ -f2,4,3,8,9`; echo $p >> "$fn.txt"; done
Note that each file will only contain one line of the original file, namely the last occurrence of a particular combination of those fields. If there can be more than one line with that combination, you need to use >> instead of >, but you would need to empty the files beforehand...
NR==1 likely isn't a good test, since the output file may change with each line. Perhaps use a[f]++ to test whether the name is already in an array of used filenames. And yes, some awk implementations allow only 10 open file descriptors!
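A rough sketch of the a[f]++ idea, in case it helps. The function name split_by_key and the array name seen are made up for illustration, and the key order (fields 2, 4, 3, 8, 9) is assumed from the script in the question. Each output file is emptied the first time its name is seen and appended to afterwards; the previous file is closed whenever the name changes, so only one output file is open at a time and the descriptor limit should not matter.

```shell
#!/bin/sh
# Sketch only: split a "~"-delimited file into one output file per key.
# Assumption: the key is fields 2,4,3,8,9 joined with "~" (as in the thread).
# split_by_key, seen and prev are hypothetical names used for illustration.
split_by_key() {
    sort "$1" | awk -F'~' '
    {
        f = $2 "~" $4 "~" $3 "~" $8 "~" $9 ".txt"
        if (!seen[f]++) {               # first time this name is used
            printf "" > f               # create or empty the file
            close(f)
        }
        if (f != prev) {                # output file changes
            if (prev != "") close(prev) # keep at most one file open
            prev = f
        }
        print >> f                      # append the whole input line
    }'
}
```

It would be called as: split_by_key Tax_Provision_Sample.dat. Closing on every change of name costs a reopen, so sorting on the key fields first (so that lines for the same file are adjacent) may be worth trying on large inputs.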
The original script did result in 3 coming before 4. Having it the other way is preferable, but I figured at the time that it was something I could live with.
I was just using the output file name from the originally supplied code. I cannot say how it was originally set.
What do you think of the performance issues for a large input? Would my code be horribly slower? If so, then the original poster must make the decision of speed over clarity (assuming that my suggestion is clear, and I'm not sure if it is).
Regards,
Robin
Hang on, no I have totally misread the supplied attempt.
If I re-read it, the code is generating multiple output files based on the input records and writing the whole line to the appropriate file.
No, forget my suggestion, totally wrong.
I can't think of a way to remove the "open/append, write, close" operations to write to multiple files unless we force it another way, by working out what files there could possibly be and then getting the records for each required output file in turn. That would just generate more headaches than it solves, and for a large input file it could still be quite slow.
I am a fool :o
Robin
In shell the open/append, write, close operations are performed implicitly by the scope of the redirection of the file descriptor. RudiC already gave a suggestion using an array.
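#!/bin/bash
# Requires bash 4 or later for associative arrays.
declare -A AF
sort Tax_Provision_Sample.dat |
while IFS="~" read -r p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
do
  fn="$p2~$p4~$p3~$p8~$p9"
  # First time this file name is met: empty the output file.
  if [ -z "${AF[$fn]}" ]; then
    > "$fn.txt"
    AF[$fn]=1
  fi
  # Rebuild the line (assumes at least 10 fields; p10 holds the rest).
  printf '%s\n' "$p1~$p2~$p3~$p4~$p5~$p6~$p7~$p8~$p9~$p10" >> "$fn.txt"
done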
I have added some bash-4 code that will empty the output files when first met.
Omit the code if you have bash-3 (and delete/empty the files before you run the script).
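To illustrate the point about the scope of the redirection, here is a small sketch with a single output file. The function names append_per_line and append_once are hypothetical. Both write the same lines, but the first opens and closes the output file for every line, while the second opens it once for the whole loop.

```shell
#!/bin/sh
# Sketch only; function names are made up for illustration.

# Redirection on the command: the output file ($2) is opened,
# appended to and closed once per input line.
append_per_line() {
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$2"
    done < "$1"
}

# Redirection on the loop: the output file ($2) is opened once
# and stays open until the loop ends.
append_once() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done < "$1" >> "$2"
}
```

With many different output files the loop-level form is not directly available, which is why the script above redirects each write separately.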