Parallelize a task that has a for loop

Dear all,
I'm a newbie in programming and I would like to know whether it is possible to parallelize this script:

for l in {1..1000}
do
    cut -f$l quase2 | tr "\n" "," | sed 's/$/\
/g' | sed '/^$/d' > a_$l.t
done

I tried:

for l in {1..1000}
do
    cut -f$l quase2 | tr "\n" "," | sed 's/$/\
/g' | sed '/^$/d' > a_$l.t &
done

But it showed the message "fork: Resource temporarily unavailable".

Is it possible to do something? I need this because the real file has far too many columns, so the job is impossible to do the current way.

Thanks in advance

You seem to have run out of a system resource. What kind of box are you working on? It looks like you will have to do some kernel tuning... Can't say more without knowing the OS and the kernel system parameters.

As you are running 1000 instances, you should optimise the code within each process as much as possible.

You could probably use one awk script in place of your cut + tr + 2 x sed pipeline. That would reduce the number of processes working on the task from around 4000 to 1000.

Edit: perhaps something like this:

awk -v F=$l '{ printf "%s,", $F } END { print "" }' infile > a_$l.t

Thanks for the information vbe.
I'm using a computer with a quad-core processor, 12 GB of RAM, running Mac OS X.
The real file is 4 GB and has 40,000,000 columns.

Is it enough to help me?

Thanks for the attention.

Your task is pretty easy to split into smaller sub-tasks, so try running 10 background jobs and see how they utilise your resources. If your system is still pretty idle, increase to 20.

This is much better than running 1000s of tasks when you only have a few cores, as system control overheads (swapping processes in and out, etc.) will take most of the resources and not leave much for the actual work.

doit() {
    l=$1
    while [ $l -le $2 ]
    do
        awk -v F=$l '{ printf "%s,", $F } END { print "" }' infile > a_$l.t
        l=$((l + 1))
    done
}
 
doit 1 100 &
doit 101 200 &
doit 201 300 &
doit 301 400 &
doit 401 500 &
doit 501 600 &
doit 601 700 &
doit 701 800 &
doit 801 900 &
doit 901 1000 &
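If the real column count is not a round 1000, the ten hard-coded calls above can also be generated in a loop. Here is a self-contained sketch of the same batching idea; the tiny 3-column infile, and the NCOLS and BATCH values, are illustrative only and would be scaled up for the real data:

```shell
#!/bin/bash
# Illustrative input: 2 rows x 3 tab-separated columns.
printf 'a\tb\tc\nd\te\tf\n' > infile

# Same batch worker as above: columns $1..$2, one output file per column.
doit() {
    l=$1
    while [ $l -le $2 ]
    do
        # printf "%s," is safer than printf $F"," if a field contains %
        awk -v F=$l '{ printf "%s,", $F } END { print "" }' infile > a_$l.t
        l=$((l + 1))
    done
}

NCOLS=3    # columns in infile (40,000,000 in the real case)
BATCH=2    # columns per background job; tune to the core count
s=1
while [ $s -le $NCOLS ]
do
    e=$((s + BATCH - 1))
    [ $e -gt $NCOLS ] && e=$NCOLS
    doit $s $e &
    s=$((e + 1))
done
wait    # let every batch finish before using the a_*.t files
cat a_1.t    # -> a,d,
```

This keeps the number of concurrent processes at roughly NCOLS/BATCH regardless of how many columns there are, so the fork limit is never approached.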

It's not a Mac OS X Server OS, is it?
What version?

I know there is a conf file /etc/sysctl* in Mac OS X 10.3 Server, but I am a complete novice... and I am trying to find it on my PowerBook... You will have to use the sysctl command.

While I am at it, I will explain my thinking: not being able to fork usually happens when you reach a system parameter limit. I can think of two possibilities: the maximum number of processes for the whole system, or the maximum number of processes per user. Their names differ between OSes; on HP-UX we have maxuprc and nproc; for the Mac you would have to search a bit...
If you wish, we could move this to a more suitable forum (MacOSX...)

I found it in the man pages of sysctl and tested the maximum number of processes allowed on the system. I displayed what I had, then modified it:

$ sysctl kern.maxproc
kern.maxproc: 532
$ sudo sysctl -w kern.maxproc=1000
$ sysctl kern.maxproc
kern.maxproc: 1000
$ 

You will need more than 1000 :rolleyes:
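An alternative that avoids raising kern.maxproc at all is to let xargs cap the concurrency: it runs at most -P jobs at once and queues the rest, so the process limit is never approached. A sketch, again with a tiny illustrative infile (on older OS X releases seq may be missing; jot is the BSD equivalent):

```shell
#!/bin/bash
# Illustrative input: 2 rows x 3 tab-separated columns.
printf '1\t2\t3\n4\t5\t6\n' > infile

# Run at most 4 awk processes at a time; {} is replaced by the column number.
seq 1 3 | xargs -P 4 -I{} sh -c \
    'awk -v F={} "{ printf \"%s,\", \$F } END { print \"\" }" infile > a_{}.t'
```

This processes every column with a bounded number of concurrent processes, which should sidestep the "fork: Resource temporarily unavailable" error entirely.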