threads - concurrent processing

siva_jm · September 5, 2003, 6:45pm

I have a Java program that uses threads to copy table data concurrently over many DB connections (one per thread).

I'm new to Unix and wanted to know if it's possible to do something similar in a shell script.

For example, script.sh would:

trigger 2 stored procedures at the same time, since they are independent of each other

set a flag in both of them, just so I know at the end that they have finished.

Is it possible to do this in a shell script?

The AIX server supports concurrent processing.

Perderabo · September 5, 2003, 11:40pm

You're really asking two questions here. I'm not certain that you're clear on the difference between a thread and a process. But I will assume that you are and proceed from there.

First, I do not know of any shell with thread support. So, as far as I know, no, you cannot use threads in a shell script.

But it is quite easy to launch processes in a shell script. And if you have multiple CPUs, concurrent processing is easy.

In fact, concurrent processing is easy from the command line. Consider a command like:

ls -lR | grep mp3

If you have two CPUs, it is quite possible that both processes will run at the same time. No big deal here. But I have rewritten slow shell scripts to take advantage of multiple CPUs and have seen a boost in performance.
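A quick way to see two pipeline stages overlapping is to time a pipeline of two sleeps. This is only a sketch: `sleep` stands in for real work and uses no CPU, so it shows that both processes are started and run at the same time; whether CPU-heavy stages really run in parallel still depends on having more than one CPU.

```shell
#!/bin/sh
# Both stages of a pipeline are started at once, so two 2-second sleeps
# connected by a pipe take about 2 seconds in total, not 4.
start=$(date +%s)
sleep 2 | sleep 2
end=$(date +%s)
elapsed=$((end - start))
echo "pipeline took about ${elapsed}s"
```

If the stages ran one after the other, the reported time would be 4 seconds or more.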

Besides pipelines, you can put programs in the background. These may be scheduled onto another CPU.

siva_jm · September 6, 2003, 2:02pm

Thank you, that helped.

By thread I meant a Java thread. I believe it creates a separate process in the background for each thread, but I'm not sure.

What I want is:

sample.sh

Start
|
|
Copy detailtableA from one database to another (database connection 1)
Copy detailtableB from one database to another (database connection 2)
....
Copy detailtableN from one database to another (database connection N)
(I want to run these simultaneously, since they do not depend on each other)
|
|
Get some flag when everything is complete
|
|
Copy master table from one database to another
|
|
End

The data is very important and goes from a production box to an archive box, so I can't afford to lose any of it.

I was able to do this in a Java program using threads, spawning a thread for each database connection, and it works pretty well.

But I want to know whether I can do this with a shell script and a stored procedure for each copy (committing row by row); I think that would be better, since the volume is huge.

Thank you
dhaya

Perderabo · September 6, 2003, 3:01pm

Yes, as long as we're talking about processes and not threads, this is very easy.

You can launch a background job by just putting an ampersand on the end of the command. And the shell has a wait command. Use them like this:

some_program_or_script &
another_program &
and_another &
wait

Those three programs should run simultaneously. The script will stall at the wait command until they are all done.
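Applied to the flow described earlier in the thread, a minimal sketch might look like this. `copy_table` is a hypothetical placeholder, not a real command: its body would be replaced by whatever invokes the stored procedure for one table, and the table names are made up. Note that `wait` with no arguments always returns 0, so this version does not tell you whether each copy succeeded.

```shell
#!/bin/sh
# copy_table is a hypothetical placeholder: replace its body with whatever
# actually runs the stored procedure for one table.
copy_table() {
    echo "copying $1"
    sleep 1
}

# Start every detail-table copy in the background.
for table in detailtableA detailtableB detailtableC; do
    copy_table "$table" &
done

# Block here until all background jobs have finished.
wait

# Only now is it safe to copy the master table.
copy_table mastertable
master_done=yes
```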

Immediately after you launch a background job, its pid is available in $! and you can store this pid:

some_job &
pid=$!

And you can wait for this particular job to finish with:

wait $pid
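Because `wait $pid` returns the exit status of that particular job, keeping every pid lets the script check each copy before moving on to the master table, which matters when losing data is not an option. This is a sketch assuming each copy command exits non-zero on failure and that table names contain no spaces or colons. `copy_table` is a hypothetical stand-in, deliberately rigged to fail for one table so the error path shows up.

```shell
#!/bin/sh
# Hypothetical stand-in for one table copy; it fails for "detailtableB"
# so the error path can be seen. Replace with the real copy command.
copy_table() {
    sleep 1
    [ "$1" != detailtableB ]
}

pids=""
for table in detailtableA detailtableB detailtableC; do
    copy_table "$table" &
    # $! holds the pid of the job just started; remember which table it is.
    pids="$pids $!:$table"
done

failed=0
for entry in $pids; do
    pid=${entry%%:*}
    table=${entry#*:}
    # wait <pid> returns that job's exit status.
    if wait "$pid"; then
        echo "$table: ok"
    else
        echo "$table: FAILED" >&2
        failed=$((failed + 1))
    fi
done

if [ "$failed" -eq 0 ]; then
    echo "all detail tables copied; safe to copy the master table"
else
    echo "$failed copies failed; not copying the master table" >&2
fi
```

The shell keeps the status of each background job until it is waited for, so it does not matter which copy finishes first.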

This all works pretty well. The downside is that it is very hard for these jobs to communicate with one another or even to communicate results back to the parent. Temporary files are the usual solution.
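Here is one way the temporary-file approach might look: each background job writes its output and exit status into its own file in a private directory, and the parent reads those files after `wait`. It assumes `mktemp -d` is available (common on current systems, but not guaranteed on older Unixes). `run_job` and the `copyA`/`copyB` names are illustrative, and `echo` and `false` stand in for real copy commands.

```shell
#!/bin/sh
# Assumes mktemp(1) is available; it is not on every older Unix.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

# run_job NAME COMMAND... runs COMMAND in the current (background) shell,
# saving its output and exit status in files named after NAME.
run_job() {
    job_name=$1
    shift
    "$@" > "$tmpdir/$job_name.out" 2>&1
    echo "$?" > "$tmpdir/$job_name.status"
}

run_job copyA echo "copyA finished" &
run_job copyB false &
wait

# Read back what each background job reported.
ok=0
bad=0
for f in "$tmpdir"/*.status; do
    status=$(cat "$f")
    if [ "$status" -eq 0 ]; then ok=$((ok + 1)); else bad=$((bad + 1)); fi
    echo "$(basename "$f" .status): exit status $status"
done
```

Giving every job its own file avoids two processes writing to the same file at the same time.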

siva_jm · September 6, 2003, 4:05pm

Thank you, that's what I needed to know.

So the jobs communicate with each other by writing and reading files, right?

Just one more thing: there are around 100 tables, so I am assuming that creating that many processes won't affect the runtime.

thank you

Perderabo · September 6, 2003, 7:17pm

100 processes is too many unless you have 50 CPUs or more. I would not create more than 2 processes per CPU at most, and 5 or 10 processes would be more reasonable.
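One simple way to stay at 5 or 10 processes instead of 100 is to start the copies in batches and `wait` after each batch. A sketch in plain sh, with placeholder table names and a hypothetical `copy_table`:

```shell
#!/bin/sh
# max_jobs, the table names and copy_table are placeholders for illustration.
max_jobs=5

# Hypothetical stand-in for copying one table.
copy_table() {
    sleep 1
}

start=$(date +%s)
running=0
started=0
for table in t01 t02 t03 t04 t05 t06 t07 t08 t09 t10 t11 t12; do
    copy_table "$table" &
    started=$((started + 1))
    running=$((running + 1))
    # With max_jobs copies in flight, let the whole batch finish first.
    if [ "$running" -ge "$max_jobs" ]; then
        wait
        running=0
    fi
done
wait    # the last, partial batch
elapsed=$(( $(date +%s) - start ))
echo "$started copies, at most $max_jobs at a time: ${elapsed}s"
```

Batching is easy to get right, but each batch waits for its slowest copy. A rolling pool that starts a new job as soon as any one finishes is more efficient; bash 4.3 and later offer `wait -n` for that, and GNU and BSD `xargs` have a `-P` option, though neither is guaranteed to exist on a given AIX box.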

And remember that when you run:
something &
it may actually be several processes itself, unless "something" is a C program that does not fork. Your OS may also prohibit you from creating many processes: on HP-UX, the per-user process limit maxuprc is 50 by default.

When there are processes waiting on the run queue for a CPU, the scheduler will switch between them so that each gets a shot at the CPU. That's called a context switch, and it can be expensive. Each CPU typically has enough on-board storage (cache) to remember how to run a few processes; beyond that, their state must be reloaded from memory.

Since you are after performance, you should benchmark it. Increase the number of processes until adding more becomes counter-productive, then stop.
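A rough benchmarking harness along those lines runs the same set of jobs with different concurrency limits and compares wall-clock times. `work` is a hypothetical placeholder (here just `sleep 1`, which does not load the CPU, so real copies will behave differently); swap in an actual copy and a realistic job count, and try larger limits until the times stop improving.

```shell
#!/bin/sh
# Hypothetical workload standing in for one table copy; substitute the real one.
work() {
    sleep 1
}

# run_batched N TOTAL: run TOTAL copies of work, at most N at a time.
run_batched() {
    n=$1
    total=$2
    i=0
    while [ "$i" -lt "$total" ]; do
        work &
        i=$((i + 1))
        # After every N starts, wait for that batch to finish.
        if [ $((i % n)) -eq 0 ]; then
            wait
        fi
    done
    wait
}

for n in 1 2 4; do
    start=$(date +%s)
    run_batched "$n" 4
    end=$(date +%s)
    # Record each timing in a variable named t_1, t_2, t_4.
    eval "t_$n=$((end - start))"
    echo "max $n concurrent: $((end - start))s"
done
```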

siva_jm · September 9, 2003, 11:40am

Thanks a lot, that helped me.

Looking at threads now.