I'll try to explain the percentages to the best of my knowledge:
idle: should be obvious
user: the time processes spend running in user space. That is, loops and all in-memory operations.
kernel: the time spent in kernel space. All I/O runs in kernel space, that is: reading & writing files, network communication, user I/O (when not sleeping), loading/forking new processes, ...
iowait: the time spent waiting for an I/O device to become ready. As long as this is 0, none of your processes had to wait for a disk to come ready or was blocked by a slow network connection.
swap: if this is zero, it basically means none of your processes had to be swapped out, so your installed memory is sufficient.
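As a rough illustration of the user/kernel split, here is a sketch using Python's stdlib `os.times()`, which reports the user and system (kernel) CPU time this process has accumulated. The workloads are my own toy examples, not anything from this thread:

```python
import os

# os.times() returns accumulated CPU time for this process:
# .user   = time spent in user space (computation)
# .system = time spent in kernel space (syscalls, I/O)
before = os.times()

# Pure in-memory work: accumulates *user* time
total = sum(i * i for i in range(2_000_000))

# File I/O: the write syscalls accumulate *system* time
with open(os.devnull, "wb") as f:
    for _ in range(1000):
        f.write(b"x" * 4096)

after = os.times()
print(f"user   time used: {after.user - before.user:.3f}s")
print(f"system time used: {after.system - before.system:.3f}s")
```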
From the definition of the "load": if nothing is running, the load is 0. Each process that is running or runnable adds 1 to the load, which then gets averaged over 1/5/15 minute(s). So a 1-minute load of 23.3 means that during the last minute there were on average 23.3 processes running or ready to run. Spread over 48 cores, that's about half of them in use, with the other half more or less idling away.
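To make the arithmetic concrete, here is a small sketch (the function name `cores_busy` is my own, not a real API); `os.getloadavg()` is the stdlib call that returns the actual 1/5/15-minute averages on Unix systems:

```python
import os

def cores_busy(load_avg: float, n_cores: int) -> float:
    """Fraction of cores kept busy by this load average.

    A value above 1.0 means runnable processes are queueing
    for CPU time instead of running immediately.
    """
    return load_avg / n_cores

# The example from the text: a 1-minute load of 23.3 on 48 cores
print(f"{cores_busy(23.3, 48):.1%} of the cores in use")  # about half

# The real 1/5/15-minute averages for the current machine (Unix only):
one, five, fifteen = os.getloadavg()
print(f"load averages: {one:.2f} {five:.2f} {fifteen:.2f}")
```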
My server has to keep processing files as they arrive, and the queue of pending files shouldn't grow long.
That is, I have to finish processing each file as soon as possible, rather than processing many files at a time.
To achieve this, which is preferred: loading the machine until I reach a load average of 48, or keeping it below 48 so multiple CPUs can work on a task and complete it quickly?
I have tried keeping it higher than 48; it does process in parallel, but then a single task takes a long time to complete, and that is not what I wanted.
What do you mean by "load avg"? And when you quote specific numbers (somehow related to the number of CPUs in your server), what do you mean?
Are you expecting that your program (however many processes it may spawn) will get distributed to different CPUs for processing without having to make specific compiler modifications to do so? If so, you are mistaken, my friend...
No, not nothing at all. Just not very much. By my definition, and please supply a better one if you have it, the base load is 0. Only running or runnable processes add to that, but not processes that are sleeping, e.g. because they're waiting for a timer to run out, or for user input.
Now, in your example you're showing a pretty idle machine (httpd with a total of 17 CPU seconds, while the machine has been up for more than 2 days), with most processes waiting for some kind of external input or waking up occasionally (e.g. cron). That means the load counter isn't getting increased very much at each data point. And when the load is averaged, the average falls below 0.005, so it gets displayed as 0.00 (anything at or above that point would round up to 0.01).
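The rounding behaviour described above is just two-decimal display formatting, which a quick sketch can show:

```python
# Load averages are typically displayed with two decimals, so any
# average below 0.005 shows up as 0.00 and the machine looks fully idle.
print(f"{0.004:.2f}")  # 0.00
print(f"{0.006:.2f}")  # 0.01
```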
True. To tell you the truth, I have not been able to make any sense of Sun's definition of load averages, either pre or post Solaris 10. In general, from experience, I have found that machines get slow once the load averages start approaching single digits (even 3-4 has shown noticeably slow performance for me). Now, I have never had the experience of a 48-core machine, but I have plenty of experience with Netra series quad-core servers.
I am not sure what you are asking me. But here are some pointers:
All your data is stored in the memory bank of the CPU that is running your program. Your program and all its spawned processes will be executed on a single CPU unless you compile it very specifically for multiprocessor systems.
Generally, memory on the Sun boxes is allocated per CPU in banks, and a particular CPU has a specific set of DIMMs dedicated to it. As long as your data structures (combined total) are not greater than the size of your memory bank (not counting the overhead space), you don't need to worry about it. In fact, when this space is exceeded, it will swap, so to the user this should be transparent. So you could potentially spawn a thread for every single process you have.
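Coming back to the original queue question, one common pattern (a sketch, not the poster's setup; `process_file` is a placeholder for whatever per-file work is actually done) is to cap the number of concurrent workers at the core count, so the backlog queues inside the pool instead of as runnable processes driving the load average well past the number of CPUs. For CPU-bound work on CPython you would use a process pool rather than threads, but the capping idea is the same:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_file(path: str) -> str:
    # Placeholder for the real per-file work (parse, transform, store, ...)
    return path.upper()

files = [f"file_{i}.dat" for i in range(10)]

# Cap concurrency at the core count: pending files wait in the pool's
# internal queue rather than competing for CPU time all at once.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = list(pool.map(process_file, files))

print(results)
```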