I have a Sun M5000 server running several whole-root Solaris zones, each hosting SAP systems. Recently one of the SAP systems hung; I suspect a memory issue. I could not log in to the zone at all, and I then found that I could not log in to the global zone either. I started halting the zones one by one, and at some stage I was able to log in to the global zone again.
Is it possible for one particular zone to hang the entire server? What can be done to avoid this?
Apart from prstat -Z, which commands will help identify the issue and its symptoms?
Of course, I'm also looking at memory tuning on the SAP side to prevent this from happening again.
Client-server? Between what?
You should have reacted earlier if a zone created such a situation, because afterwards we can only guess at a few reasons:
1) It can happen if badly designed...
2) I can't remember
But more importantly, what did you find in your logs? What caused the hang on the system side, not the application side? Overload? Etc.
If I had to give a reason at first glance, and this is a client-server box with, let's say, many hundreds of concurrent connections from PCs, I would say look with netstat for connections stuck in FIN_WAIT and similar states. On a badly tuned system you can run out of sockets, which would explain why you can't open new connections...
I'll let others give you a better explanation than I can at the moment.
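For the netstat check suggested above, a rough sketch of a per-state connection count. It assumes the state name is the last field on each connection line, which is how netstat prints it on Solaris and Linux as far as I know; the function name count_tcp_states is made up for this post. On Solaris 10, nawk or /usr/xpg4/bin/awk may be a safer choice than /usr/bin/awk.

```shell
#!/bin/sh
# Count TCP connections per state from `netstat -an` style output on stdin.
# Assumption: the state is the last whitespace-separated field of each line.
# Header and separator lines never end in a known state name, so they are skipped.
count_tcp_states() {
    awk '
        $NF ~ /^(ESTABLISHED|LISTEN|IDLE|BOUND|SYN_SENT|SYN_RCVD|SYN_RECV|FIN_WAIT_?[12]|CLOSE_WAIT|CLOSING|LAST_ACK|TIME_WAIT)$/ { n[$NF]++ }
        END { for (s in n) printf "%s %d\n", s, n[s] }
    ' | sort
}

# Typical read-only use in the global zone:
#   netstat -an -P tcp | count_tcp_states
```

A large and growing pile of FIN_WAIT_2, CLOSE_WAIT or TIME_WAIT entries compared with ESTABLISHED would support the socket-exhaustion guess; a small one would point back at memory.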
You have to use zone resource management to prevent that problem. This is dummied-up output from prctl -i zone [zonename]:
zone.max-swap
        system          16.0EB    max   deny    -
zone.max-locked-memory
        system          16.0EB    max   deny    -
zone.max-shm-memory
        system          20.0GB    max   deny    -
zone.max-shm-ids
        system           1.8M     max   deny    -
zone.max-sem-ids
        system          16.8M     max   deny    -
zone.max-msg-ids
        system          16.8M     max   deny    -
zone.max-lwps
        system           8.4K     max   deny    -
zone.cpu-cap
        privileged        200       -   deny    -
        system          4.29G     inf   deny    -
zone.cpu-shares
        privileged          1       -   none    -
        system          65.5K     max   none
You can control these settings persistently with zonecfg, or dynamically with prctl.
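As a sketch of what that could look like, the helper below only prints the commands so they can be reviewed first. The zone name sapzone1, the sizes, and the function name zone_cap_commands are placeholders of mine, and the option spelling is as I understand it from zonecfg(1M) and prctl(1), so check the man pages on your release before running anything.

```shell
#!/bin/sh
# Print (do not run) commands that would cap one zone's memory and swap.
# Arguments: zone name, physical memory cap, swap cap (for example 16g 24g).
zone_cap_commands() {
    zone=$1 phys=$2 swap=$3
    # Persistent cap stored in the zone configuration; applied at the next zone boot.
    # If the zone already has a capped-memory resource, use "select capped-memory"
    # in place of "add capped-memory".
    echo "zonecfg -z $zone 'add capped-memory; set physical=$phys; set swap=$swap; end'"
    # Change the swap cap on the running zone. -r replaces an existing privileged
    # value; drop -r if the zone has no privileged zone.max-swap value yet.
    echo "prctl -n zone.max-swap -v $swap -t privileged -r -e deny -i zone $zone"
}

# Placeholder values; size them from what the SAP instance really needs.
zone_cap_commands sapzone1 16g 24g
```

Run the printed commands as root in the global zone once the numbers look right. Note that in the output above only zone.cpu-cap and zone.cpu-shares carry a privileged value, so memory and swap are effectively uncapped for that zone.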
Examining the running system requires using iostat, prstat, fsstat, netstat -s, and echo '::memstat' | mdb -k (from the global zone) to get a BASIC idea. Advanced probing usually requires dtrace.
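A minimal sketch of collecting those basic readings in one go, so there is something to look at after the next hang. The function name snapshot, the output file names, and the intervals are my own choices, not anything standard. It is written for a POSIX shell (bash or /usr/xpg4/bin/sh rather than the old /bin/sh) and skips any tool that is not installed.

```shell
#!/bin/sh
# Save the output of the read-only diagnostics named above into a directory.
# Arguments: output directory, sampling interval in seconds (default 5).
snapshot() {
    dir=$1
    int=${2:-5}
    mkdir -p "$dir" || return 1

    # run <outfile> <command...>: skip missing tools, record failures in the file.
    run() {
        out=$1; shift
        command -v "$1" >/dev/null 2>&1 || return 0
        "$@" > "$dir/$out" 2>&1 || echo "exit status $?" >> "$dir/$out"
    }

    run prstat.txt  prstat -Z "$int" 2      # per-zone CPU and memory summary
    run iostat.txt  iostat -xnz "$int" 2    # busy disks only
    run fsstat.txt  fsstat / "$int" 2       # file system activity for /
    run netstat.txt netstat -s              # protocol counters

    # Kernel memory breakdown; needs root in the global zone.
    if command -v mdb >/dev/null 2>&1; then
        echo '::memstat' | mdb -k > "$dir/memstat.txt" 2>&1 || true
    fi
    return 0
}

# Example: snapshot /var/tmp/hang-$(date +%Y%m%d-%H%M%S) 5
```

Running it from cron every few minutes in the global zone would give a before-and-after picture, which is usually more useful than a single reading taken once the box is already unresponsive.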
Yes, it is possible for one zone to eat enough resources to grossly affect other zones and global.
The tools are there to cap the memory usage of this zone in the zone configuration (zonecfg), if its consumption of physical memory is definitely the problem.
Of course, users of this zone may then run into the new limits. If that's a problem, consider increasing the overall RAM in the system (again assuming your diagnosis is correct and the problem really is memory).
Sorry - just realized jim_mcnamara has already said this (but I'll leave this post now anyway).