Simple advanced question

This is a fairly basic advanced question!

Does anyone happen to know what level of page faults should be acceptable on a fully operational production system?

Useful (?) information:

Production system using Oracle database
Compaq Tru64 UNIX Server (not sure what model, but it's big!)
8 CPUs
8 Gig of RAM - 99% is used
27 Gig of swap space - 35% is used
1050 processes
Kernel takes up 350 MB

The average page fault level is around 25,000 per minute - this figure came from running vmstat 60

Questions are :

Is this too high?
What should be an appropriate level of page faults?

Many thanks in advance (and apologies if the question has been asked a thousand times before - I did check the FAQs)

Neil Weston

What version of the OS?

You have to understand that even knowing which version of the OS you are running, it would be impossible to give a good answer to your question. If a system is paging, it may be just fine. Normally, if a system is swapping, you have problems, but sometimes paging can cause problems too. You really need to get a history of how the system has behaved in the past and how many users are on it (including the ones you can't see, e.g. Oracle database clients).
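
If nothing better is already in place, a simple way to start building that history is to log a vmstat sample from cron every few minutes - something roughly like the crontab entry below (the log location and interval are just examples, adjust to taste):

  # append a timestamped vmstat sample to a log file every ten minutes,
  # so you can see how the box behaves over days and weeks
  0,10,20,30,40,50 * * * * (date; vmstat 1 2) >> /var/adm/vmstat.log 2>&1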

To get your version -
Here are three commands that will display the version of UNIX that is running.

  1. #dia -R -i osf_entry=300 |grep "Digital UNIX V"
  2. #uerf -R -r 300 |grep "Digital UNIX V"
  3. #sizer -v

Then go to the following links, especially the second one (it will be more helpful if it matches your version, but either way it covers the different things you need to look at to work out whether you really have a problem or not).
Try info found in http://www.tru64unix.compaq.com/docs/
and http://www.tru64unix.compaq.com/docs/best_practices/VLM_BP/TITLE.HTM

Hope this was helpful.

Another consideration is "What else is running?".

Do you also have Tuxedo running on the same box?
Many variables to consider.
But you should be able to get more detail from your Compaq rep.

BOL :wink:

Thanks for your replies!

I've had a look at all the TRU64 documentation, but as the system is a greenfield site, I've got no past stats to go back to for comparisons.

Unfortunately the documentation only goes into tuning; so far I can't find anything about expected page fault levels or the implications of high levels. Still looking!

Our application is the only one running, and it's thick server / thin client. There's not a lot of memory usage that we can reduce at present - although it's being looked at.

Thanks once again

Neil

It would be helpful if you could describe your swap configuration..... is your system thrashing due to a swap problem? How large is your swap space? How much of the swap space is being used?
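
If it's easier, something along these lines should show it (I'm going from memory on the exact flags, so check the man pages; the iostat part is just to see whether the paging I/O is all landing on one spindle):

  # list each configured swap device with its allocated and in-use space
  swapon -s

  # 60-second samples of per-disk activity, to see whether one spindle
  # is taking most of the paging I/O
  iostat 60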

Thanks for your reply, Neo.

The problem is that we have individual users who are sometimes getting response times many times slower than usual. This is usually limited to single users rather than everyone at once. One thing I've experienced in the past is that when NT systems are paging heavily and performance suffers, the number of page faults is high. Unfortunately I've got no way of determining what "high" is for this Compaq server, and I'm trying to prove / disprove that memory or paging is the cause of the problems.

The page file is 27 Gig, located across a number of spindles, and 35% of it is being used. I did mention this in my first post, so if you meant something else, please let me know and I'll get back to you as soon as I can.

Neil

Neil,

Does your system have System Activity Reporting (SAR) in place?
That may provide some indicators of usage and when it occurs.

Humble suggestion :slight_smile:

"Page faults" is a number that includes too many things to be very useful. I'm surprised that your vmstat bothers to display the number, although I see here that it does. Look instead at the page out rate (label=pout). If this number is high, that's your problem. Note that it is zero on the man page. Any non-zero number is not real good. And the higher the number, the worse it is.

Most versions of unix these days allocate just enough stack to store the environment and the arguments and then let the process page fault its way into memory. Does NT load the entire process into memory at start-up time?

We still did not address the swap issue... but now it seems we are talking NT... I have no idea if NT uses swap space... does it?

Actually, in this post and again in this post he said that his 27 Gig of swap is about 35% used.

Also, in this post he mentions that in his NT experience, heavy paging and poor performance went together with high page fault counts.

Since he is trying to leverage his NT experience to understand the behavior of his unix system, the difference between the two is important. I think that his NT experience is misleading him into thinking that high page faults are problematic.

Got it... thanks. I was confused reading UNIX in one post and NT in others....not really paying close attention .... sorry for the confusion.....

Perhaps more RAM would help.........

Thanks for your replies everyone - much appreciated.

Cameron - unfortunately, Tru64 only has the old subset of system tools - vmstat, iostat, etc. - and SAR isn't one of them. I've used SAR in the past and really don't like not having it!

Neo - yes, I'd love to order more RAM, but before we do so, it unfortunately needs to be justified! The only way that I can do that is to show that the statistics that we are getting are over a particular threshold.

Perderabo - thanks for the information about page outs (pout). Yes, we do have entries in this column (around 5000 per minute using vmstat), but the same question still applies - is this too high and what should it be?!!

I do appreciate that if everything was in RAM then so much the better, but surely this is not essential. For instance, say we have 1,000 processes and around 250 of them are idle. If those 250 were in swap it wouldn't matter - paging would occur, but it shouldn't be a problem. Obviously, if only 600 processes could fit in RAM and the other 150 active ones were also in swap, that may well be the cause of problems.
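
If it helps, the sort of thing I'm planning to look at next is which processes are actually holding the most resident memory, e.g. something like this (the sort field may need adjusting if RSS isn't the sixth column in this ps output):

  # ten largest processes by resident set size (RSS); BSD-style "ps aux"
  # works on Tru64, but check which column RSS lands in before trusting it
  ps aux | sort -rn -k 6 | head -10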

Hope this may be of assistance!!

Kind regards,

Neil

Actually sar is older than vmstat, iostat, etc.

5000 page outs per minute is about 83 page outs per second. Yes, that seems high, especially if it's sustained; I would strive for zero. Unix does not load entire processes into core - by the time stuff is in core, it's there because it needs to be. No, it's not essential to add more core; you can live with the slowness instead. It's not like anything is breaking. But you were looking for the source of the slowness, and I think you have found it.
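
To put a rough size on that (assuming the 8 KB page size that vmstat reports in its header on an Alpha box - check yours), it works out to something like:

  5000 page outs/min / 60         ~=  83 pages/sec
  83 pages/sec * 8 KB per page    ~= 665 KB/sec written out to swap, sustained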

Talk to your memory salesman and explain the situation. He or she may be able to let you borrow a gig of memory for a week or so. I'll bet that you will see the slowness disappear. Then your management can decide whether or not to buy the memory.

Sounds like you need (at minimum) an additional 4 Gigs of RAM, and 8 Gigs would be better (if you plan to keep the same processes running on the platform!!)

I like Perderabo's suggestion of getting your vendor to let you use the memory and record the performance results. Present this to management...... if you find that it helps (it should).

Your next choice is to move processes (TBD) to another platform... and this might be much more expensive and give less desirable performance (can't say much since we don't know all the details of what's running... just Oracle? different applications? same application? etc.?)

RAM ...... or distribute to another platform!!!!