Memory errors

Solaris 8, 4 GB RAM, 420R

Jan 11 12:33:32 serverXYZ SUNW,UltraSPARC-II: [ID 210962 kern.info] [AFT0] errID 0x0009882c.47bb8aca Corrected Memory Error on U1301 is Persistent

Jul 1 01:22:28 serverXYZ SUNW,UltraSPARC-II: [ID 274110 kern.info] [AFT0] errID 0x001b3157.ac53990e Corrected Memory Error on U1301 is Intermittent

I received a memory error in January and now another in July. Both times the system reports that it corrected the error (first flagged as persistent, now as intermittent).

What should be done in this case? Should the memory in this bank be replaced now, or is it fine to wait until the errors happen more frequently before replacing it?

Hi,

I would say it depends a bit on the role of the server. As long as your system is not a single point of failure, I would wait a bit; memory sometimes has this kind of headache.

But you know, it's really difficult to give you a good recommendation about this ...

Best Regards
Malcom

Do not change anything with that system. If the errors are that uncommon it is unnecessary.

Sun's best practices say you should replace failed memory for an intermittent error only if you get at least 3 of them within 24 hours. 2 in 6 months like this is no big deal. If you start getting uncorrectable errors or they panic the box then you do it right away, but for intermittent ones I'd wait.
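If you want to automate that check, here is a rough Python sketch of the "at least 3 within 24 hours" rule applied to log lines like the ones quoted above. The function names, the `year` parameter (syslog timestamps carry no year), and the choice to count every corrected error on the same module regardless of persistent/intermittent are my own assumptions, not anything Sun ships. It also assumes English month abbreviations in the log.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the [AFT0] corrected-error lines quoted in this thread.
LINE_RE = re.compile(
    r"^\s*(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+.*"
    r"Corrected Memory Error on (?P<module>\S+) is (?P<kind>\w+)"
)

def parse_events(lines, year):
    """Return (timestamp, module, kind) for each corrected-memory-error line.

    `year` is a hypothetical parameter: syslog lines do not record the year,
    so the caller has to supply it.
    """
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        stamp = " ".join(m.group("stamp").split())  # collapse padded spaces
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, m.group("module"), m.group("kind")))
    return events

def modules_over_threshold(events, count=3, window=timedelta(hours=24)):
    """Return the modules with at least `count` errors inside any `window`."""
    by_module = defaultdict(list)
    for when, module, _kind in events:
        by_module[module].append(when)
    flagged = set()
    for module, times in by_module.items():
        times.sort()
        # Slide over each run of `count` consecutive errors.
        for i in range(len(times) - count + 1):
            if times[i + count - 1] - times[i] <= window:
                flagged.add(module)
                break
    return flagged

if __name__ == "__main__":
    sample = [
        "Jan 11 12:33:32 serverXYZ SUNW,UltraSPARC-II: [ID 210962 kern.info] "
        "[AFT0] errID 0x0009882c.47bb8aca Corrected Memory Error on U1301 is Persistent",
        "Jul 1 01:22:28 serverXYZ SUNW,UltraSPARC-II: [ID 274110 kern.info] "
        "[AFT0] errID 0x001b3157.ac53990e Corrected Memory Error on U1301 is Intermittent",
    ]
    # The year is made up for the example; the original log does not give one.
    print(modules_over_threshold(parse_events(sample, year=2003)))
```

With the two lines from this thread nothing gets flagged, which matches the advice to wait. A log that spans a year boundary would need the year handled per line, which this sketch does not attempt.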

Also, it helps to make sure you are at current levels of the kernel patch. Sun frequently improves the memory handling and scrubber routines to help the system handle memory problems better.

Great, thanks for the suggestions. A big help. I will continue to monitor and update.

Thanks again.