VCS triggering panic on one node, root disk under SVM

We have a two-node cluster with the OS disk mirrored under SVM. A disk problem on one of the mirror disks is causing the node to panic.
Why does the failure of a single submirror disk cause VCS to panic the node? The other disk is healthy, so /var should still be writable.

--------------------------------------------------------------------------------------------
From VCS engine log,

2013/04/20 02:34:18 VCS INFO V-16-1-50135 User root fired command: hagrp -unfreeze ORACLE_PRASAPDB_Group from localhost
2013/04/21 10:38:14 VCS INFO V-16-1-10196 Cluster logger started
2013/04/21 10:38:14 VCS NOTICE V-16-1-11022 VCS engine (had) started

From OS messages,

Apr 21 04:00:26 prdagwn1 genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1t0d0s1, offset 65536, content: kernel
Apr 21 04:00:31 prdagwn1 scsi: [ID 365881 kern.info] /pci@400/pci@0/pci@8/scsi@0 (mpt0):
Apr 21 04:00:31 prdagwn1 Log info 31140000 received for target 1.
Apr 21 04:00:31 prdagwn1 scsi_status=0, ioc_status=8048, scsi_state=c
Apr 21 04:01:32 prdagwn1 md_stripe: [ID 641072 kern.warning] WARNING: md: d20: read error on /dev/dsk/c1t1d0s0
Apr 21 04:02:32 prdagwn1 last message repeated 1 time
Apr 21 04:03:33 prdagwn1 md_stripe: [ID 641072 kern.warning] WARNING: md: d20: write error on /dev/dsk/c1t1d0s0
Apr 21 04:07:42 prdagwn1 genunix: [ID 409368 kern.notice] ^M100% done: 405019 pages dumped, compression ratio 3.71,
Apr 21 04:07:42 prdagwn1 genunix: [ID 851671 kern.notice] dump succeeded

I can see that md is reporting write errors on disk slice c1t1d0s0, which contains /var. If /var becomes inaccessible, the VCS daemon 'had' cannot write to it and stops responding, and GAB then triggers a system panic due to client process failure.
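For reference, these are the SVM checks I am running on the node (standard Solaris commands; d20 and c1t1d0 are taken from the messages above):

metastat -p          (compact mirror/submirror layout)
metastat d20         (full status; shows whether the errored component is in maintenance)
metadb               (state database replicas; uppercase flags indicate replica errors)
iostat -En c1t1d0    (cumulative error counters on the suspect disk)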

After the initial investigation, Oracle suggested replacing disk 1. This is the same disk that failed three weeks ago.

Is there any setting in VCS to accommodate delayed writes on a disk under SVM? Or should we move from SVM to VxVM?
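The only related tunable I have found so far is the had-to-GAB heartbeat timeout, VCS_GAB_TIMEOUT, set in the vcsenv file. This is just a sketch of what I am considering, not a confirmed fix; the path and the 30000 ms default are assumed from a standard install and should be checked against the docs for our VCS version:

# In /opt/VRTSvcs/bin/vcsenv (assumed location), then restart had:
VCS_GAB_TIMEOUT=60000      # ms that had may miss heartbeats to GAB; default assumed 30000
export VCS_GAB_TIMEOUT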

Hi,

Could you post the output of:

1) metadb
2) metastat -p

What procedure was used when the disk was replaced?
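For comparison, a typical submirror disk replacement under SVM looks roughly like this. This is only a sketch: it assumes a mirror d0 with submirrors d10 (on c1t0d0s0) and d20 (on c1t1d0s0), and state database replicas on slice 7; from your messages d20 looks like the submirror on the failing disk, but adjust all names to your actual layout from metastat -p.

metadetach -f d0 d20                      # detach the errored submirror (names assumed)
metadb -d c1t1d0s7                        # remove replicas on the failing disk (slice assumed)
# physically replace the disk, then copy the VTOC from the good disk
prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
metadb -a -c 2 c1t1d0s7                   # recreate the replicas
metattach d0 d20                          # reattach the submirror and let it resync
installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t1d0s0   # boot block on a SPARC boot disk
metastat d0                               # confirm the resync completes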