FAULTY DISK replacement HP rx4640

Hello,

I'm new to this forum and, as you will see from my question, I'm new to UNIX as well.
One of our customers has an HP rx4640 running UNIX with two 300GB hot-swappable disks that are mirrored. They reported to us that one of the disks is faulty and they want us to take care of it. Below is the only log they sent us.

Fri May 18 17:50:11 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol1 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:50:12 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol3 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:50:12 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol4 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:50:12 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol5 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:50:12 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol6 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:50:12 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol7 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:50:12 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol8 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:50:12 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/SwapVol2 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol1 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol3 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol4 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol5 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol6 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol7 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/lvol8 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
Fri May 18 17:56:06 2012    STCHK 122 sd_procchk sd_procchk 1 Logical volume 
    /dev/vg00/SwapVol2 is mirrored but has some stale blocks. Data loss on 
    hardware failure could occur. 
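The messages above repeat the same volumes at two timestamps. As a minimal sketch, assuming the log text has exactly the layout shown (the volume path followed by "is mirrored"), the affected logical volumes can be listed once each with awk. The function name stale_lvs_from_log is hypothetical; it is not an HP-UX command.

```shell
# stale_lvs_from_log (hypothetical helper): read log text on stdin and print
# each logical volume reported as "mirrored but has some stale blocks",
# once, in the order it first appears.
stale_lvs_from_log() {
    awk '
        {
            # A field starting with /dev/ followed by "is mirrored" names an LV.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^\/dev\// && $(i + 1) == "is" && $(i + 2) == "mirrored") {
                    if (!seen[$i]++) print $i
                }
            }
        }
    '
}
```

Feed it whatever file the messages came from (the thread does not say which), e.g. stale_lvs_from_log < logfile.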
# pvdisplay -v /dev/disk/disk13_p2 | grep stale
   00000 stale    /dev/vg00/lvol1         00000 
   00089 stale    /dev/vg00/lvol3         00000 
   00090 stale    /dev/vg00/lvol3         00001 
   00094 stale    /dev/vg00/lvol3         00005 
   00096 stale    /dev/vg00/lvol3         00007 
   00121 stale    /dev/vg00/lvol4         00000 
   00122 stale    /dev/vg00/lvol5         00000 
   00171 stale    /dev/vg00/lvol5         00049 
   00176 stale    /dev/vg00/lvol5         00054 
   00177 stale    /dev/vg00/lvol5         00055 
   00183 stale    /dev/vg00/lvol5         00061 
   00184 stale    /dev/vg00/lvol5         00062 
   00186 stale    /dev/vg00/lvol5         00064 
   00215 stale    /dev/vg00/lvol5         00093 
   00219 stale    /dev/vg00/lvol5         00097 
   00221 stale    /dev/vg00/lvol5         00099 
   00237 stale    /dev/vg00/lvol5         00115 
   00242 stale    /dev/vg00/lvol5         00120 
   00279 stale    /dev/vg00/lvol6         00000 
   00296 stale    /dev/vg00/lvol7         00000 
   00298 stale    /dev/vg00/lvol7         00002 
   00299 stale    /dev/vg00/lvol7         00003 
   00306 stale    /dev/vg00/lvol7         00010 
   00309 stale    /dev/vg00/lvol7         00013 
   00314 stale    /dev/vg00/lvol7         00018 
   00318 stale    /dev/vg00/lvol7         00022 
   00326 stale    /dev/vg00/lvol7         00030 
   00327 stale    /dev/vg00/lvol7         00031 
   00337 stale    /dev/vg00/lvol7         00041 
   00338 stale    /dev/vg00/lvol7         00042 
   00340 stale    /dev/vg00/lvol7         00044 
   00344 stale    /dev/vg00/lvol7         00048 
   00415 stale    /dev/vg00/lvol8         00000 
   00416 stale    /dev/vg00/lvol8         00001 
   00417 stale    /dev/vg00/lvol8         00002 
   00422 stale    /dev/vg00/lvol8         00007 
   00429 stale    /dev/vg00/lvol8         00014 
   00434 stale    /dev/vg00/lvol8         00019 
   00437 stale    /dev/vg00/lvol8         00022 
   00438 stale    /dev/vg00/lvol8         00023 
   00439 stale    /dev/vg00/lvol8         00024 
   00441 stale    /dev/vg00/lvol8         00026 
   00445 stale    /dev/vg00/lvol8         00030 
   00446 stale    /dev/vg00/lvol8         00031 
   00447 stale    /dev/vg00/lvol8         00032 
   00448 stale    /dev/vg00/lvol8         00033 
   00449 stale    /dev/vg00/lvol8         00034 
   00459 stale    /dev/vg00/lvol8         00044 
   00460 stale    /dev/vg00/lvol8         00045 
   00461 stale    /dev/vg00/lvol8         00046 
   00462 stale    /dev/vg00/lvol8         00047 
   00497 stale    /dev/vg00/SwapVol2      00000 
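To see how the stale extents are spread over the logical volumes, the grep output can be tallied per LV. This is a sketch assuming the column layout shown above (PE number, status, LV path, LE number); stale_extents_per_lv is a hypothetical helper name.

```shell
# stale_extents_per_lv (hypothetical helper): read the
# "pvdisplay -v <pv> | grep stale" listing on stdin and print
# "<logical volume> <number of stale extents>", one line per LV,
# in the order the LVs first appear.
stale_extents_per_lv() {
    awk '
        $2 == "stale" {
            if (!($3 in count)) order[++n] = $3   # remember first-seen order
            count[$3]++
        }
        END {
            for (i = 1; i <= n; i++) print order[i], count[order[i]]
        }
    '
}
```

Usage: pvdisplay -v /dev/disk/disk13_p2 | grep stale | stale_extents_per_lv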

With my limited knowledge of UNIX, I assumed from this that the disk ID is 13. If so, how do I find which of the two physical disks should be replaced?
And once I identify the problematic disk, are the steps below correct?

1) Check that the disk is not in the root volume group with the lvlnboot -v command.
2) Continue with the disk replacement:

# pvchange -a N /dev/dsk/- 
# <replace the hot-swappable disk> 
# vgcfgrestore -n vg01 /dev/rdsk/-
# vgchange -a y vg01 

If I'm way off, please tell me, as I got all this from "When Good Disks Go Bad" and, as I mentioned, I have very little experience with UNIX.

Any help is appreciated.
Thanks,
Gjk

I would not touch this if I were you. Ask an experienced HP-UX admin in your company to have a look...

That was my first thought, not to touch it. But at the moment our UNIX admin is not available and the customer is expecting a solution tonight. If there is a straightforward procedure, at least for identifying which physical disk it is, please help.
What I have so far is this command:

 # ioscan -m lun /dev/disk/disk13

Thanks again,
Gjk

I don't know if I can help, because I now have VERY little experience with rx boxes. I managed to find one for home, but someone removed the disks before I got hold of it...
You did not say (or I'm blind...) which OS version you are running!
vg00 is usually the OS volume group (and so holds the root/boot disks...), all the more so when you see no lvol2 in the list, since lvol2 is the swap...
Before going further, I would suggest getting 2 new disks ordered. You may not need them, or perhaps only one, but if things go wrong or the situation is not cool you may be glad to have 2 with you (believe my experience...).
Next, once you have them, try to make a recovery tape or a "Golden Image" somehow. If the system manages that, you are perhaps on the way to solving your issue online; if not, try to find the last bootable backup of the system (you may need it...).
Let's say you managed a make_recovery: you can now go and try stm or xstm (X GUI) and see what the tool diagnoses about your disks.

Then give us what you found, and we will do a bit of brainstorming...

Good luck!

Thank you for your help, VBE, and you are not blind: I don't know the OS version. As I have very limited access to the customer site, only tonight will I be able to find out the OS version and more about the disk. What I have confirmed is that it is a boot disk. From what I know from the customer, the system is connected to an MSL6030 tape library and they back up the configuration every day. Unfortunately I'm not familiar with the diagnostic tools, but I'll try my best to get as much info as possible and post it.
Thank you very very much

Type

 vgdisplay -v vg00 | grep dsk

to see which disks are in vg00,
then again

vgdisplay -v vg00

And look at the last stanza; you should see something like:

   --- Physical volumes ---
   PV Name                     /dev/dsk/c2t2d0
   PV Status                   available                
   Total PE                    4340    
   Free PE                     357     
   Autoswitch                  On        
   Proactive Polling           On               

   PV Name                     /dev/dsk/c1t2d0
   PV Status                   available                
   Total PE                    4340    
   Free PE                     357     
   Autoswitch                  On        
   Proactive Polling           On               
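When a volume group has several PVs, the Physical volumes stanza can be reduced to one line per disk. This is a sketch assuming the "PV Name" / "PV Status" layout shown above; pv_status is a hypothetical helper name.

```shell
# pv_status (hypothetical helper): read "vgdisplay -v <vg>" output on stdin
# and print "<PV name> <PV status>" for each physical volume. Any status
# other than "available" (for example "unavailable") marks a disk to check.
# If a PV also has an alternate-link "PV Name" line, the last name seen
# before "PV Status" is the one printed.
pv_status() {
    awk '
        $1 == "PV" && $2 == "Name"   { name = $3 }
        $1 == "PV" && $2 == "Status" { print name, $3 }
    '
}
```

Usage: vgdisplay -v vg00 | pv_status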

Hoping it will tell you which disk is failing... or already dead, e.g.:

# ioscan -funC disk
Class     I  H/W Path  Driver  S/W State  H/W Type  Description
===================================================================
disk      0  16/5.2.0  sdisk   CLAIMED    DEVICE    TOSHIBA CD-ROM XM-5401TA
                       /dev/cdrom /dev/dsk/c1t2d0 /dev/rdsk/c1t2d0
disk      1  16/5.5.0  sdisk   CLAIMED    DEVICE    SEAGATE ST39173N
                       /dev/dsk/c1t5d0 /dev/rdsk/c1t5d0
disk      2  16/5.6.0  sdisk   NO_HW      DEVICE    SEAGATE ST39173N   =>  No Hardware...
                       /dev/dsk/c1t6d0 /dev/rdsk/c1t6d0
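The state to look for in this listing is NO_HW in the S/W State column. A sketch that prints the hardware path and first device file of each NO_HW disk, assuming the column order shown above (class, instance, H/W path, driver, S/W state) and that the device files follow on the next non-blank line; no_hw_disks is a hypothetical helper name.

```shell
# no_hw_disks (hypothetical helper): read "ioscan -funC disk" output on stdin
# and print "<H/W path> <first device file>" for every disk row whose
# S/W State (column 5) is NO_HW.
no_hw_disks() {
    awk '
        # Each disk row resets the state: remember it only if it is NO_HW.
        $1 == "disk" { want = ($5 == "NO_HW"); path = $3; next }
        # The first non-blank line after a NO_HW row holds its device files.
        want && NF > 0 { print path, $1; want = 0 }
    '
}
```

Usage: ioscan -funC disk | no_hw_disks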

Also, look at what you have in your /var/adm/syslog/syslog.log! You may find EMS critical messages...

Hello,

Last night I was able to gather some more information about the OS and the condition of the disk. I think it needs to be replaced. As you will see below, it is the alternate disk and not the primary. I'm trying to put together a replacement procedure from "When Good Disks Go Bad", but I have some doubts about it. If you can help, it will be really helpful.

# uname -a
HP-UX - B.11.31 U ia64 2801820572 unlimited-user license

# model
ia64 hp server rx4640

# ioscan -funC disk
Class     I  H/W Path       Driver S/W State   H/W Type     Description
=======================================================================
disk      5  0/0/3/0.0.0.0  sdisk   CLAIMED     DEVICE       TEAC    DV-28E-N
                           /dev/dsk/c0t0d0   /dev/rdsk/c0t0d0
disk      0  0/1/1/0.1.0    sdisk   CLAIMED     DEVICE       HP 146 GST3146855LC
                           /dev/dsk/c2t1d0     /dev/dsk/c2t1d0s2   /dev/rdsk/c2t1d0    /dev/rdsk/c2t1d0s2
                           /dev/dsk/c2t1d0s1   /dev/dsk/c2t1d0s3   /dev/rdsk/c2t1d0s1  /dev/rdsk/c2t1d0s3
disk      4  0/1/1/1.0.0    sdisk   NO_HW       DEVICE       HP 146 GST3146855LC
                           /dev/dsk/c3t0d0     /dev/dsk/c3t0d0s2   /dev/rdsk/c3t0d0    /dev/rdsk/c3t0d0s2
                           /dev/dsk/c3t0d0s1   /dev/dsk/c3t0d0s3   /dev/rdsk/c3t0d0s1  /dev/rdsk/c3t0d0s3

# lvlnboot -v
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
    /dev/disk/disk9_p2 -- Boot Disk
    /dev/disk/disk13_p2 
Boot: lvol1    on:     /dev/disk/disk9_p2
            /dev/disk/disk13_p2
Root: lvol3    on:     /dev/disk/disk9_p2
            /dev/disk/disk13_p2
Swap: lvol2    on:     /dev/disk/disk9_p2
            /dev/disk/disk13_p2
Dump: lvol2    on:     /dev/disk/disk9_p2, 0

# pvdisplay -v /dev/dsk/c3t0d0 | more

# lvdisplay -v /dev/vg00/lvol1 | grep "Mirror copies"
Mirror copies               1  
# lvdisplay -v /dev/vg00/lvol2 | grep "Mirror copies"
Mirror copies               1  
# lvdisplay -v /dev/vg00/lvol3 | grep "Mirror copies"
Mirror copies               1  
# lvdisplay -v /dev/vg00/lvol4 | grep "Mirror copies"
Mirror copies               1  
# lvdisplay -v /dev/vg00/lvol5 | grep "Mirror copies"
Mirror copies               1  
# lvdisplay -v /dev/vg00/lvol6 | grep "Mirror copies"
Mirror copies               1  
# lvdisplay -v /dev/vg00/lvol7 | grep "Mirror copies"
Mirror copies               1  
# lvdisplay -v /dev/vg00/lvol8 | grep "Mirror copies"
Mirror copies               1
# vgdisplay -v /dev/vg00
--- Volume groups ---
VG Name                     /dev/vg00
VG Write Access             read/write     
VG Status                   available                 
Max LV                      255    
Cur LV                      9      
Open LV                     9      
Max PV                      16     
Cur PV                      2      
Act PV                      2      
Max PE per PV               4357         
VGDA                        4   
PE Size (Mbytes)            32              
Total PE                    8694    
Alloc PE                    1994    
Free PE                     6700    
Total PVG                   0        
Total Spare PVs             0              
Total Spare PVs in use      0                     
VG Version                  1.0       
VG Max Size                 2230784m   
VG Max Extents              69712         

   --- Logical volumes ---
   LV Name                     /dev/vg00/lvol1
   LV Status                   available/stale           
   LV Size (Mbytes)            1824            
   Current LE                  57        
   Allocated PE                114         
   Used PV                     2       

   LV Name                     /dev/vg00/lvol2
   LV Status                   available/syncd           
   LV Size (Mbytes)            1024            
   Current LE                  32        
   Allocated PE                64          
   Used PV                     2       

   LV Name                     /dev/vg00/lvol3
   LV Status                   available/stale           
   LV Size (Mbytes)            1024            
   Current LE                  32        
   Allocated PE                64          
   Used PV                     2       

   LV Name                     /dev/vg00/lvol4
   LV Status                   available/stale           
   LV Size (Mbytes)            32              
   Current LE                  1         
   Allocated PE                2           
   Used PV                     2       

   LV Name                     /dev/vg00/lvol5
   LV Status                   available/stale           
   LV Size (Mbytes)            5024            
   Current LE                  157       
   Allocated PE                314         
   Used PV                     2       

   LV Name                     /dev/vg00/lvol6
   LV Status                   available/stale           
   LV Size (Mbytes)            544             
   Current LE                  17        
   Allocated PE                34          
   Used PV                     2       

   LV Name                     /dev/vg00/lvol7
   LV Status                   available/stale           
   LV Size (Mbytes)            3808            
   Current LE                  119       
   Allocated PE                238         
   Used PV                     2       

   LV Name                     /dev/vg00/lvol8
   LV Status                   available/stale           
   LV Size (Mbytes)            2624            
   Current LE                  82        
   Allocated PE                164         
   Used PV                     2       

   LV Name                     /dev/vg00/SwapVol2
   LV Status                   available/stale           
   LV Size (Mbytes)            16000           
   Current LE                  500       
   Allocated PE                1000        
   Used PV                     2       


   --- Physical volumes ---
   PV Name                     /dev/disk/disk9_p2
   PV Status                   available                
   Total PE                    4347    
   Free PE                     3350    
   Autoswitch                  On        
   Proactive Polling           On               

   PV Name                     /dev/disk/disk13_p2
   PV Status                   unavailable              
   Total PE                    4347    
   Free PE                     3350    
   Autoswitch                  On        
   Proactive Polling           On               

# cat bootconf 
l  /dev/disk/disk9_p2
l  /dev/disk/disk13_p2

# setboot
Primary bootpath : 0/1/1/0.0x1.0x0 (/dev/rdisk/disk9)
HA Alternate bootpath : 0/1/1/1.0x0.0x0 (/dev/rdisk/disk13)
Alternate bootpath : 0/1/2/0 (LAN Interface)

Autoboot is ON (enabled)
Hyperthreading : OFF
               : OFF (next boot)


# ioscan -m lun /dev/disk/disk13
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health    Description
========================================================================
disk     13  64000/0xfa00/0x5   esdisk  NO_HW       DEVICE       disabled  HP 146 GST3146855LC       
             0/1/1/1.0x0.0x0
                      /dev/disk/disk13      /dev/disk/disk13_p2   /dev/rdisk/disk13     /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p1   /dev/disk/disk13_p3   /dev/rdisk/disk13_p1  /dev/rdisk/disk13_p3

Then, from the output below, I was able to identify it physically:
 # dd if=/dev/dsk/c3t0d0 of=/dev/null bs=1024 
  /dev/dsk/c3t0d0: No such device or address
  dd: cannot open /dev/dsk/c3t0d0

# dd if=/dev/dsk/c2t1d0 of=/dev/null bs=1024
  2339888+0 records in
  2339888+0 records out
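Without a count, dd keeps reading until the end of the device (or until it is interrupted), which is slow on a 146 GB disk. For a quick yes/no check of whether a device can be opened and read at all, here is a sketch that reads only the first 1 KB. The count=1 limit is an assumption chosen for speed; it does not scan the disk for bad blocks. probe_read is a hypothetical helper name.

```shell
# probe_read (hypothetical helper): try to read the first 1 KB of a device
# (or any file) and report whether it could be opened and read.
probe_read() {
    # dd prints its record counts on stderr; discard them and rely on
    # the exit status only.
    if dd if="$1" of=/dev/null bs=1024 count=1 2>/dev/null; then
        echo "$1: readable"
    else
        echo "$1: NOT readable"
    fi
}
```

Usage: probe_read /dev/dsk/c3t0d0; probe_read /dev/dsk/c2t1d0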
   

Any help will be great,

Thank you.

I don't know how to deal with these disk names (glad ioscan still looks as usual...). This is what speaks to me:

disk 4 0/1/1/1.0.0 sdisk NO_HW DEVICE HP 146 GST3146855LC
/dev/dsk/c3t0d0 /dev/dsk/c3t0d0s2 /dev/rdsk/c3t0d0 /dev/rdsk/c3t0d0s2

The bad and gone disk...

# pvdisplay -v /dev/dsk/c3t0d0 | more

Why did you not compare it with the good one (c2t1d0) to see the expected output...

Now something trickier (this is on HP-UX 11.11...):

ant:/home/vbe $ echo boot_string/S | adb /stand/vmunix /dev/kmem
boot_string:
boot_string:    disk(0/0/1/1.2.0.0.0.0.0;0)/stand/vmunix

If it returns something like disk(0/1/1/0.1.0.0.0.0.0;0)/stand/vmunix,
you are lucky... (well, not doomed anyway...)

But there is nothing more to do now than get a new disk for the replacement...

Hi
I asked someone on site to execute the echo command and this is what was returned.
adb: warning: Unrecognized format character - 'S'
I don't know if this means that the command was not correct??
Anyway, I have more than 2 HP 146 ST3146855LC disks in stock, so what I'm missing is the procedure. I have been searching all these days and I have come up with two very similar ones. In both of them I'm not very sure about the partitioning and mkboot steps. Anyway, I'm posting both; please, please correct me if I'm wrong. I apologize, as I don't know how to separate the tags.

PROCEDURE 1

1) Deactivate the physical volume before extraction:

#/root> pvchange -a N /dev/dsk/c3t0d0s2

Replace the disk. Wait 90-120 seconds after removing the old disk and again after inserting the new one.

2) "Discover" new disk:

#/root> diskinfo /dev/rdsk/c3t0d0
#/root> ioscan -fnC disk | grep c3t0d0
#/root> insf -eC disk

3) Erase partition table and create a new one:

#/root> idisk -Rw /dev/rdsk/c3t0d0
#/root> cat /tmp/partition_file
3
EFI 500MB
HPUX 100%
HPSP 400MB

#/root> idisk -wf /tmp/partition_file /dev/rdsk/c3t0d0
#/root> insf -eC disk
#/root> efi_fsinit -d /dev/rdsk/c3t0d0s1

4) Restore LVM data and reactivate PV:

#/root> vgcfgrestore -n vg00 /dev/rdsk/c3t0d0s2
#/root> pvchange -a y /dev/dsk/c3t0d0s2

5) Create boot data:

#/root> mkboot -e -l /dev/rdsk/c2t6d0

I'M NOT SURE ABOUT THE BELOW

#/root> mkboot -a "boot vmunix -lq" /dev/rdsk/c3t0d0
#/root> lvlnboot -v -R /dev/vg00
#/root> vgchange -a y vg00

Some tests to see the sync process:

#/root> for i in 1 2 3 4 5 6 7 ; do lvdisplay -v /dev/vg00/lvol${i} ; done | grep "LV Stat"

LV Status available/syncd
LV Status available/syncd
LV Status available/syncd
LV Status available/syncd
LV Status available/syncd
LV Status available/stale
LV Status available/stale
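The loop above only covers lvol1 to lvol7, while the vgdisplay output earlier in the thread also lists lvol8 and SwapVol2, and its output does not say which LV is still stale. A sketch that pairs each "LV Name" with its "LV Status" and prints only the stale ones, assuming the layout shown in the vgdisplay -v output above; stale_lvs is a hypothetical helper name.

```shell
# stale_lvs (hypothetical helper): read "vgdisplay -v <vg>" (or lvdisplay)
# output on stdin and print the name of every logical volume whose
# status ends in "stale".
stale_lvs() {
    awk '
        $1 == "LV" && $2 == "Name"                    { name = $3 }
        $1 == "LV" && $2 == "Status" && $3 ~ /stale$/ { print name }
    '
}
```

Usage: vgdisplay -v vg00 | stale_lvs prints nothing once no LV in vg00 is stale any more.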

PROCEDURE 2

1) Save the hardware path.
Run the ioscan command and note the hardware path:

# ioscan -m lun /dev/disk/disk13
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health    Description
========================================================================
disk     13  64000/0xfa00/0x5   esdisk  NO_HW       DEVICE       disabled  HP 146 GST3146855LC       
             0/1/1/1.0x0.0x0
                      /dev/disk/disk13      /dev/disk/disk13_p2   /dev/rdisk/disk13     /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p1   /dev/disk/disk13_p3   /dev/rdisk/disk13_p1  /dev/rdisk/disk13_p3

Lun hardware path is 64000/0xfa00/0x5
Lunpath hardware path is 0/1/1/1.0x0.0x0

2) Halt LVM access to the disk

# pvchange -a N /dev/disk/disk13_p2

3) Replace the hot-swappable disk and wait 2 minutes

4) Notify the mass storage subsystem that the disk has been replaced.

If the system has not been rebooted, run scsimgr before using the disk as a replacement for the old disk. For example:

# scsimgr replace_wwid -D /dev/rdisk/disk13

5) Determine the new LUN instance number for the replacement disk. For example:

# ioscan -m lun 
Class     I  Lun H/W Path        Driver  S/W State   H/W Type     Health    Description
========================================================================
disk     13  64000/0xfa00/0x5    esdisk  NO_HW       DEVICE       offline   HP MSA Vol
                      /dev/disk/disk13         /dev/rdisk/disk13
                      /dev/disk/disk13_p1      /dev/rdisk/disk13_p1
                      /dev/disk/disk13_p2      /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p3      /dev/rdisk/disk13_p3

disk     28  64000/0xfa00/0x1c   esdisk  CLAIMED     DEVICE       online    HP MSA Vol
             0/1/1/1.0x0.0x0
                      /dev/disk/disk28         /dev/rdisk/disk28

6) (HP Integrity servers only) Partition the replacement disk.

a. Partition the disk by using the idisk command and a partition description file.

First, clear the previous partition configuration on the disk:

idisk -Rw /dev/rdsk/c3t0d0

Create a partition description file. For example:

# vi /tmp/pdf

In this example, the partition description file contains:
3
EFI 500MB
HPUX 100%
HPSP 400MB

Partition the disk using idisk and the partition description file created above:

idisk -f /tmp/pdf -w /dev/rdsk/c3t0d0

To verify enter:

# idisk /dev/rdsk/c3t0d0
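Instead of editing /tmp/pdf in vi, the same description file can be written non-interactively with a here-document. This is a sketch using the partition sizes from the example above; write_pdf is a hypothetical helper name, and the sizes should still be checked against the guide for the actual disk.

```shell
# write_pdf (hypothetical helper): write the three-partition description
# (EFI, HPUX, HPSP) from the example above to the file named in $1.
write_pdf() {
    # Quoted 'EOF' keeps the text literal (no shell expansion).
    cat > "$1" <<'EOF'
3
EFI 500MB
HPUX 100%
HPSP 400MB
EOF
}
```

Usage: write_pdf /tmp/pdf && idisk -f /tmp/pdf -w /dev/rdsk/c3t0d0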

b. Enter the insf command with the -e option to create legacy device files for the partitions:

# insf -eC disk

Use efi_fsinit to initialize the FAT file system on the EFI partition:

# efi_fsinit -d /dev/rdsk/c3t0d0s1

7) Assign the old instance number to the replacement disk. For example:

# io_redirect_dsf -d /dev/disk/disk13 -n /dev/disk/disk28

This assigns the old LUN instance number (13) to the replacement disk. In addition, the device
special files for the new disk are renamed to be consistent with the old LUN instance number.

The following ioscan -m lun output shows the result:

# ioscan -m lun /dev/disk/disk13

Class     I  Lun H/W Path        Driver  S/W State   H/W Type     Health    Description
========================================================================
disk     13  64000/0xfa00/0x1c   esdisk  CLAIMED     DEVICE       online    HP MSA Vol
             0/1/1/1.0x0.0x0
                      /dev/disk/disk13         /dev/rdisk/disk13
                      /dev/disk/disk13_p1      /dev/rdisk/disk13_p1
                      /dev/disk/disk13_p2      /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p3      /dev/rdisk/disk13_p3

8) Restore the LVM configuration information to the new disk:

# vgcfgrestore -n /dev/vg00 /dev/rdisk/disk13_p2

9) Restore LVM access to the disk.

# pvchange -a y /dev/disk/disk13_p2

10) Initialize boot information on the disk.

# mkboot -e -l /dev/rdsk/c2t6d0

I'M NOT SURE ABOUT THE BELOW

#/root> mkboot -a "boot vmunix -lq" /dev/rdsk/c3t0d0

#/root> lvlnboot -v -R /dev/vg00

#/root> vgchange -a y vg00

Thanks again for the support,
Gjk

My two cents:
You have an Integrity box; I only know PA-RISC... (worked with it since 1993...), and I have only had to change bad disks a couple of times. Reading the updated "When Good Disks Go Bad: Dealing with Disk Failures Under LVM", from your point 10 onward I would stick to what is described on page 51, steps 5 and 6, and do only that!
I will be away for 5 days, but others here will surely take over (methyl?)

Point 4:
You must run

 scsimgr replace_wwid -D /dev/rdisk/disk13

because you have not rebooted, OK?
So for point 5 you run ioscan -m lun.

Point 6:

insf -eC disk

I am not sure you need to initialize it afterwards (using the options above)...
So check first with

efi_ls -d /dev/rdsk/c3t0d0s1  

@vbe
methyl is listening, but busy with a work job and on UK time.
Would like to know the exact hardware specification for this system (HP don't sell "300GB hot-swappable disks") and the expected mirror configuration. Not prepared to guess.
Let's eliminate the obvious. Is /var/adm/syslog/syslog.log full of SCSI LBOLT errors and other disc/controller disaster signs? Has someone visited the server to make sure that the power supply is intact and that the SCSI cable has not become displaced? What lights are lit on the disc drives (both of them)? Red, green, flashing green, none?

At the end of this, please consider fitting a third disc drive with view to triple mirroring.

The disks are:
HP 146 GB 2.5" Hot Swap Hard Drive 10000RPM
Part #: 431958-B21

The second disk in the mirror is gone...
I'm also very busy with 2 audits...

Assuming that the disc cannot be recovered by restoring power or plugging a cable in.

First impression is that all you need is to quiesce any applications (i.e. stop the lot) then:

1) Swap the dead disc for a new blank HP-supplied disc which will not have any previous LVM information on it whatsoever.

2) Run ioscan -fn and check that the disc has changed from NO_HW to CLAIMED.
If it doesn't recover then you need a hardware engineer to look at the computer in depth.

3) Check the size of the disc with diskinfo and compare it with the output of the same command for the good disc (very important: it must be equal to or fractionally greater than the size of the good disc). If the disc is smaller, get another one and do not proceed.
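The size check in step 3 can be scripted so the two numbers are compared rather than eyeballed. This is a sketch assuming that diskinfo reports a line of the form "size: <N> Kbytes"; disk_kbytes and size_ok are hypothetical helper names, and the size used in the test is a made-up value.

```shell
# disk_kbytes (hypothetical helper): read "diskinfo <raw device>" output on
# stdin and print the size in Kbytes. Assumes a "size: <N> Kbytes" line.
disk_kbytes() {
    awk '$1 == "size:" { print $2; exit }'
}

# size_ok (hypothetical helper): succeed only if the new disc ($2, Kbytes)
# is at least as large as the good disc ($1, Kbytes).
size_ok() {
    [ "$2" -ge "$1" ]
}
```

Usage: good=$(diskinfo /dev/rdsk/c2t1d0 | disk_kbytes); new=$(diskinfo /dev/rdsk/c3t0d0 | disk_kbytes); size_ok "$good" "$new" && echo "size OK". If either value comes back empty, the test command reports an error and size_ok fails, so do not proceed in that case either.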

4) Run vgcfgrestore against the new disc.
Then, make volume group active again:
vgchange -a y vg00
Wait for 5 mins.

5) If you don't find that vgsync is already running (check ps -ef), then issue vgsync vg00.

6) Keep an eye on /var/adm/syslog/syslog.log for progress (it only makes an entry when a sync has completed). This could take several hours.

7) Check periodically on the sync status of all partitions. (Others have posted the commands).

8) When all the partition syncs are complete, relax and start the applications.


VBE and METHYL thank you both for your support so far.

@methyl
After what you posted, I'm not sure: do I need to partition the new disk, or run

vgcfgrestore 

right after the new disk is recognized?

Thanks again

According to "When Good Disks Go Bad", for an Itanium boot disk the partitioning comes first.

Sorry to mislead you. Neither vbe nor I have 11.23 on Itanium, and the new procedure is different from every earlier version.

Hello,

We tried to do the disk replacement today, but with no success. We got stuck on the server not recognizing the new disk as CLAIMED. Below is the procedure we followed:

Step 2)

# pvchange -a N /dev/disk/disk13_p2

Warning: Detaching a physical volume reduces the availability of data within the logical volumes residing on that disk.
Prior to detaching a physical volume or the last available path to it, verify that there are alternate copies of the data available on other disks in the volume group.
If necessary, use pvchange(1M) to reverse this operation.
Physical volume "/dev/disk/disk13_p2" has been successfully changed.

Step 3)
Replace the hot-swappable disk and wait 2 minutes

Step 4)

# scsimgr replace_wwid -D /dev/rdisk/disk13

scsimgr:WARNING: Performing replace_wwid on the resource may have some impact on system operation.
Do you really want to replace? (y/[n])? y
scsimgr: Successfully validated binding of LUN paths with new LUN.

Step 5)

# ioscan -m lun

Class     I  Lun H/W Path       Driver  S/W State   H/W Type     Health    Description

disk      9  64000/0xfa00/0x0   esdisk  CLAIMED     DEVICE       online    HP 146 GST3146855LC
             0/1/1/0.0x1.0x0
                      /dev/disk/disk9      /dev/disk/disk9_p2   /dev/rdisk/disk9     /dev/rdisk/disk9_p2
                      /dev/disk/disk9_p1   /dev/disk/disk9_p3   /dev/rdisk/disk9_p1  /dev/rdisk/disk9_p3

disk     13  64000/0xfa00/0x5   esdisk  NO_HW       DEVICE       disabled  HP 146 GST3146855LC
             0/1/1/1.0x0.0x0
                      /dev/disk/disk13      /dev/disk/disk13_p2   /dev/rdisk/disk13     /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p1   /dev/disk/disk13_p3   /dev/rdisk/disk13_p1  /dev/rdisk/disk13_p3

We tried two new disks, but the result was always NO_HW. A few things we noticed: the STATUS LED on the new disks was very weak compared to the healthy in-service one, and the ACTIVITY LED came on for only a couple of seconds and then stayed off.
In the end we tried putting back the original faulty disk. In this case both LEDs were very bright and on all the time, but it was still NO_HW.

Any suggestions will be very helpful.

Thanks again for your support,
gjk

Your new disks are not in great shape...
Can you insert them elsewhere and see if you can initialize them (e.g. by creating a new volume group and LVM on them...), just to check whether they can be used or are really almost dead...
Or try them on another server and see what happens...
Do you have separate controllers in your box? (I usually ordered boxes with 2 independent SCSI controllers for the internal mirroring when possible; on some boxes, like the RP 54000 series and above, it was standard...)