Urgent: problems with ESS SAN and SDD upgrade on AIX 5.3 server

Hi all,

Sorry if this is in the wrong place, but I needed to make sure lots of people saw this so that hopefully someone will be able to help.

I've upgraded a test server from AIX 4.3 to 5.3 TL04.

The server has hdisk0 and hdisk1 as rootvg locally, and another VG set up on our ESS 2105 SAN.

Once that was done I set about upgrading the SDD software on the AIX box, since it had devices.sdd.43.rte installed and needed devices.sdd.53.rte.

While doing this I also upgraded the ibm2105.rte host attachment script software from 32.6.100.18 to 32.6.100.25
also: devices.fcp.disk.ibm2105.rte from 32.6.100.18 to 32.6.100.29
and devices.scsi.disk.ibm2105.rte from 32.6.100.18 to 32.6.100.29

I followed the PDFs supplied by IBM for the upgrade process: unmounting the filesystems on the SAN, varying off the VG, rmdev on the vpaths and hdisks, running the upgrades, then cfgmgr to bring the devices back.

However, while I can get the VG online, mount the filesystems, view files and so on, the actual setup of the vpaths seems to have changed.

Here is the lsvpcfg output from before the SDD upgrade:

vpath0 (Avail ) 10D23296 = hdisk5 (Avail pv ) hdisk68 (Avail pv )
vpath1 (Avail ) 20D23296 = hdisk6 (Avail pv ) hdisk72 (Avail pv )
vpath2 (Avail ) 30D23296 = hdisk7 (Avail pv ) hdisk76 (Avail pv )
vpath3 (Avail pv crm_train_vg) 00623296 = hdisk63 (Avail ) hdisk88 (Avail )
vpath4 (Avail pv crm_train_vg) 00723296 = hdisk64 (Avail ) hdisk89 (Avail )
vpath5 (Avail pv crm_train_vg) 00823296 = hdisk65 (Avail ) hdisk90 (Avail )
vpath6 (Avail pv crm_train_vg) 10723296 = hdisk66 (Avail ) hdisk91 (Avail )
vpath7 (Avail pv crm_train_vg) 10823296 = hdisk67 (Avail ) hdisk92 (Avail )
vpath8 (Avail pv crm_train_vg) 20723296 = hdisk69 (Avail ) hdisk93 (Avail )
vpath9 (Avail pv crm_train_vg) 20823296 = hdisk70 (Avail ) hdisk94 (Avail )
vpath10 (Avail pv crm_train_vg) 20923296 = hdisk71 (Avail ) hdisk95 (Avail )
vpath11 (Avail pv crm_train_vg) 30723296 = hdisk73 (Avail ) hdisk96 (Avail )
vpath12 (Avail pv crm_train_vg) 30823296 = hdisk74 (Avail ) hdisk97 (Avail )
vpath13 (Avail pv crm_train_vg) 30923296 = hdisk75 (Avail ) hdisk98 (Avail )

and here is the output afterwards

vpath0 (Def ) 00623296 = hdisk2 (Avail pv crm_train_vg) hdisk16 (Avail pv crm_train_vg)
vpath1 (Def ) 00723296 = hdisk3 (Avail pv crm_train_vg) hdisk17 (Avail pv crm_train_vg)
vpath2 (Def ) 00823296 = hdisk4 (Avail pv crm_train_vg) hdisk18 (Avail pv crm_train_vg)
vpath3 (Def ) 10723296 = hdisk5 (Avail pv crm_train_vg) hdisk19 (Avail pv crm_train_vg)
vpath4 (Def ) 10823296 = hdisk6 (Avail pv crm_train_vg) hdisk20 (Avail pv crm_train_vg)
vpath5 (Def ) 10D23296 = hdisk7 (Avail pv ) hdisk21 (Avail pv )
vpath6 (Def ) 20723296 = hdisk8 (Avail pv crm_train_vg) hdisk22 (Avail pv crm_train_vg)
vpath7 (Def ) 20823296 = hdisk9 (Avail pv crm_train_vg) hdisk23 (Avail pv crm_train_vg)
vpath8 (Def ) 20923296 = hdisk10 (Avail pv crm_train_vg) hdisk24 (Avail pv crm_train_vg)
vpath9 (Def ) 20D23296 = hdisk11 (Avail pv ) hdisk25 (Avail pv )
vpath10 (Def ) 30723296 = hdisk12 (Avail pv crm_train_vg) hdisk26 (Avail pv crm_train_vg)
vpath11 (Def ) 30823296 = hdisk13 (Avail pv crm_train_vg) hdisk27 (Avail pv crm_train_vg)
vpath12 (Def ) 30923296 = hdisk14 (Avail pv crm_train_vg) hdisk28 (Avail pv crm_train_vg)
vpath13 (Def ) 30D23296 = hdisk15 (Avail pv ) hdisk29 (Avail pv )

I cannot seem to move the vpaths from Def to Avail. While everything still seems to work, it's worrying me a bit. Also, all the hdisk numbers have changed, although I'm still seeing the same data when I go into the VG.
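For anyone wanting to check this kind of output quickly, here is a rough sketch (not an IBM tool) that scans saved lsvpcfg text and lists the vpaths that are not in the Avail state. It assumes the line layout shown in the listings above; the function names are made up for illustration, and any wrapped continuation lines are simply skipped.

```python
import re

# Start of an lsvpcfg line: vpath name, state (first word in the
# parentheses), then the LUN serial before the "=" sign.
# Assumption: the layout matches the listings pasted in this thread.
VPATH_RE = re.compile(r"^(vpath\d+) \((\w+)[^)]*\) (\w+) =")


def vpath_states(lsvpcfg_text):
    """Return {vpath_name: state} for every vpath line found."""
    states = {}
    for line in lsvpcfg_text.splitlines():
        m = VPATH_RE.match(line.strip())
        if m:  # continuation lines such as "ain_vg)" do not match
            states[m.group(1)] = m.group(2)
    return states


def not_available(lsvpcfg_text):
    """Names of vpaths whose state is anything other than 'Avail'."""
    states = vpath_states(lsvpcfg_text)
    bad = [name for name, state in states.items() if state != "Avail"]
    return sorted(bad, key=lambda name: int(name[len("vpath"):]))


sample = """\
vpath0 (Def ) 00623296 = hdisk2 (Avail pv crm_train_vg) hdisk16 (Avail pv crm_tr
ain_vg)
vpath5 (Def ) 10D23296 = hdisk7 (Avail pv ) hdisk21 (Avail pv )
vpath13 (Avail pv ) 30D23296 = hdisk15 (Avail ) hdisk29 (Avail )
"""

print(not_available(sample))
```

In practice you would feed it the text saved from running lsvpcfg on the box rather than a pasted sample.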

I've tried rebooting, and redoing the SDD config by deleting the vpaths/hdisks and starting again. I've also tried removing and re-adding data paths from the device menu in smit.

Finally, the last weird problem is within smit > devices > data path devices:

if I try to select "display data path device adapter status" it shows "no device file found". That's really worrying me! lol

Anyway, if anyone can help or offer some ideas I'd really welcome them.

It might be a test box, but I need to be careful with the SAN and I want to know how to fix this.

Again, apologies if this is in the wrong area.

Thanks in advance

Hi,

One question: did you upgrade the SDD version?

To check, type "datapath query ver" at the command line.

The current recommended version is 1.6.2.0

You can download it from ftp://ftp.software.ibm.com/storage/subsystem/aix/1.6.2.0/

You need to reboot.

I highly recommend you get a SAN engineer who knows what they are doing. If you have SAN support with IBM, CALL THEM. They are smart enough not to take you beyond what they are capable of.

I believe they give you one support call at purchase that you can keep open for 30 days.

If you don't have support, pay the T&M to IBM. Even if it costs a couple of thousand it may be worth it.

This is not much, but a SAN engineer upgraded our ESS SDD and host attachment scripts over a year ago. The pool of disks was at that time viewable by all 4 of our AIX systems, but the disks weren't carved on all 4 systems. At any rate, the vpath numbers did change after our upgrade. From what I understood, they were expected to change. The LUN IDs are what is important. Be careful.

Thanks for the info.

I am considering calling IBM on this. While it is only a test box and the data isn't that important, it's the other data on the SAN that I'm worried about.

As for SDD, yes, it was upgraded -

it was still on devices.sdd.43.rte so I had to upgrade to devices.sdd.53.rte, which puts it on version 1.6.2.0.

This is the problem I'm getting though - when I run the command you suggested - datapath query ver - it comes up again with device file not found.

As for the LUN IDs, take a look at the data I posted - unless I'm mistaken the same LUN IDs are still present (these are the numbers after the vpathXX (dev) part, aren't they?). Same LUN IDs, just in a slightly different order. So I was happy with that.
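To back up the "same LUN IDs, different order" point, here is a small sketch that compares two saved lsvpcfg listings by LUN serial and reports which serials disappeared, which are new, and which simply moved to a different vpath number. It is only an illustration: the function names are invented, the sample lines are a subset of the listings posted above, and it assumes each serial appears once per listing.

```python
import re

# vpath name, then the parenthesised state block, then the LUN serial.
# Assumption: one line per vpath in the format shown in this thread.
LINE_RE = re.compile(r"^(vpath\d+) \([^)]*\) (\w+) =")


def serial_to_vpath(lsvpcfg_text):
    """Map each LUN serial to the vpath name that currently owns it."""
    mapping = {}
    for line in lsvpcfg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            mapping[m.group(2)] = m.group(1)
    return mapping


def compare(before_text, after_text):
    """Return (missing serials, new serials, {serial: (old, new) vpath})."""
    before = serial_to_vpath(before_text)
    after = serial_to_vpath(after_text)
    missing = sorted(set(before) - set(after))
    new = sorted(set(after) - set(before))
    renamed = {
        serial: (before[serial], after[serial])
        for serial in before
        if serial in after and before[serial] != after[serial]
    }
    return missing, new, renamed


before = """\
vpath0 (Avail ) 10D23296 = hdisk5 (Avail pv ) hdisk68 (Avail pv )
vpath3 (Avail pv crm_train_vg) 00623296 = hdisk63 (Avail ) hdisk88 (Avail )
vpath4 (Avail pv crm_train_vg) 00723296 = hdisk64 (Avail ) hdisk89 (Avail )
"""
after = """\
vpath0 (Def ) 00623296 = hdisk2 (Avail pv crm_train_vg) hdisk16 (Avail pv crm_train_vg)
vpath1 (Def ) 00723296 = hdisk3 (Avail pv crm_train_vg) hdisk17 (Avail pv crm_train_vg)
vpath5 (Def ) 10D23296 = hdisk7 (Avail pv ) hdisk21 (Avail pv )
"""

missing, new, renamed = compare(before, after)
print("missing:", missing)
print("new:", new)
for serial, (old, cur) in sorted(renamed.items()):
    print(serial, old, "->", cur)
```

Empty "missing" and "new" lists mean the host still sees the same set of LUNs, and only the vpath numbering has shifted.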

I'm starting to wonder whether, even though I have followed the IBM instructions, there is a conflict between the newer versions of all the software, i.e.

SDD
Host Attachment script
devices.fcp.disk.ibm2105.rte
devices.scsi.disk.ibm2105.rte

and something else on my server. Also, I cannot seem to find the FC adapters that I have on the IBM website.

lscfg -vl fcs0 produces:

fcs0 P1-I1/Q1 FC Adapter

    Part Number.................00P2995
    EC Level....................A
    Serial Number...............1D2360CD0C
    Manufacturer................001D
    Feature Code/Marketing ID...2765
    FRU Number..................     00P2996
    Network Address.............10000000C92EC946
    ROS Level and ID............02C03891
    Device Specific.(Z0)........2002606D
    Device Specific.(Z1)........00000000
    Device Specific.(Z2)........00000000
    Device Specific.(Z3)........02000909
    Device Specific.(Z4)........FF401050
    Device Specific.(Z5)........02C03891
    Device Specific.(Z6)........06433891
    Device Specific.(Z7)........07433891
    Device Specific.(Z8)........20000000C92EC946
    Device Specific.(Z9)........CS3.82A1
    Device Specific.(ZA)........C1D3.82A1
    Device Specific.(ZB)........C2D3.82A1
    Device Specific.(YL)........P1-I1/Q1

But I can't find that feature code on the IBM website. Weird.

Has anyone had problems like this before? I am seriously considering reverting to the older versions of the software if possible.

I know there are tools in those executables to confirm that the vpaths are OK, but I do not remember how they are used. There are also tools to "clean it up".
Whatever you do, make sure you don't run a command that will damage what is on those vpaths.

Your VGs (the ODM really) also have the PVID info in there. When you do the importvg it should pull all of that in. BUT I do not know if you are ready to do that step. You may still need to confirm that what SDD sees (vpath, LUN, PVID info) is all correct.

Fixed it!!!! Miracle.

Right, after a lot of messing around I also upgraded the microcode for the FC adapters.

That still didn't fix the problem.

I had another look at the IBM website, specifically http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000191&loc=en_US&cs=utf-8&lang=en

This has the devices.sdd.53.rte driver, but version 1.6.0.0, not 1.6.2.0.

So I:
unmounted the filesystems on the SAN
deactivated the VG on the SAN
removed the vpaths and then the hdisks relevant to the SAN
uninstalled ver 1.6.2.0 and installed ver 1.6.0.0
ran cfgmgr -vl fcsX on both my adapters
then ran cfallvpath
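The sequence above can be written down as a dry-run checklist. This sketch only builds and prints the commands in order and never runs anything. Several things in it are assumptions, not from the post: the mount point /crm_train, the second adapter name fcs1, and the use of rmdev -dl per device. The install step is left as a comment because the post does not say how the filesets were removed and installed.

```python
def build_plan(vg, mountpoints, vpaths, hdisks, adapters):
    """Return the ordered command list for the SDD swap (dry run only)."""
    plan = [f"umount {mp}" for mp in mountpoints]      # unmount SAN filesystems
    plan.append(f"varyoffvg {vg}")                     # deactivate the SAN VG
    plan += [f"rmdev -dl {dev}" for dev in vpaths]     # remove the vpaths first
    plan += [f"rmdev -dl {dev}" for dev in hdisks]     # then the SAN hdisks
    # Install method is not given in the post, so it stays a placeholder.
    plan.append("# uninstall SDD 1.6.2.0, then install SDD 1.6.0.0")
    plan += [f"cfgmgr -vl {fcs}" for fcs in adapters]  # rediscover the disks
    plan.append("cfallvpath")                          # rebuild the vpaths
    plan.append("lsvpcfg")                             # check the result
    return plan


plan = build_plan(
    vg="crm_train_vg",
    mountpoints=["/crm_train"],                        # hypothetical mount point
    vpaths=[f"vpath{i}" for i in range(14)],           # vpath0..vpath13
    hdisks=[f"hdisk{i}" for i in range(2, 30)],        # hdisk2..hdisk29
    adapters=["fcs0", "fcs1"],                         # fcs1 is assumed
)

for step in plan:
    print(step)
```

Review the printed list against the IBM documentation for your SDD level before typing any of it on a live system.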

redid the lsvpcfg and got:
vpath0 (Avail pv crm_train_vg) 00623296 = hdisk2 (Avail ) hdisk16 (Avail )
vpath1 (Avail pv crm_train_vg) 00723296 = hdisk3 (Avail ) hdisk17 (Avail )
vpath2 (Avail pv crm_train_vg) 00823296 = hdisk4 (Avail ) hdisk18 (Avail )
vpath3 (Avail pv crm_train_vg) 10723296 = hdisk5 (Avail ) hdisk19 (Avail )
vpath4 (Avail pv crm_train_vg) 10823296 = hdisk6 (Avail ) hdisk20 (Avail )
vpath5 (Avail pv ) 10D23296 = hdisk7 (Avail ) hdisk21 (Avail )
vpath6 (Avail pv crm_train_vg) 20723296 = hdisk8 (Avail ) hdisk22 (Avail )
vpath7 (Avail pv crm_train_vg) 20823296 = hdisk9 (Avail ) hdisk23 (Avail )
vpath8 (Avail pv crm_train_vg) 20923296 = hdisk10 (Avail ) hdisk24 (Avail )
vpath9 (Avail pv ) 20D23296 = hdisk11 (Avail ) hdisk25 (Avail )
vpath10 (Avail pv crm_train_vg) 30723296 = hdisk12 (Avail ) hdisk26 (Avail )
vpath11 (Avail pv crm_train_vg) 30823296 = hdisk13 (Avail ) hdisk27 (Avail )
vpath12 (Avail pv crm_train_vg) 30923296 = hdisk14 (Avail ) hdisk28 (Avail )
vpath13 (Avail pv ) 30D23296 = hdisk15 (Avail ) hdisk29 (Avail )

Now all the vpaths are Avail. The LUN IDs were always the same, so I was always looking at the correct data, which I'd proved earlier by the fact that the VG was varied on and the filesystems were working.

But now it all looks correct.

What I need to figure out is the difference between the two versions.

1.6.0.0 is supposed to be used if using the VIOS (Warning: SDD version 1.6.1.0 or later is not supported by the VIOS at this time.)
So I can only assume that I am indeed using the Virtual I/O Server, although the info for the later drivers, i.e. 1.6.2.0, says that I should be able to use it with ESS.

Either way, it is now working perfectly.

Thanks for the help anyway, people. Hopefully this will be useful to someone in the future.