VIOS - fibre adapters not seeing LUNs

Hi guys,

I've been trying to tackle this issue for days and I'm stumped. Hopefully someone can shed more light on what else I can do.

I have a p7 series box, with dual VIOS and 10 lpars and everything was working fine until I had to move the box to another location in the data centre. Ensured I labelled where each cable went etc.

Upon powering on the p7 server, it came up fine into standby mode, and I then powered on the system. After that, I powered on the VIOS (Primary) server, which came up fine and booted into the OS. (Each VIOS server has 2 x dual-port dedicated fibre adapters.) I then booted each LPAR successfully. However, when I ran lspath on VIOS (Primary), it showed a failed path on the second adapter. I tried to re-enable it, without success. I thought I'd carry on with VIOS (Secondary), but to my surprise no LUNs were accessible to that server. I even went into the SMS menu to use the SAN Zone Support option, and to the Open Firmware prompt to run ioinfo, and no LUNs were displayed. (I am using SCSI adapters for the LUNs.)

Below is what I have tried so far, all without success:
1 - Changed over the fibre cables to the adapter: no success
2 - Bypassed the fibre patch panel and connected directly to the fibre switch: no success
3 - Confirmed we can see light coming through the fibre cables
4 - Switched ports on the fibre adapters
5 - Wrap plug test: no success
6 - Took the fibre cable that is plugged into the working fibre adapter and used it in the fibre adapter that has issues: still no success
7 - I have confirmed that I can see the fibre adapters as available on the VIOS server
8 - I have removed the fibre adapters from the system and re configured them
9 - I have shut down the system completely, removed power, reattached power and started up the system: no success
10 - I have physically removed the fibre adapters and reseated them
11 - I have provided IBM with a 'snap', which captures the system configuration of the VIOS server. So far hardware support can't find any fault.
12 - I have sent IBM screenshots of my VIOS profile configuration: no issues there
13 - I have run an executable from IBM to measure transmit, receive and noise on the ports; the ports that are not working don't output any information.
14 - I have moved the fibre adapters from the VIOS server to another LPAR and still have the same issue.
15 - On the back of the server, I can see there is power to the adapters (flashing green lights), but no link activity.
16 - I have checked in the IBM documentation that a slow-flashing green LED with no yellow LED indicates: normal, link inactive or not started.

Any input would be appreciated. Thanks.

Hi,

Here are some hints:

  • Did you use the same switch ports in the data center?
  • (If not) Did you check the speed, fillword and NPIV settings for the switch ports you are using now?
  • Does your storage admin see your WWNs as 'logged in'?
  • Maybe you must re-enable a logged-out server adapter at the storage level (I don't know which storage you use)

PS
Can you please post the output from the following commands:

lspath
fcstat <fcsX>        # for both adapters
lscfg -vl <hdiskX>   # for one of the disks with a failed path
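
If the lspath output is long, a small filter helps to pick out the paths that are not healthy. This is only a sketch: it assumes the default three-column lspath layout (status, device, parent), and the function name failed_paths is made up for illustration.

```shell
#!/bin/sh
# failed_paths: read lspath output on stdin and print every path whose
# status is not "Enabled", as "device parent status".
# Assumption: default lspath layout with three columns (status, device, parent).
failed_paths() {
    awk 'NF >= 3 && $1 != "Enabled" { print $2, $3, $1 }'
}

# Usage on the VIOS, from the root shell (after oem_setup_env):
#   lspath | failed_paths
```

An empty result only means that every path lspath knows about is Enabled; paths that were removed with rmdev will not be listed at all.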

Regards

-I don't believe we used the same switch ports.
-We are using Brocade SilkWorm switches... I forgot the model.
-The SAN team advised that their settings are the same as for the adapter that is currently working.
-They have added the WWNs, but they don't come online.
-What does re-enabling a logged-out server adapter do, beside the obvious? It's as if I've shut the system down and turned it back on, so shouldn't it be back to normal?
-I'm not at my computer at the moment, but lspath displays only enabled paths, for the working adapter. The failed one disappeared when I ran rmdev on the devices.
-fcstat only shows info for the working adapter and hangs on the other adapter with an error which basically means the device is not there, not open, or has no activity.
-lsdev: I will show you the output later, as I'm not at my computer; I will edit the post with the information you requested.

Thanks

I wanted to know whether your storage admin can see your WWNs as logged in (online), but the answer depends on the storage you use (EMC, NetApp, ...?). In our old environment with DataCore storage virtualisation, we sometimes needed to (re-)enable a logged-out server adapter on the storage side.

Okay, I know that cfgmgr hangs on an adapter while searching for child devices and throws some errors if no devices are available, but fcstat should come back immediately. Maybe I am wrong.

Regards

Our storage admin can definitely not see the wwns online or logged in.
We are using 3PAR SAN

Fcstat definitely immediately outputs information on a working adapter.
But would it not show if there is no activity? I can't remember, or maybe it hangs.

Maybe I'll have to ask about the server adapter log-out. Do you know whether 3PAR requires this?

As Xray already hinted, I suppose this is a zoning problem. Picture a "zone" in a SAN as being like a VLAN: some switch ports, adapters and LUN initiators (all identified via "WWN", which serves the same purpose as a MAC address does in a network) are allowed to communicate. As you moved your box, it might well be that you now use different FC switch ports and/or other parts of the fabric, and depending on your setup the SAN admins may or may not have to change the zoning accordingly.

Your SAN admin should know this off the top of his head. Generally this is a problem you can only solve working hand in hand with your SAN admins. Apart from some general hints like the ones we already gave you, there is pretty little we can do for you. Even if we had the necessary knowledge, too much depends on your setup, which we do not know in the necessary depth.
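
To make the analogy a bit more concrete, here is a toy model of zoning, not how a switch actually stores its zone configuration. Everything in it is invented for illustration: the plain-text zone file format (one zone per line, zone name first, then the member WWNs), the function name same_zone, and the WWNs in the usage note. Two WWNs may talk only if at least one zone lists both of them.

```shell
#!/bin/sh
# same_zone ZONEFILE WWN_A WWN_B
# Exit status 0 if at least one zone lists both WWNs, 1 otherwise.
# ZONEFILE is a hypothetical plain-text export: one zone per line,
# zone name in the first field, member WWNs in the remaining fields.
same_zone() {
    awk -v a="$2" -v b="$3" '
        {
            has_a = 0; has_b = 0
            # Compare case-insensitively, since WWNs are written in hex.
            for (i = 2; i <= NF; i++) {
                if (tolower($i) == tolower(a)) has_a = 1
                if (tolower($i) == tolower(b)) has_b = 1
            }
            if (has_a && has_b) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example, with a file containing the line "zone_vios1 10:00:00:00:c9:aa:bb:01 20:01:00:02:ac:00:00:01" (made-up values), same_zone succeeds for that adapter/storage pair and fails for any WWN that is not a member of that zone. If the zone exists only in the configuration of one fabric, an adapter plugged into a different fabric fails the same check there.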

I hope this helps.

bakunin

Bakunin, the SAN team advised me they don't do port mapping; they do WWN mapping. So whichever port they plug into shouldn't matter (I'm not 100% sure of this). Think of it like this: fabric switch A has WWN mapping enabled. The SAN team adds my WWN (the "MAC address" in your analogy) to fabric switch A. I run a fibre cable from my FC adapter to fabric switch A, and the WWN doesn't come online.
Does Xray's advice still apply?

I am looking for more information in regards to 3PAR and 'server logged out adapter' as Xray mentioned.

Sorry, I have no experience with this storage system.

You said you have no link, but with WWN zoning the link between adapter and switch should normally be established, independent of your storage system.

Can you please repeat the "SAN Zone Support" step and check the link activity on the failed adapter (whether the yellow light starts flashing or not)? If there is no activity, and assuming you have already swapped and replaced the cable, I guess the adapter is broken.

Yes, no link, connectivity or communication to the fabric switch. The SAN guys cannot establish a "link" or "connectivity" from the adapter to the switch.
I will go out tomorrow, do the "SAN Zone Support" step and re-plug the fibre cable.
I just don't know how the adapter can be broken... it's possible, but improbable, that 3 x dual-port fibre adapters have 'failed'.
As I mentioned, after I ran rmdev on the devices and then cfgmgr, the adapters came back fine, which gives me no indication that they could be faulty. I even ran a diag test, with no issues. I have no evidence to say it's a hardware fault. I just don't know. Worst case, I try another PCI-x slot and get an IBM engineer to come out and replace the adapter. If the new adapter installs fine and the issue remains, it's definitely a SAN team issue. :mad::mad::mad:

Oh, on another thought, Xray: what happens if the port comes up on the switch as FE (Fabric Extender)? Sometimes the yellow light came on on the adapter, but the LUNs were still not presented to the VIOS server (secondary server).

Adapters can be broken because of the power-cycling of the host system. In most cases if an adapter is defective it shows once you power-cycle a system. This is a common effect.

Only a few weeks ago we had a power failure in one of our datacenters (lightning struck and the UPS failed to come up). When we restarted the complete environment an hour later we had several adapters (and in one case even a sysplanar) failing completely or in parts.

In the end we replaced the sysplanar, two power supplies, 2 FC cards and 1 10G-adapter, distributed over ~20 managed systems (mostly 780s and 740s)

Even if they do soft zoning (based on WWNs) there is a chance that you changed to another part of the fabric when you moved the system. In many cases there are several switches ("clusters") and a zone defined on one doesn't necessarily mean it is defined on any other.

In any case I suggest you also check all the connecting cables. There is always the chance of a broken FC cable.

I hope this helps.

bakunin

Sorry no idea :confused:

It pretty much comes down to what Xray and bakunin have said.
To add more:

Are you not seeing the SAN disks on VIO2 itself, or on the clients served by VIO2?
If it is the latter: I had a personal experience with VIO2 where another sysadmin had added the virtual fibre adapter dynamically and forgot to add it to the profile. You may want to have a second pair of eyes look over that.

If all is set, and you can see that those are connected and the vfchosts are defined, then it could be either a faulty FC card or a zoning issue.

As a sysadmin you can create virtual adapters and establish a connection between VIO and client (using NPIV), but how the disk is presented to the system is the storage admin's task.

It's very simple: after you have provided the WWNs of the VIO, there is not much you can do (except maybe pulling the fibre cable from fabric to server).
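
For what it's worth, the WWPNs to hand over to the SAN team can be read from the adapter's VPD. A minimal sketch, assuming the usual "Network Address" line in the lscfg -vl output for an fcs adapter; the helper name wwpn_from_lscfg is made up for illustration.

```shell
#!/bin/sh
# wwpn_from_lscfg: read "lscfg -vl fcsX" output on stdin and print the
# 16 hex digits that follow the "Network Address" label.
# Assumption: the VPD line looks like "Network Address.............<16 hex digits>".
wwpn_from_lscfg() {
    sed -n 's/^.*Network Address\.*\([0-9A-Fa-f]\{16\}\).*$/\1/p'
}

# Usage on the VIOS, from the root shell (after oem_setup_env):
#   lscfg -vl fcs0 | wwpn_from_lscfg
```

Comparing that value character by character with what the SAN team entered rules out a simple typo on either side.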

Is the error log showing any errors for the FC card?

Damage from power-cycling the system is a possibility, even though this system was bought brand new less than a year ago.

We did change fibre cables as well.

---------- Post updated at 10:01 AM ---------- Previous update was at 09:51 AM ----------

I have VIO2 booting from SAN, so unfortunately I cannot even start VIO2 unless it sees the boot disks, which is what has brought me to posting on this forum for help.
As mentioned previously, I'm using vscsi, not vfchosts.

You are definitely right that once the WWNs are provided there is not much I can do, unless the FC card is faulty. But without evidence of faulty adapters I can't move forward, and neither can the SAN team.

No errors in the error log regarding the FC cards.

---------- Post updated at 11:02 AM ---------- Previous update was at 10:01 AM ----------

UPDATE: The SAN team have updated me with the following, completed on their end:

Reseat cable and SFP on switch and HBA side
Disable and enable ports
Change port speed and disable/enable ports

Xray - you are right, fcstat comes back immediately (I tested on another server). But that's probably because that port's WWN is online on the switch, whereas in this issue my WWN is not logged in.

...another idea:

While moving the server, did you use protection caps on the fibre cards?
Maybe there is dust on the optical connectors and you need to clean the faulty adapters (using a cleaning pen).

Regards

No, didn't use any protection caps.
I'm going out tomorrow, so, will definitely try different PCI slots, and clean out any dust in the optical connectors.
Will let you guys know how it goes.

Thanks

---------- Post updated at 10:05 PM ---------- Previous update was at 12:51 AM ----------

I have run out of ideas. I went out today, did the SAN Zone Support option, re-inserted the fibre cables on the switch using different ports, and the WWNs didn't come online. I did another wrap plug test on the FC adapters that have issues, and the report came back with "No trouble found".
I'm just going to replace the fibre cards and see what happens.

Update: The fibre adapters were perfectly fine (as suspected). The SAN team moved the fibre cables from the original SAN switch into another SAN switch, and the LUNs appeared.
Funnily enough, when we moved one fibre path onto the other SAN switch, the fibre cables still plugged into the original SAN switch brought their LUNs online too. Weird.

Thanks for all the input and help.

First off, many thanks for this follow-up. Getting to know how a problem finally was solved enlarges the knowledge of us all.

I hate to say it, but this sounds suspiciously like a zoning problem. Anyway, i am glad you could resolve it.

bakunin
