Sun SPARC T5220 cannot boot up

Dear all,

My Sun SPARC T5220 server has 4 disks: volume 0 is disk 0 and disk 1 as RAID 1, and volume 1 is disk 2 and disk 3. Solaris 10 should be installed on volume 0.
Now when the Sun SPARC boots, it drops to the ok prompt, and when I run show-volumes there, it shows volume 0 as failed and volume 1 as optimal. Volume 0's primary disk is reported as missing and its secondary as out of sync. When I try to run boot, it does not boot; it hangs, and I need to send a break to get out of it.
In that situation, is it possible to fix volume 0? Is it trying to run from the out-of-sync copy, and is that copy locked or something like that?

We booted from the Solaris 10 CD and it reached the # prompt. There we ran the format command, but it reported that no disks were found. No luck.

Is there any other thing we can try?
Thank you.

Others can help much better than me, as I have not used any Sparc products in a long time.

What comes to mind is fsck.

Cheers

Please clarify whether you're using hardware-based RAID of some sort or software-based RAID: SVM (a.k.a. meta-disk) or ZFS in Solaris 10.

I also wonder which disks the OpenBoot firmware sees, but I don't remember the command to have it enumerate them. A quick web search (untested) indicates that show-volumes will likely provide some information.

Link - Display Status (show-volumes Command, OBP)

  • https://docs.oracle.com/cd/E19332-01/E24985/z40000081534936.html

It should be hardware-based RAID.
This is what the OpenBoot firmware reports with show-volumes:
[screenshot: show-volumes output]

See also:

Determining If a Drive Has Failed

Thanks.
But is there a way to fix it?

Did you wait long enough? Normally it should time out, then continue with the healthy disk only, unless the "healthy" disk is broken too.

Perhaps it helps to physically remove the failed disk?

This is your problem! If 'format', run from a CD/DVD single-user boot, sees no disks, you have no chance.

You post that you have 4 disks in the system configured as 2 x RAID1 pairs??

If the RAID is implemented in software, 'format' run in single-user mode (CD/DVD boot) should see all 4 disks. If the RAID is implemented by a hardware controller, each RAID1 pair should show up as a single drive (the mirror in each pair is hidden by the RAID controller, and the system is presented with what looks like a single disk per pair).

Now, it seems that you know the RAID1 is implemented by a hardware RAID controller, so although one RAID pair has problems, you should still see the healthy pair (volume 1) as a disk, but you don't. So the RAID controller is not passing any LUNs to the system board, healthy or not. That's the problem. Reseat all disk and power cables.
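The visibility argument above can be made concrete with a toy Python model. It is a simplification under stated assumptions, not how Solaris actually enumerates devices: volume 1's members are assumed to be disks 2 and 3, and a failed hardware volume is assumed not to be presented to the OS at all. Even under that pessimistic assumption, the optimal volume 1 should still appear as one disk, so an empty 'format' listing points at the controller or its cabling.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    members: list[str]  # physical disks in the RAID1 pair
    status: str         # as reported by show-volumes, e.g. "optimal" or "failed"


def expected_format_disks(volumes, hardware_raid):
    """Toy model of the devices 'format' should list after a CD/DVD boot."""
    if not hardware_raid:
        # Software RAID (SVM or ZFS): the OS sees every physical disk.
        return [disk for vol in volumes for disk in vol.members]
    # Hardware RAID: each pair is presented as one logical disk.
    # Assumption: a failed volume may not be presented at all.
    return [vol.name for vol in volumes if vol.status != "failed"]


def controller_suspect(expected, observed):
    """True if at least one disk should be visible but 'format' found none."""
    return bool(expected) and not observed


volumes = [
    Volume("volume0", ["disk0", "disk1"], "failed"),
    Volume("volume1", ["disk2", "disk3"], "optimal"),  # members assumed
]
expected = expected_format_disks(volumes, hardware_raid=True)
print(expected)
print(controller_suspect(expected, []))  # format reported no disks
```

With hardware RAID the model still expects one logical disk, so an empty listing flags the controller rather than the failed mirror.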

I would reseat the RAID controller (if it's not on the motherboard) and run the RAID controller management software to examine it closely. Does the RAID controller have a bootable management software suite?

If the disks are connected to the on-board hardware RAID controller of the T5220, boot from CD/DVD into the single-user (#) prompt. What does the following say?

# raidctl -l

+1 to physically removing the failed disk.

I've seen a lot of soft failures handled far worse than hard failures. That is, a drive that is failing but just alive enough to constantly report "I'm here, give me a moment", causing things to take a LOT longer than they should.
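To see why a half-alive drive can look like a hang, here is a toy Python model. The timeout, retry count, and I/O count are hypothetical values chosen only to show the scaling, not real T5220 controller settings: a dead disk costs one timed-out probe before it is marked missing, while a soft-failing disk can cost a full timeout on every I/O.

```python
def wait_seconds(io_count, timeout_s, retries, state):
    """Toy model of time spent waiting on one disk while booting.

    state: "ok"   -> every I/O completes promptly (no waiting)
           "dead" -> the first probe times out, then the disk is marked missing
           "soft" -> every I/O times out on every attempt before finally failing
    """
    attempts = retries + 1
    if state == "ok":
        return 0
    if state == "dead":
        return timeout_s * attempts
    if state == "soft":
        return io_count * timeout_s * attempts
    raise ValueError(f"unknown state: {state}")


# Hypothetical parameters, chosen only to show how the cost grows.
for state in ("ok", "dead", "soft"):
    print(state, wait_seconds(io_count=1000, timeout_s=30, retries=2, state=state))
```

The dead disk's cost is fixed, while the soft-failing disk's cost grows with every I/O, which is why pulling a flaky disk can unblock a boot.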

Normally, it's good to know the root problem (root cause) before proposing a fix.

Right now, it seems folks concur that it is likely a hardware failure.

Do you agree @da01661 ?

It states "No RAID volumes found".

Since in volume 0 one disk is reported missing and the other out of sync, I tried removing each disk in turn and booting. When I removed the out-of-sync disk, it reported something like no bootable file. When I removed the missing disk, it hung, the same as with both disks present. So I guess it is already booting from the out-of-sync disk, but either that disk is locked by some out-of-sync RAID policy (only a guess), or the out-of-sync state left the file system incomplete.

I also wonder why one disk is reported missing although it is physically present, so, like you, I suspect the controller may have a problem. There seems to be no separate RAID controller, but there is a disk controller card that also handles the RAID processing. One thing I cannot understand: if the disk controller has a problem, why does the other volume seem fine?

I have tried removing the failed disk, with the same result. It seems it is already booting from the other disk in the RAID.

Yes, I also think it is most likely a hardware problem that makes one disk look missing. But the secondary disk is out of sync; if the secondary disk can be made to work, then maybe Solaris can boot. I'm not sure why Solaris cannot boot from the secondary disk: either because of its out-of-sync status, or because some system files are actually out of sync or missing. That's my guess.

Well, in my view, if you have a RAID setup and the root cause of the problem is a bad disk which is causing RAID to fail, then you have two choices:

  • Replace the bad disk (and resync the mirror) in your RAID configuration, or
  • Disable RAID and use only the "good" disk, which is working but out of sync.

Then you can take the "bad disk", attach it as a regular (non-RAID) disk, and test it further.

That is how I would proceed.

It's useless to have a RAID if it has failed; so disable RAID until you get to the root cause. One working disk (no RAID) is infinitely better than a failed RAID configuration.

Also, in the future, be wary of any RAID configuration where access to the data is blocked, or booting fails, if one of the disks in the RAID configuration fails. That is not a reliable RAID setup.

Thanks for your suggestion.
The machine is very old and was set up by staff who have since retired, and I don't know much about how Sun SPARC hardware RAID works.
I also want to force it to run on a single disk, but one disk is considered missing and the other is out of sync. I'm thinking of either making the missing disk "appear" again or booting from the out-of-sync disk.
I tried booting from the out-of-sync disk but could not, so I'm not sure whether the boot files on it are out of sync and therefore unbootable, or whether some mechanism prevents the out-of-sync disk from being used (that mechanism is only a guess, since I don't know how its RAID works; there may be no such suspended state). If there is no such mechanism, then most probably the boot files are corrupted? Is there a method to repair them?

Out of sync is normal for the working device if the mirror is out of sync.
After removing the failed disk, did you power-cycle the box, or at least run reset-all at the ok prompt?

See (maybe helpful, maybe not):

@da01661

Did you get this problem sorted out?