I thought I was OK, but zpool scrub hangs forever at 20% across multiple cold boots, and importing from Oracle Solaris 11 live hangs forever... What do I do now? Older versions of OpenSolaris will not import the pool anymore...
---------- Post updated at 07:36 AM ---------- Previous update was at 12:48 AM ----------
Well, it's 4AM and now I am getting mad. I think this whole bloody mess is caused by Oracle's 'great new Solaris 11' package. My advice: don't touch this piece.
To recap, I wanted to try the best, newest Solaris release for my super critical file server, so I downloaded from Oracle what I thought was the right ISO. I booted, and just as the boot menu came up I got a phone call. When I came back, the thing had automatically installed a new version of Solaris over one of the drives in the zpool. ( BAD #1 )
Since I have a raidz2 I wasn't too worried ( at first ) and I booted into my original 2009 OpenSolaris. However, I got the errors shown in the above posts. I exported and could not re-import my pool. Oracle had done something to that drive that broke my entire pool, even though it is a raidz2 and should tolerate 2 drive failures at the same time. ( BAD #2 )
Since I could do nothing with my pool in my old OS, I tried the new Oracle Solaris, and could indeed import my pool in a degraded state because of the one overwritten drive. Fine. I wanted to scrub everything first ( I don't know if this was wise or not ), so I did zpool scrub, which eventually hung forever at 11%. All access to the pool was similarly hung. Rebooting the machine did not change this situation, which seemed increasingly dire. ( BAD #3 )
I finally got out of this problem by unplugging drives to fault the pool and rebooting in single user mode. Eventually I was able to stop the scrub with the "zpool scrub -s" command, in single user mode. When I rebooted, I could access my pool again. My first priority at this point was to back up all data immediately. I began to copy off my most important stuff, but unfortunately, before I could copy even a fraction of it, the file system hung again. ( BAD #4 )
Googling around, I found that most causes of hanging zpool commands are related to hardware failure, so going on a hunch, I figured Oracle had phased out or screwed up the drivers for my disks. I still could not import my pool in the old OpenSolaris OS, because of whatever the Oracle install wrote on that drive. So I booted into Oracle Solaris, in single user mode, and did "zpool offline <pool> <drive>" and it worked! Then I rebooted into good old OpenSolaris and imported my pool. It worked!!!!
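For anyone stuck in the same spot, here is roughly the sequence that got me out, written as a small script. Treat it as a sketch, not a recipe: the pool name "tank" and the device name "c7t1d0" are made-up placeholders (get the real ones from "zpool status"), and the run() wrapper and DRY_RUN switch are my own additions so that by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the recovery steps described above.
# ASSUMPTIONS: "tank" and "c7t1d0" are hypothetical placeholders,
# not my real pool or device names. Take yours from `zpool status`.
POOL="${POOL:-tank}"
DISK="${DISK:-c7t1d0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1 (newer OS, single user mode): stop the hung scrub.
stop_scrub() {
    run zpool scrub -s "$POOL"
}

# Step 2 (newer OS, single user mode): take the overwritten drive offline.
offline_disk() {
    run zpool offline "$POOL" "$DISK"
}

# Step 3 (after rebooting into the older OS): import the pool and check it.
import_pool() {
    run zpool import "$POOL"
    run zpool status "$POOL"
}
```

Source the file and call the functions one at a time, rebooting between step 2 and step 3. Only set DRY_RUN=0 once the printed commands name the right pool and drive, and back up whatever you can reach before anything else.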
So at this point it is more like 5AM and I have backed up most of my critical data, way more than I could before anyway. It appears that I was correct, and the drivers for either my motherboard or my hard drive controller card were broken by the Oracle release in a way that let it silently trash my zpool. I have a SIIG 2 drive SATA card and an MSI n1996 motherboard. I'm not sure which one the problem is with, but whichever it is, it works fine in OpenSolaris 2009 and previous versions.
I just want to warn people who are not real Solaris experts against even trying this Oracle package. Personally, I am migrating to fbsd as soon as I can...