One of my X4500s recently locked up. It acts as an NFS file server and was, I'm sure, pretty active at the time. Leaving it unresponsive wasn't an option, so I power cycled it. Losing a major file server is bad enough, but Solaris 10 is a bit of a black box on boot-up, and when ZFS is in the mix it gets worse. Over an hour later, I'm still staring at a 'Mounting' message rather than a login prompt.
SunOS Release 5.10 Version Generic_138889-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: xyz
Reading ZFS config: done.
Mounting ZFS filesystems: (1/6)
Nothing to do but wait. I've done some searching, but I can't find much information on what Solaris is actually doing at this point. I'm assuming it's some kind of consistency check, but it would sure be nice if it were a) more verbose and/or b) done after going into multi-user mode. Has anybody else had similar trouble with their X4500s?
UPDATE: I did get in by rebooting with the "-m milestone=none" boot arguments. After that, remove /etc/zfs/zpool.cache, then run "svcadm milestone all". Now I have a working system, but the pool is a bit sketchy: the problem pool wants to resilver immediately and won't mount any of its ZFS filesystems. Ugh.
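For anyone retracing this, the recovery steps above might look roughly like the sketch below. It is not a tested procedure. Assumptions that are not in the post: it moves the cache file aside instead of deleting it (so it can be restored), it ends with a plain "zpool import" to list the pools that are no longer imported automatically once the cache is gone, and it only prints the commands unless DRY_RUN=0 is set. The run and recover_steps helpers are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch of the recovery described above. Run from the console after booting
# with "-m milestone=none" added to the boot arguments.
# Dry-run by default: commands are printed, not executed, unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
CACHE=${CACHE:-/etc/zfs/zpool.cache}

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_steps() {
    # Assumption: move the cache aside rather than removing it outright,
    # so that ZFS does not try to open the cached pools on the next boot.
    run mv "$CACHE" "$CACHE.bad"

    # Bring the rest of the SMF services up (continue to multi-user).
    run svcadm milestone all

    # Assumption, not in the post: with no cache file the pools are not
    # imported automatically; "zpool import" with no arguments only lists
    # the pools that are available for import.
    run zpool import
}

recover_steps
```

Running it as-is only echoes the three commands, which makes it safe to read through before deciding which of them to type by hand.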
UPDATE #2: After a couple of days of poking Sun support, a software engineer looked at our broken boxen and did indeed find a bug to patch. It looks like a device that went away and came back may have held a lock somewhere that prevented the pool's filesystems from being mounted. We have been provided with some interim fixes. I'm not sure the numbers will matter to anyone, since I don't think they're publicly available yet, but we were given T139580-03 and IDR140222-12. After patching and rebooting, we were able to detach the ghost device and mount the filesystems successfully.
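The last step (detach the ghost device, then mount) could be sketched as follows. The post gives neither the pool name nor the device name, so "tank" and "c5t3d0" are placeholders to be replaced with whatever "zpool status" reports on the affected box. The status and list commands are additions for checking the result, and the script only prints the commands unless DRY_RUN=0 is set. The run and finish_recovery helpers are hypothetical names for this example.

```shell
#!/bin/sh
# Sketch of the final cleanup after the interim patches are installed.
# POOL and GHOST are placeholders; the real names are not given in the post.
POOL=${POOL:-tank}
GHOST=${GHOST:-c5t3d0}
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

finish_recovery() {
    # Addition for verification: show pool health and find the stale device.
    run zpool status -v "$POOL"

    # Detach the ghost device. zpool detach only applies to a device that
    # is part of a mirror or an in-progress replacement.
    run zpool detach "$POOL" "$GHOST"

    # Mount every ZFS filesystem that is not already mounted.
    run zfs mount -a

    # Addition for verification: confirm what actually got mounted.
    run zfs list -r -o name,mounted,mountpoint "$POOL"
}

finish_recovery
```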
