The musings and important information storage shed of Matt Kulka. I'll write about the quirks of Gentoo, Solaris and probably even Mac OS X, or about systems administration in general, as I encounter them at my daily job or in my limited free time. Yes, even some Apple fanboyism too!

Lengthy boot-up after crash, thanks ZFS?

One of my X4500s recently had a little lockup. It acts as an NFS file server and I'm sure it was pretty active at the time. Unfortunately, being unresponsive isn't an option, so I gave it a power cycle. Losing a major file server is bad enough, but Solaris 10 is a bit of a black box on boot-up, and when ZFS is in the mix it just gets bad. Over an hour later, I'm still staring at a 'Mounting' message rather than a login prompt.

SunOS Release 5.10 Version Generic_138889-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: xyz
Reading ZFS config: done.
Mounting ZFS filesystems: (1/6)

Nothing to do but wait. I've done some searching but I can't find much information on what Solaris is actually doing at this point. I'm assuming it's some kind of consistency check, but it would sure be nice if it were a) more verbose and/or b) done after going into multi-user mode. Has anybody else had similar trouble with their X4500s?

UPDATE: I did get into it by rebooting with the "-m milestone=none" boot arguments. After that, remove /etc/zfs/zpool.cache. Then you can run "svcadm milestone all". Now I have a working system, but the pool is a bit sketchy. The problem pool wants to resilver immediately and it won't mount any of its ZFS filesystems. Ugh.
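
For anyone who wants the steps above in one place, here is a rough sketch of the sequence once you have a shell at milestone none. The pool name "tank", the DRY_RUN switch and the run/recover helper names are my own placeholders, not anything from the box in question. Moving the cache file aside instead of deleting it, and the explicit "zpool import" afterwards, are also assumptions on my part: with the cache gone, pools are not imported automatically at boot. Depending on the setup, the root filesystem may need to be remounted read-write before the cache file can be touched. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: recovery steps after booting with "-m milestone=none".
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
POOL="${POOL:-tank}"    # hypothetical pool name, substitute your own

# Helper: either echo the command or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover() {
    # Move the cache aside (rather than deleting it) so ZFS does not
    # try to open every known pool during boot.
    run mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
    # Let SMF bring up the remaining services.
    run svcadm milestone all
    # Assumption: the pool must be imported by hand once the cache is gone.
    run zpool import "$POOL"
    # Show only pools that report problems.
    run zpool status -x
}

recover
```

Run it as-is to review the commands, then with DRY_RUN=0 and POOL set to the real pool name once you are happy with them.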

UPDATE #2: After a couple of days poking Sun support, a software engineer looked at our broken boxen and did indeed find a bug to patch. It looks like a device that went away and came back may have locked something somewhere, which prevented the pool from being mounted. We have been provided with some interim fixes. I'm not sure the numbers will matter to anyone, since I don't think they're publicly available yet, but we were provided with T139580-03 and IDR140222-12. After patching and rebooting, we were able to detach the ghost device and mount the filesystems successfully.

posted by Matt | 03/15/09 | 10:57:28 pm | 1715 views | Hastily filed in Solaris