I made a couple of serious errors dealing with my ZFS pools the other day (while misreading some online advice about fixing an error) and accidentally “created over” an existing two-drive mirrored pool named backup. (Yes, I used the -f option after it complained. And now I know never to do that again.)
In any case, I happened to have removed a third mirrored drive from the same pool a few months back, since it was getting old and I didn’t want to wait for it to start failing. So, I thought I could swap this drive in and use it to restore the pool. (I’d just be missing the past few months of backups, which is mostly what this pool is used for.)
However, I don’t seem to be able to import the pool with this single old drive. At first, I thought it might be due to the name conflict with the new backup pool I accidentally created (and then destroyed). But even when trying to import by GUID, I get nothing.
Here’s the output from zdb -l /dev/sdb1 (which is the third drive):

    labels = 0 1 2 3
Thus, the drive and the pool data on it seem to be intact, according to zdb. However, importing the pool (even with -F) just gives a “cannot import… no such pool available” error. I tried using the various GUIDs in the above info too (since I wasn’t sure which GUID was the relevant one), but none of those commands (e.g., zpool import 3936176493905234028) gives anything other than the “no such pool available” message.
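For completeness, the variants I’ve been trying look roughly like this (the GUID is from the zdb output above; the -d flag, which I’ve seen suggested, limits the device scan to a given directory rather than relying on the cache file):

```shell
# List whatever pools ZFS can discover on devices under /dev:
zpool import -d /dev

# Try the import by pool name and by pool GUID; -f forces past the
# "pool was in use" complaint, -F asks for recovery-mode rollback:
zpool import -d /dev -f backup
zpool import -d /dev -f -F 3936176493905234028
```

All of these end in the same “no such pool available” message.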
I have installed a new version of my Linux OS since I removed that drive, so I thought the old zpool.cache file I managed to recover from the old OS might help. But the command zpool import -c zpool.cache just gives:
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.

        backup      UNAVAIL  insufficient replicas
          mirror-0  UNAVAIL  insufficient replicas
            sdd1    FAULTED  corrupted data
            sdc1    FAULTED  corrupted data
That is somewhat to be expected: those are the two disks where the pool was overwritten by my create command. However, sdb1 isn’t listed as a potential drive there, probably because I removed it from the pool after I took the disk out. Nevertheless, I think I have an intact copy of the old mirrored data on sdb1, and zdb agrees. Why won’t it import?
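One thing I’m considering (an assumption on my part, not something I’ve confirmed helps here): giving zpool import a scratch directory that contains only the surviving drive, so the two overwritten disks can’t shadow it during the scan:

```shell
# Scratch directory holding only the intact mirror member (sdb1 here):
mkdir -p /tmp/zfs-dev
ln -s /dev/sdb1 /tmp/zfs-dev/sdb1

# Scan just that directory, then attempt the import from it:
zpool import -d /tmp/zfs-dev
zpool import -d /tmp/zfs-dev -f backup
```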
Any suggestions on what else to try? Other diagnostic commands to run?
Note: I tried asking about this over at Server Fault (see link for more details about my situation), but I didn’t get any feedback and realized the specific Linux implementation may be important in figuring out how to resolve this. I would sincerely appreciate any advice or suggestions.
UPDATE: I think I may have found the problem. I thought that I had removed the spare drive before issuing a detach command, and the fact that I was still seeing label information (when other online sources seem to indicate detach destroys the pool metadata) seemed to confirm that. I note that I’m able to simply type zdb -l backup and get label info (and uberblock info with -u), so ZFS seems to see the pool even without my explicitly pointing to the device. It just doesn’t want to import it for some reason.
However, I’m no longer certain about the detach status. I came upon this old thread about recovering a ZFS pool from a detached mirror, and it makes a cryptic reference to txg having a value of zero. There are also references elsewhere to uberblocks being zeroed out upon a detach.
Well, the uberblock for my backup pool does list txg = 0 (while an active zpool I have elsewhere has large numbers in this field, not zero). And while there is an existing uberblock, there’s only one, with the others on backup listed as “invalid.” Unfortunately, I can’t seem to find much documentation of zdb’s output easily available online.
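To make the txg observation concrete, this is the sort of one-liner I’ve been using to pull the txg values out of the label/uberblock dump (the awk part just prints the third field of lines like “txg = 0”):

```shell
# Dump all four labels plus their uberblocks, then extract every txg value;
# a lone uberblock with txg = 0 is what the detach theory predicts:
zdb -lu /dev/sdb1 | awk '$1 == "txg" { print $3 }'
```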
I assume that means the spare third drive was detached? Can anyone confirm my interpretation? And if the drive data is otherwise intact, is there any way to recover from it? While some advice online suggests a detached mirror is unrecoverable without resilvering, the thread I linked above has Solaris code that performs a rather simple trick to make the label think the uberblock is fine. Further poking around turned up an updated Solaris version of this utility from only three years ago.
Assuming my understanding is correct and my third mirror really was detached, can I attempt a similar uberblock label fix on Linux? Is my only option to try to port the Solaris code to Linux myself? (I’m not sure I’m up to that.)
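Whatever I end up attempting, I plan to work on a copy first, so a failed label experiment can’t make things worse. A minimal sketch (the image path is hypothetical, and the target needs as much free space as the partition):

```shell
# Image the surviving partition so any label surgery happens on a copy;
# conv=noerror,sync keeps going past read errors, padding bad blocks:
dd if=/dev/sdb1 of=/mnt/scratch/sdb1.img bs=1M conv=noerror,sync status=progress

# Attach the image to a free loop device (prints e.g. /dev/loop0),
# which can then be handed to zdb or to zpool import -d:
losetup -f --show /mnt/scratch/sdb1.img
```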
Honestly, given the multiple references to scenarios like this online, I’m surprised at the lack of reasonable data-recovery tools for ZFS. There are finally some options for basic recovery from common problems (including a possible way to recover a pool that was overwritten by a create command, though that doesn’t appear likely to work for me), but other than this one-off Solaris script, I don’t see anything for dealing with detached devices. It’s very frustrating to realize that there are at least a dozen reasons why a ZFS pool may fail to import (sometimes for trivial things that should be easily recoverable), with little in the way of troubleshooting guidance, proper error codes, or documentation.
Again, any help, thoughts, or suggestions would be appreciated. Even if someone could recommend a better place to ask about this, I’d really appreciate it.