Case study: recovery of a corrupted 12 TB multi-device pool

salt4034

Comments (47)

yjftsjthsd-h
> This is not a bug report. [...] The goal is constructive, not a complaint.
Er, I appreciate trying to be constructive, but in what possible situation is it not a bug that a power cycle can lose the pool? And if it's not technically a "bug" because BTRFS officially specifies that it can fail like that, why is that not in big bold text at the start of any docs on it? 'Cuz that's kind of a big deal for users to know.
EDIT: From the longer write-up:
> Initial damage. A hard power cycle interrupted a commit at generation 18958 to 18959. Both DUP copies of several metadata blocks were written with inconsistent parent and child generations.
Did the author disable safety mechanisms for that to happen? I'm coming from being more familiar with ZFS, but I would have expected BTRFS to also use a CoW model where it wasn't possible to have multiple inconsistent metadata blocks in a way that didn't just revert you to the last fully-good commit. If it does that by default but there's a way to disable that protection in the name of improving performance, that would significantly change my view of this whole thing.
throwaway270925
> A hard power cycle on a 3 device pool (data single, metadata DUP, DM-SMR disks) left the extent tree and free space tree in a state that no native repair path could resolve.
As a ZFS wrangler by day: People in this thread seem to happily shit on btrfs here, but this very much does not look like a sane, resilient configuration no matter the FS. Just something to keep in mind.
harshreality
Using DUP as the metadata profile sounds insane.
Changing the metadata profile to at least raid1 (raid1, raid1c3, raid1c4) is a good idea, especially for anyone, against recommendations, using raid5 or raid6 for a btrfs array (raid1c3 is more appropriate for raid6). That would make it very difficult for metadata to get corrupted, which is the lion's share of the higher-impact problems with raid5/6 btrfs.
Check: btrfs fi df <mountpoint>
Convert metadata: btrfs balance start -mconvert=raid1c3,soft <mountpoint>
(Make sure it's -mconvert, where m is for metadata, not -dconvert, which would switch profiles for data, messing up your array.)
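For anyone who wants to script the check-then-convert step from the comment above, here is a minimal sketch in Python. It only parses text shaped like `btrfs fi df` output and builds the balance command as an argument list; it does not run anything. The sample numbers, the helper names `parse_profiles` and `metadata_convert_command`, and the `/mnt/pool` path are all made up for illustration.

```python
import re

# Sample text in the shape printed by `btrfs fi df <mountpoint>`.
# The sizes are invented for illustration only.
SAMPLE_DF_OUTPUT = """\
Data, single: total=10.00TiB, used=9.20TiB
System, DUP: total=32.00MiB, used=1.00MiB
Metadata, DUP: total=20.00GiB, used=15.50GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

# Matches the leading "<kind>, <profile>:" part of each line.
LINE_RE = re.compile(r"^(?P<kind>\w+),\s*(?P<profile>\w+):")


def parse_profiles(df_output):
    """Map each block group kind to the list of profiles reported for it.

    A kind can appear more than once (for example part-way through a
    profile conversion), so the value is a list rather than a single string.
    """
    profiles = {}
    for line in df_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            profiles.setdefault(match.group("kind"), []).append(
                match.group("profile")
            )
    return profiles


def metadata_convert_command(df_output, mountpoint, target="raid1c3"):
    """Return the balance command as an argv list, or None if not needed.

    Hypothetical helper: it returns None when no Metadata line is found or
    when every Metadata line already reports the target profile.
    """
    current = parse_profiles(df_output).get("Metadata", [])
    if not current or all(p.lower() == target.lower() for p in current):
        return None
    # -mconvert touches metadata only; -dconvert would change data profiles.
    return ["btrfs", "balance", "start", f"-mconvert={target},soft", mountpoint]


if __name__ == "__main__":
    print(parse_profiles(SAMPLE_DF_OUTPUT))
    print(metadata_convert_command(SAMPLE_DF_OUTPUT, "/mnt/pool"))
```

Building the command as a list rather than a shell string leaves the decision to actually run it (and with what privileges) to the caller, which seems prudent for something that rewrites metadata across a whole array.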
Retr0id
This is obviously LLM output, but perhaps LLM output that corresponds to a real scenario. It's plausible that Claude was able to autonomously recover a corrupted fs, but I would not trust its "insights" by default. I'd love to see a btrfs dev's take on this!
jamesnorden
People swear btrfs is "safe" now, but I've personally been bitten by data corruption more than once, so I stay away from it now.
c-c-c-c-c
Added to my list of reasons to never use btrfs in production.
duskdozer
Welp. Guess I need to figure out another fs to use for a few drives in a nonraid pool I haven't gotten around to setting up yet. I forget why zfs seemed out. xfs?
stinkbeetle
> Case study: recovery of a severely corrupted 12 TB multi-device pool, plus constructive gap analysis and reference tool set #1107
Please don't be btrfs please don't be btrfs please don't be btrfs...
blae
oh great here comes all the zfs fanboys to shit on btrfs again with made up stories of corruption
phoronixrly
To the author: did you continue using btrfs after this ordeal? An FS that will not eat (all) your data upon a hard power cycle only at the cost of 14 custom C tools is a hard pass from me, no matter how many distros try to push it down my throat as 'production-ready'...
Also, impressive work!