I wanted (and needed) to make a root-on-ZFS pool out of multiple mirror VDEVs, and since I'm not a ZFS expert, I took a little shortcut.
I recently got a new-to-me server (yay!) and I wanted to do a root-on-ZFS setup on it. I've really enjoyed using ZFS for my data storage pools for a long time. I've also enjoyed the extra functionality that comes with having a bootable system installed on ZFS on my laptop, and I decided that with this upgrade it was time to do the same on my server. Historically I've used RAIDZ for my storage pools. RAIDZ functions almost like RAID5 (or RAID6, depending on the parity level) but at the ZFS layer: it gives you parity so that a certain number of disks in your pool can die and you won't lose any data. It does have a few tradeoffs however*, and as a matter of personal preference I've decided that going forward I would like a single ZPool on top of multiple mirror VDEVs. In other words, my main root+storage pool will be made up of two-disk mirrors and can be expanded to include any number of new mirrors I can fit into the machine.
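To make that shape concrete: creating and growing such a pool by hand (ignoring the bootable part for a moment) looks something like this, where da0 through da7 are placeholder disk names rather than my actual devices:

zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5
zpool add tank mirror da6 da7

Each mirror keyword starts a new two-disk VDEV, ZFS stripes writes across all of the VDEVs, and zpool add grows the pool while it's online.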
This did present some complications. First of all, bsdinstall won't set this up for you automatically (and sure enough, the handbook mentions that the guided root-on-ZFS tool will only create a single, top-level VDEV unless it's a stripe). It will happily let you use RAIDZ for your zroot, but not the more custom approach I'm taking. I did, however, use bsdinstall as a shortcut so I wouldn't have to do all of the partitioning and pool setup manually, and that's what I'm going to document below. Because I'm totally going to forget how this works the next time I have to do it.
In my scenario I have a PERC H310 controller, configured for AHCI passthrough, in front of an eight-slot, hot-swappable backplane. In other words, all FreeBSD sees is as many disks as I have plugged into the backplane. I'm going to fill it with six 2TB hard disks which, as I said before, I want to act as three mirrors (two disks each) in a single, bootable, growable ZPool. For starters, I shoved the FreeBSD installer on a flash drive and booted from it. I followed all of the regular steps (setting hostname, getting online, etc.) until I got to the guided root-on-ZFS disk partitioning setup.
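Before partitioning anything, it's worth confirming that FreeBSD actually sees all six disks through the passthrough. Either of these will list them (the mfisyspd device names you'll see in the listings below come from the mfi(4) driver that runs the H310):

sysctl kern.disks
geom disk list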
Now here's where I take the first step on my shortcut. Since there is no option to create a pool of arbitrary mirrors, I'm just going to create a pool from a single mirror VDEV of two disks. Later I will expand the pool to include the other two mirrors I had intended. My selections, reading them back from the layout the installer produced below, were a mirror over the first two disks (mfisyspd0 and mfisyspd1), GPT partitioning with EFI booting, and 4GB of swap per disk. Everything else was left as a default. Then I followed the installer to completion. At the end, when it asked if I wanted to drop into a shell to do more to the installation, I did.
The installer created the following disk layout for the two disks that I selected.
atc@macon:~ % gpart show
=>        40  3907029088  mfisyspd0  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd1  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)
The installer also created the following ZPool from my single mirror VDEV.
atc@macon:~ % zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zroot            ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            mfisyspd0p3  ONLINE       0     0     0
            mfisyspd1p3  ONLINE       0     0     0

errors: No known data errors
There are a couple of things to take note of here. First, both disks in the bootable ZPool have an EFI boot partition, which means either disk is capable of booting the system on its own. Second, they both have some swap space. Finally, they both have a third partition dedicated to ZFS data, and those partitions are what got added to my mirror VDEV.
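One more sanity check worth knowing about (my addition, not an installer step): the pool-level bootfs property records which dataset the loader boots from, and the guided install typically points it at zroot/ROOT/default:

zpool get bootfs zroot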
So where do I go from here? I was tempted to just zpool add mirror ... ... and add my other whole disks to the pool (actually, I did do this at first, and it rendered the volume unbootable for a very important reason: whole-disk mirror VDEVs don't have those all-important boot partitions). Instead, I need to go back and re-partition the four remaining disks exactly like the first two. Or, since all I want is two more of what's already been done, I can just clone the partitions using gpart backup and gpart restore! Easy! Here's what I did for all four remaining disks:
root@macon:~ # gpart backup mfisyspd0 | gpart restore -F mfisyspd2
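And the same clone, repeated verbatim for the other three disks:

root@macon:~ # gpart backup mfisyspd0 | gpart restore -F mfisyspd3
root@macon:~ # gpart backup mfisyspd0 | gpart restore -F mfisyspd4
root@macon:~ # gpart backup mfisyspd0 | gpart restore -F mfisyspd5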
Full disclosure, I didn't even think of this as a possibility until I read this Stack Exchange post. This gave me a disk layout like this:
atc@macon:~ % gpart show
=>        40  3907029088  mfisyspd0  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd1  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd2  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd3  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd4  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd5  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)
And to be fair, this makes a lot of logical sense. You don't want a six-disk pool to only be bootable by two of the disks or you're defeating some of the purposes of redundancy. So now I can extend my ZPool to include those last four disks.
This next step may or may not be a requirement. I wanted to overwrite anywhere that old ZFS/ZPool metadata might be lingering on my four new disks. This could be for nothing, I admit, but I've run into trouble in the past where a ZPool wasn't properly exported/destroyed before the drives were removed for another purpose, and future zpool imports would then show both the new pool and the old, failed one. And in the previous step I cloned a partition layout containing a ZFS partition four times! So I did a small dd on each remaining disk's freebsd-zfs partition to help me sleep at night (targeting the partition rather than the raw disk, so as not to clobber the partition table I just restored):
root@macon:~ # dd if=/dev/zero of=/dev/mfisyspd2p3 bs=1M count=100
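(And the same dd for mfisyspd3p3 through mfisyspd5p3.) A more surgical option for the same worry, and one I'll note for next time, is zpool labelclear, which erases ZFS's vdev labels. ZFS keeps label copies at both the front and the back of a device, so a short dd at the front doesn't touch the trailing ones:

zpool labelclear -f /dev/mfisyspd2p3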
One final, precautionary step is to write the EFI boot loader to the new disks. The Handbook's ZFS administration section mentions you should do this any time you replace a zroot device, so I'll do it just for good measure on all four additional disks:
root@macon:~ # gpart bootcode -p /boot/boot1.efifat -i 1 mfisyspd2
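Repeated across the rest of the new disks:

root@macon:~ # gpart bootcode -p /boot/boot1.efifat -i 1 mfisyspd3
root@macon:~ # gpart bootcode -p /boot/boot1.efifat -i 1 mfisyspd4
root@macon:~ # gpart bootcode -p /boot/boot1.efifat -i 1 mfisyspd5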
Don't forget that the command is different for UEFI and a traditional BIOS (the BIOS equivalent writes /boot/pmbr to the disk and /boot/gptzfsboot to a dedicated freebsd-boot partition). And finally, I can add my new VDEVs:
root@macon:~ # zpool add zroot mirror mfisyspd2p3 mfisyspd3p3
root@macon:~ # zpool add zroot mirror mfisyspd4p3 mfisyspd5p3
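A habit worth mentioning here (my note, not from the handbook): zpool add -n prints the pool layout that would result without committing anything. It's a cheap safety net, because accidentally leaving out the mirror keyword stripes a bare disk into the pool as its own top-level VDEV, and that's painful to undo. For example:

zpool add -n zroot mirror mfisyspd4p3 mfisyspd5p3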
And now my pool looks like this:
atc@macon:~ % zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zroot            ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            mfisyspd0p3  ONLINE       0     0     0
            mfisyspd1p3  ONLINE       0     0     0
          mirror-1       ONLINE       0     0     0
            mfisyspd2p3  ONLINE       0     0     0
            mfisyspd3p3  ONLINE       0     0     0
          mirror-2       ONLINE       0     0     0
            mfisyspd4p3  ONLINE       0     0     0
            mfisyspd5p3  ONLINE       0     0     0

errors: No known data errors
Boom. A growable, bootable zroot ZPool. Is it easier than just configuring the
partitions and root on ZFS by hand? Probably not for a BSD veteran. But since
I'm a BSD layman, this is something I can live with pretty easily. At least
until this becomes an option in bsdinstall
maybe? At least now I
can add as many more mirrors as I can fit into my system. And it's just as
easy to replace them. This is better for me than my previous RAIDZ, where I
would have to destroy and re-create the pool in order to add more disks to the
VDEV. Now I just create another little mirror and grow the pool and all of my
filesystems just see more storage. And of course, having ZFS for all of my
data makes it super easy to create filesystems on the fly, compress or quota
them, and take snapshots (including the live ZROOT!) and send those snapshots
over the network. Pretty awesome.
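To give a flavor of that upkeep (hypothetical device and host names here, not from my actual setup): swapping out a mirror member and shipping a recursive snapshot to another machine look roughly like this:

# replace a failed member of mirror-1 with a freshly partitioned disk
# (clone the partition table and write the bootcode to it first, as above)
zpool replace zroot mfisyspd3p3 mfisyspd6p3

# snapshot everything, including the live zroot, and send it offsite
zfs snapshot -r zroot@offsite
zfs send -R zroot@offsite | ssh backuphost zfs receive -d backuppool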
* I'm not going to explain why here, but this is a pretty well-thought-out article that should give you an idea of the pros and cons of RAIDZ versus mirror VDEVs so you can draw your own conclusions.