Diffstat (limited to 'posts/unix/2021-01-15-root-on-zfs-a-zpool-of-mirror-vdevs-the-easy-way.html')
-rw-r--r-- posts/unix/2021-01-15-root-on-zfs-a-zpool-of-mirror-vdevs-the-easy-way.html | 375
1 file changed, 375 insertions, 0 deletions
diff --git a/posts/unix/2021-01-15-root-on-zfs-a-zpool-of-mirror-vdevs-the-easy-way.html b/posts/unix/2021-01-15-root-on-zfs-a-zpool-of-mirror-vdevs-the-easy-way.html
new file mode 100644
index 0000000..3fd0309
--- /dev/null
+++ b/posts/unix/2021-01-15-root-on-zfs-a-zpool-of-mirror-vdevs-the-easy-way.html
@@ -0,0 +1,375 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <link rel="stylesheet" href="/includes/stylesheet.css" />
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
+ <meta
+ property="og:description"
+ content="The World Wide Web pages of Adam Carpenter"
+ />
+ <meta
+ property="og:image"
+ content="https://nextcloud.53hor.net/s/iBGxB7P3BKRbj9P/preview"
+ />
+ <meta property="og:site_name" content="53hor.net" />
+ <meta
+ property="og:title"
+ content="Root on ZFS: A ZPool of Mirror VDEVs The Easy Way"
+ />
+ <meta property="og:type" content="website" />
+ <meta property="og:url" content="https://www.53hor.net" />
+ <title>53hornet ➙ Root on ZFS: A ZPool of Mirror VDEVs The Easy Way</title>
+ </head>
+
+ <body>
+ <nav>
+ <ul>
+ <li>
+ <a href="/">
+ <img src="/includes/icons/home-roof.svg" />
+ Home
+ </a>
+ </li>
+ <li>
+ <a href="/info.html">
+ <img src="/includes/icons/information-variant.svg" />
+ Info
+ </a>
+ </li>
+ <li>
+ <a href="https://git.53hor.net">
+ <img src="/includes/icons/git.svg" />
+ Repos
+ </a>
+ </li>
+ <li>
+ <a href="/hosted.html">
+ <img src="/includes/icons/desktop-tower.svg" />
+ Hosted
+ </a>
+ </li>
+ <li>
+ <a type="application/rss+xml" href="/rss.xml">
+ <img src="/includes/icons/rss.svg" />
+ RSS
+ </a>
+ </li>
+ </ul>
+ </nav>
+
+ <article>
+ <h1>Root on ZFS: A ZPool of Mirror VDEVs</h1>
+
+ <p class="description">
+ I wanted/needed to make a root on ZFS pool out of multiple mirror VDEVs,
+ and since I'm not a ZFS expert, I took a little shortcut.
+ </p>
+
+ <p>
+ I recently got a new-to-me server (yay!) and I wanted to do a
+ root-on-ZFS setup on it. I've really enjoyed using ZFS for my data
+ storage pools for a long time. I've also enjoyed the extra functionality
+ that comes with having a bootable system installed on ZFS on my laptop
+ and decided with this upgrade it's time to do the same on my server.
+ Historically I've used RAIDZ for my storage pools. RAIDZ functions
+ almost like a RAID5 (or RAID6, for RAIDZ2) but at the ZFS level. It gives you parity so that a
+ certain number of disks can die from your pool and you won't lose any
+ data. It does have a few tradeoffs however*, and for personal
+ preferences I've decided that for the future I would like to have a
+ single ZPool over top of multiple mirror VDEVs. In other words, my main
+ root+storage pool will be made up of two-disk mirrors and can be
+ expanded to include any number of new mirrors I can fit into the
+ machine.
+ </p>
+
+ <p>
+ This did present some complications. First of all,
+ <code>bsdinstall</code> won't set this up for you automatically (and
+ sure enough,
+ <a
+ href="https://www.freebsd.org/doc/handbook/bsdinstall-partitioning.html"
+ >in the handbook</a
+ >
+ it mentions the guided root on ZFS tool will only create a single,
+ top-level VDEV unless it's a stripe). It will happily let you use RAIDZ
+ for your ZROOT but not the more custom approach I'm taking. I did
+ however use
+ <code>bsdinstall</code> as a shortcut so I wouldn't have to do all of
+ the partitioning and pool setup manually, and that's what I'm going to
+ document below. Because I'm totally going to forget how this works the
+ next time I have to do it.
+ </p>
+
+ <p>
+ In my scenario I have an eight-slot, hot-swappable PERC H310 controller
+ that's configured for AHCI passthrough. In other words, all FreeBSD sees
+ is as many disks as I have plugged into the backplane. I'm going to fill
+ it with 6x2TB hard disks which, as I said before, I want to act as three
+ mirrors (two disks each) in a single, bootable, growable ZPool. For
+ starters, I shoved the FreeBSD installer on a flash drive and booted
+ from it. I followed all of the regular steps (setting hostname, getting
+ online, etc.) until I got to the guided root on ZFS disk partitioning
+ setup.
+ </p>
+
+ <p>
+ Now here's where I'm going to take the first step on my shortcut. Since
+ there is no option to create a pool of arbitrary mirrors, I'm just
+ going to create a pool from a single mirror VDEV of two disks. Later I
+ will expand the pool to include the other two mirrors I had planned.
+ My selections were as follows:
+ </p>
+
+ <ul>
+ <li>Pool Type/Disks: mirror mfisyspd0 mfisyspd1</li>
+ <li>Pool Name: zroot</li>
+ <li>Partition Scheme: GPT (EFI)</li>
+ <li>Swap Size: 4g</li>
+ </ul>
+
+ <p>
+ Everything else was left as a default. Then I followed the installer to
+ completion. At the end, when it asked if I wanted to drop into a shell
+ to do more to the installation, I did.
+ </p>
+
+ <p>
+ The installer created the following disk layout for the two disks that I
+ selected.
+ </p>
+
+ <pre>
+<code>
+atc@macon:~ % gpart show
+=> 40 3907029088 mfisyspd0 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+
+=> 40 3907029088 mfisyspd1 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+</code>
+</pre>
+
+ <p>
+ The installer also created the following ZPool from my single mirror
+ VDEV.
+ </p>
+
+ <pre>
+<code>
+atc@macon:~ % zpool status
+ pool: zroot
+ state: ONLINE
+ scan: none requested
+config:
+
+ NAME STATE READ WRITE CKSUM
+ zroot ONLINE 0 0 0
+ mirror-0 ONLINE 0 0 0
+ mfisyspd0p3 ONLINE 0 0 0
+ mfisyspd1p3 ONLINE 0 0 0
+
+errors: No known data errors
+</code>
+</pre>
+
+ <p>
+ There are a couple of things to take note of here. First of all,
+ <em>both</em> disks in the bootable ZPool have an EFI boot partition.
+ That means either disk should be capable of booting the pool.
+ Second, they both have some swap space. Finally, they both have a third
+ partition which is dedicated to ZFS data, and that partition is what got
+ added to my VDEV.
+ </p>
+
+ <p>
+ So where do I go from here? I was tempted to just
+ <code>zpool add zroot mirror ... ...</code> and just add my other disks to the
+ pool (actually, I <em>did</em> do this but it rendered the volume
+ unbootable for a very important reason), but then I wouldn't have those
+ all-important boot partitions (using whole-disk mirror VDEVs). Instead,
+ I need to manually go back and re-partition four disks exactly like the
+ first two. Or, since all I want is two more of what's already been done,
+ I can just clone the partitions using <code>gpart backup</code> and
+ <code>restore</code>! Easy! Here's what I did for all four remaining
+ disks:
+ </p>
+
+ <pre>
+<code>
+root@macon:~ # gpart backup mfisyspd0 | gpart restore -F mfisyspd2
+</code>
+</pre>
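+
+ <p>
+ As a rough sketch of repeating that for every remaining disk, the loop
+ below only prints the commands instead of running them. The helper name
+ <code>clone_layout_cmds</code> is made up for illustration, and the disk
+ names assume the same <code>mfisyspd</code> numbering shown above.
+ </p>

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one gpart clone command per target.
# $1 is the already-partitioned source disk; the rest are target disks.
clone_layout_cmds() {
  src="$1"
  shift
  for disk in "$@"; do
    echo "gpart backup $src | gpart restore -F $disk"
  done
}

# Assumed disk names: the installer's first disk plus the four new ones.
clone_layout_cmds mfisyspd0 mfisyspd2 mfisyspd3 mfisyspd4 mfisyspd5
```

+ <p>
+ Printing first makes it easy to double-check the target names before
+ pasting the lines into a root shell, since <code>-F</code> destroys any
+ existing partition table on the target.
+ </p>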
+
+ <p>
+ Full disclosure, I didn't even think of this as a possibility
+ <a
+ href="https://unix.stackexchange.com/questions/472147/replacing-disk-when-using-freebsd-zfs-zroot-zfs-on-partition#472175"
+ >until I read this Stack Exchange post</a
+ >. This gave me a disk layout like this:
+ </p>
+
+ <pre>
+<code>
+atc@macon:~ % gpart show
+=> 40 3907029088 mfisyspd0 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+
+=> 40 3907029088 mfisyspd1 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+
+=> 40 3907029088 mfisyspd2 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+
+=> 40 3907029088 mfisyspd3 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+
+=> 40 3907029088 mfisyspd4 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+
+=> 40 3907029088 mfisyspd5 GPT (1.8T)
+ 40 409600 1 efi (200M)
+ 409640 2008 - free - (1.0M)
+ 411648 8388608 2 freebsd-swap (4.0G)
+ 8800256 3898228736 3 freebsd-zfs (1.8T)
+ 3907028992 136 - free - (68K)
+</code>
+</pre>
+
+ <p>
+ And to be fair, this makes a lot of logical sense. You don't want a
+ six-disk pool to only be bootable by two of the disks or you're
+ defeating some of the purposes of redundancy. So now I can extend my
+ ZPool to include those last four disks.
+ </p>
+
+ <p>
+ This next step may or may not be a requirement. I wanted to overwrite
+ any old ZFS/ZPool metadata that might be left on my four new disks.
+ This could be for nothing, and I admit that, but I've run into trouble
+ in the past where a ZPool wasn't properly exported/destroyed before the
+ drives were removed for another purpose. When you use those drives in
+ future <code>zpool import</code>s, you can see both the new and the
+ old, failed pools. And in the previous step I cloned the partition
+ layout of a disk that's already in a pool! So I did a small
+ <code>dd</code> on the remaining disks to help me sleep at night:
+ </p>
+
+ <pre>
+<code>
+root@macon:~ # dd if=/dev/zero of=/dev/mfisyspd2p3 bs=1M count=100
+</code>
+</pre>
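+
+ <p>
+ ZFS also has a purpose-built command for this,
+ <code>zpool labelclear</code>, which removes ZFS label information from
+ a device that is not part of an active pool. The sketch below is an
+ assumed alternative, not what was run above. It only prints the
+ commands, and the helper name and partition names are illustrative.
+ </p>

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a labelclear command per partition.
# -f treats exported or foreign devices as inactive, so stale labels go too.
labelclear_cmds() {
  for part in "$@"; do
    echo "zpool labelclear -f /dev/$part"
  done
}

# Assumed names: the freebsd-zfs partition (index 3) on each new disk.
labelclear_cmds mfisyspd2p3 mfisyspd3p3 mfisyspd4p3 mfisyspd5p3
```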
+
+ <p>
+ One final, precautionary step is to write the EFI boot loader to the new
+ disks. The
+ <a href="https://www.freebsd.org/doc/handbook/zfs-zpool.html"
+ >zpool admin handbook</a
+ >
+ mentions you should do this any time you <em>replace</em> a zroot
+ device, so I'll do it for good measure on all four additional
+ disks:
+ </p>
+
+ <pre>
+<code>
+root@macon:~ # gpart bootcode -p /boot/boot1.efifat -i 1 mfisyspd2
+</code>
+</pre>
+
+ <p>
+ Don't forget that the command is different for UEFI and a traditional
+ BIOS. And finally, I can add my new VDEVs:
+ </p>
+
+ <pre>
+<code>
+root@macon:~ # zpool add zroot mirror mfisyspd2p3 mfisyspd3p3
+root@macon:~ # zpool add zroot mirror mfisyspd4p3 mfisyspd5p3
+</code>
+</pre>
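+
+ <p>
+ Given how easily a hasty <code>zpool add</code> can go wrong, it may
+ help to preview the result first: <code>zpool add -n</code> displays
+ the configuration that would be used without actually adding the VDEVs.
+ A minimal sketch, with a hypothetical helper name, that prints one
+ preview command for each pair of partitions:
+ </p>

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a "zpool add -n" preview command
# for each consecutive pair of partitions. $1 is the pool name.
preview_mirror_adds() {
  pool="$1"
  shift
  while [ "$#" -ge 2 ]; do
    echo "zpool add -n $pool mirror $1 $2"
    shift 2
  done
}

# Assumed names: the pool and freebsd-zfs partitions used in this post.
preview_mirror_adds zroot mfisyspd2p3 mfisyspd3p3 mfisyspd4p3 mfisyspd5p3
```

+ <p>
+ If the previewed layout shows a new <code>mirror</code> with two
+ partitions under it, dropping the <code>-n</code> performs the real
+ add.
+ </p>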
+
+ <p>And now my pool looks like this:</p>
+
+ <pre>
+<code>
+atc@macon:~ % zpool status
+ pool: zroot
+ state: ONLINE
+ scan: none requested
+config:
+
+ NAME STATE READ WRITE CKSUM
+ zroot ONLINE 0 0 0
+ mirror-0 ONLINE 0 0 0
+ mfisyspd0p3 ONLINE 0 0 0
+ mfisyspd1p3 ONLINE 0 0 0
+ mirror-1 ONLINE 0 0 0
+ mfisyspd2p3 ONLINE 0 0 0
+ mfisyspd3p3 ONLINE 0 0 0
+ mirror-2 ONLINE 0 0 0
+ mfisyspd4p3 ONLINE 0 0 0
+ mfisyspd5p3 ONLINE 0 0 0
+
+errors: No known data errors
+</code>
+</pre>
+
+ <p>
+ Boom. A growable, bootable zroot ZPool. Is it easier than just
+ configuring the partitions and root on ZFS by hand? Probably not for a
+ BSD veteran. But since I'm a BSD layman, this is something I can live
+ with pretty easily. At least until this becomes an option in
+ <code>bsdinstall</code> maybe? At least now I can add as many more
+ mirrors as I can fit into my system. And it's just as easy to replace
+ them. This is better for me than my previous RAIDZ, where I would have
+ to destroy and re-create the pool in order to add more disks to the
+ VDEV. Now I just create another little mirror and grow the pool and all
+ of my filesystems just see more storage. And of course, having ZFS for
+ all of my data makes it super easy to create filesystems on the fly,
+ compress or quota them, and take snapshots (including the live ZROOT!)
+ and send those snapshots over the network. Pretty awesome.
+ </p>
+
+ <p>
+ * I'm not going to explain why here, but
+ <a
+ href="http://www.openoid.net/zfs-you-should-use-mirror-vdevs-not-raidz/"
+ >this is a pretty well thought out article</a
+ >
+ that should give you an idea about the pros and cons of RAIDZ versus
+ mirror VDEVs so you can draw your own conclusions.
+ </p>
+ </article>
+ </body>
+</html>