<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="/includes/stylesheet.css" />
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <meta
      property="og:description"
      content="The World Wide Web pages of Adam Carpenter"
    />
    <meta
      property="og:image"
      content="https://nextcloud.53hor.net/index.php/s/Nx9e7iHbw4t99wo/preview"
    />
    <meta property="og:site_name" content="53hor.net" />
    <meta
      property="og:title"
      content="Root on ZFS: A ZPool of Mirror VDEVs The Easy Way"
    />
    <meta property="og:type" content="website" />
    <meta property="og:url" content="https://www.53hor.net" />
    <title>53hornet ➙ Root on ZFS: A ZPool of Mirror VDEVs The Easy Way</title>
  </head>

  <body>
    <nav>
      <ul>
        <li>
          <a href="/">
            <img src="/includes/icons/home-roof.svg" />
            Home
          </a>
        </li>
        <li>
          <a href="/info.html">
            <img src="/includes/icons/information-variant.svg" />
            Info
          </a>
        </li>
        <li>
          <a href="https://git.53hor.net">
            <img src="/includes/icons/git.svg" />
            Repos
          </a>
        </li>
        <li>
          <a href="/hosted.html">
            <img src="/includes/icons/desktop-tower.svg" />
            Hosted
          </a>
        </li>
        <li>
          <a type="application/rss+xml" href="/rss.xml">
            <img src="/includes/icons/rss.svg" />
            RSS
          </a>
        </li>
      </ul>
    </nav>

    <article>
      <h1>Root on ZFS: A ZPool of Mirror VDEVs</h1>

      <p class="description">
        I wanted/needed to make a root on ZFS pool out of multiple mirror VDEVs,
        and since I'm not a ZFS expert, I took a little shortcut.
      </p>

      <p>
        I recently got a new-to-me server (yay!) and I wanted to do a
        root-on-ZFS setup on it. I've really enjoyed using ZFS for my data
        storage pools for a long time. I've also enjoyed the extra functionality
        that comes with having a bootable system installed on ZFS on my laptop
        and decided with this upgrade it's time to do the same on my server.
        Historically I've used RAIDZ for my storage pools. RAIDZ functions
        almost like a RAID5 (or RAID6, for RAIDZ2) but at the ZFS level. It
        gives you parity so that a certain number of disks can die from your
        pool and you won't lose any data. It does have a few tradeoffs
        however*, and as a matter of personal preference I've decided that for
        the future I would like to have a
        single ZPool over top of multiple mirror VDEVs. In other words, my main
        root+storage pool will be made up of two-disk mirrors and can be
        expanded to include any number of new mirrors I can fit into the
        machine.
      </p>

      <p>
        This did present some complications. First of all,
        <code>bsdinstall</code> won't set this up for you automatically (and
        sure enough,
        <a
          href="https://www.freebsd.org/doc/handbook/bsdinstall-partitioning.html"
          >in the handbook</a
        >
        it mentions the guided root on ZFS tool will only create a single,
        top-level VDEV unless it's a stripe). It will happily let you use RAIDZ
        for your ZROOT but not the more custom approach I'm taking. I did
        however use
        <code>bsdinstall</code> as a shortcut so I wouldn't have to do all of
        the partitioning and pool setup manually, and that's what I'm going to
        document below. Because I'm totally going to forget how this works the
        next time I have to do it.
      </p>

      <p>
        In my scenario I have an eight-slot, hot-swappable PERC H310 controller
        that's configured for AHCI passthrough. In other words, all FreeBSD sees
        is as many disks as I have plugged into the backplane. I'm going to fill
        it with 6x2TB hard disks which, as I said before, I want to act as three
        mirrors (two disks each) in a single, bootable, growable ZPool. For
        starters, I shoved the FreeBSD installer on a flash drive and booted
        from it. I followed all of the regular steps (setting hostname, getting
        online, etc.) until I got to the guided root on ZFS disk partitioning
        setup.
      </p>

      <p>
        Now here's where I'm going to take the first step on my shortcut. Since
        there is no option to create the pool of arbitrary mirrors I'm just
        going to create a pool from a single mirror VDEV of two disks. Later I
        will expand the pool to include the other two mirrors I had planned.
        My selections were as follows:
      </p>

      <ul>
        <li>Pool Type/Disks: mirror mfisyspd0 mfisyspd1</li>
        <li>Pool Name: zroot</li>
        <li>Partition Scheme: GPT (EFI)</li>
        <li>Swap Size: 4g</li>
      </ul>

      <p>
        Everything else was left as a default. Then I followed the installer to
        completion. At the end, when it asked if I wanted to drop into a shell
        to do more to the installation, I did.
      </p>

      <p>
        The installer created the following disk layout for the two disks that I
        selected.
      </p>

      <pre>
<code>
atc@macon:~ % gpart show
=>        40  3907029088  mfisyspd0  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd1  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)
</code>
</pre>
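
      <p>
        For anyone who wants to sanity-check the sizes in that output, the
        arithmetic is simple. This is only a sketch: it assumes 512-byte
        sectors, which <code>gpart show</code> doesn't state outright but which
        matches the 200M and 4.0G figures, and the helper name
        <code>sectors_to_mib</code> is made up for illustration.
      </p>

      <pre>
<code>
#!/bin/sh
# Assumption: 512-byte logical sectors (not shown in the gpart output).
SECTOR_BYTES=512

# Hypothetical helper: convert a sector count to whole MiB.
sectors_to_mib() {
    echo $(( $1 * SECTOR_BYTES / 1048576 ))
}

echo "efi:  $(sectors_to_mib 409600) MiB"
echo "swap: $(sectors_to_mib 8388608) MiB"
echo "zfs:  $(sectors_to_mib 3898228736) MiB"
</code>
</pre>

      <p>
        The EFI partition works out to 200 MiB, swap to 4096 MiB, and the ZFS
        partition to 1903432 MiB, or roughly 1.8 TiB.
      </p>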

      <p>
        The installer also created the following ZPool from my single mirror
        VDEV.
      </p>

      <pre>
<code>
atc@macon:~ % zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

	NAME             STATE     READ WRITE CKSUM
	zroot            ONLINE       0     0     0
	  mirror-0       ONLINE       0     0     0
	    mfisyspd0p3  ONLINE       0     0     0
	    mfisyspd1p3  ONLINE       0     0     0

errors: No known data errors
</code>
</pre>

      <p>
        There are a couple of things to take note of here. First of all,
        <em>both</em> disks in the bootable ZPool have an EFI boot partition.
        That means either of them should be capable of booting the pool.
        Second, they both have some swap space. Finally, they both have a third
        partition which is dedicated to ZFS data, and that partition is what got
        added to my VDEV.
      </p>

      <p>
        So where do I go from here? I was tempted to
        <code>zpool add zroot mirror ... ...</code> and just add my other disks
        to the pool as whole-disk mirror VDEVs (actually, I <em>did</em> do this
        but it rendered the volume unbootable for a very important reason), but
        then those disks wouldn't have the all-important boot partitions.
        Instead, I need to manually go back and re-partition four disks exactly
        like the first two. Or, since all I want is two more of what's already
        been done, I can just clone the partition table using
        <code>gpart backup</code> and <code>restore</code>! Easy! Here's what I
        did for each of the four remaining disks (shown here for
        <code>mfisyspd2</code>):
      </p>

      <pre>
<code>
root@macon:~ # gpart backup mfisyspd0 | gpart restore -F mfisyspd2
</code>
</pre>
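
      <p>
        Since the same command has to be repeated for every remaining disk, a
        small loop saves some typing. The sketch below is a dry run that only
        prints the commands. The device names are the ones from this post, and
        <code>clone_layout_cmds</code> is a hypothetical helper name, so adjust
        both to suit your hardware.
      </p>

      <pre>
<code>
#!/bin/sh
# Dry-run sketch: print (not run) the clone command for each target disk.
# SOURCE_DISK and TARGET_DISKS follow the device names used in this post.
SOURCE_DISK=mfisyspd0
TARGET_DISKS="mfisyspd2 mfisyspd3 mfisyspd4 mfisyspd5"

# Hypothetical helper: emit one backup/restore pipeline per target disk.
clone_layout_cmds() {
    src=$1
    shift
    for dst in "$@"; do
        printf 'gpart backup %s | gpart restore -F %s\n' "$src" "$dst"
    done
}

# Word splitting on TARGET_DISKS is intentional here.
clone_layout_cmds "$SOURCE_DISK" $TARGET_DISKS
</code>
</pre>

      <p>
        Once the printed lines look right, they can be pasted into a root shell
        or piped to <code>sh</code>. Keep in mind that
        <code>gpart restore -F</code> destroys whatever partition table is
        already on the target disk.
      </p>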

      <p>
        Full disclosure, I didn't even think of this as a possibility
        <a
          href="https://unix.stackexchange.com/questions/472147/replacing-disk-when-using-freebsd-zfs-zroot-zfs-on-partition#472175"
          >until I read this Stack Exchange post</a
        >. This gave me a disk layout like this:
      </p>

      <pre>
<code>
atc@macon:~ % gpart show
=>        40  3907029088  mfisyspd0  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd1  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd2  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd3  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd4  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)

=>        40  3907029088  mfisyspd5  GPT  (1.8T)
          40      409600          1  efi  (200M)
      409640        2008             - free -  (1.0M)
      411648     8388608          2  freebsd-swap  (4.0G)
     8800256  3898228736          3  freebsd-zfs  (1.8T)
  3907028992         136             - free -  (68K)
</code>
</pre>

      <p>
        And to be fair, this makes a lot of logical sense. You don't want a
        six-disk pool to only be bootable by two of the disks or you're
        defeating some of the purposes of redundancy. So now I can extend my
        ZPool to include those last four disks.
      </p>

      <p>
        This next step may or may not be a requirement. I wanted to overwrite
        where I assumed any old ZFS/ZPool metadata might be on my four new
        disks. This could just be for nothing and I admit that, but I've run
        into trouble in the past where a ZPool wasn't properly
        exported/destroyed before the drives were removed for another purpose,
        and when you use those drives in future
        <code>zpool import</code>s, you can see both the new and the old, failed
        pools. And, in the previous step I cloned the layout of an existing ZFS
        partition many times (only the partition table, not its contents, but
        still)! So I did a small <code>dd</code> on the ZFS partition of each
        remaining disk (the partition rather than the whole disk, so the freshly
        restored partition table stays intact) to help me sleep at night:
      </p>

      <pre>
<code>
root@macon:~ # dd if=/dev/zero of=/dev/mfisyspd2p3 bs=1M count=100
</code>
</pre>
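
      <p>
        ZFS also has a purpose-built command for this job,
        <code>zpool labelclear</code>, which removes ZFS label information from
        a device. It is a different technique from the <code>dd</code> above and
        not what was used for this install. The sketch below is a dry run that
        only prints the commands, assuming partition 3 is the
        <code>freebsd-zfs</code> partition as in the <code>gpart show</code>
        output; <code>labelclear_cmds</code> is a hypothetical helper name.
      </p>

      <pre>
<code>
#!/bin/sh
# Dry-run sketch: print a labelclear command for each new disk's ZFS partition.
# Assumption: partition 3 is the freebsd-zfs partition, as in the gpart output.
NEW_DISKS="mfisyspd2 mfisyspd3 mfisyspd4 mfisyspd5"

# Hypothetical helper: emit one "zpool labelclear" line per disk.
labelclear_cmds() {
    for disk in "$@"; do
        printf 'zpool labelclear -f /dev/%sp3\n' "$disk"
    done
}

# Word splitting on NEW_DISKS is intentional here.
labelclear_cmds $NEW_DISKS
</code>
</pre>

      <p>
        The <code>-f</code> flag tells ZFS to treat an exported or foreign
        device as inactive, so double-check the device names before running
        anything it prints.
      </p>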

      <p>
        One final, precautionary step is to write the EFI boot loader to the new
        disks. The
        <a href="https://www.freebsd.org/doc/handbook/zfs-zpool.html"
          >zpool administration section of the handbook</a
        >
        mentions you should do this any time you <em>replace</em> a zroot
        device, so I'll do it for good measure on all four additional
        disks:
      </p>

      <pre>
<code>
root@macon:~ # gpart bootcode -p /boot/boot1.efifat -i 1 mfisyspd2
</code>
</pre>
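
      <p>
        To make the UEFI versus BIOS difference concrete, here is a dry-run
        sketch that prints the <code>gpart bootcode</code> command for either
        case. The UEFI form is the one used in this post. The BIOS form is the
        usual <code>pmbr</code> plus <code>gptzfsboot</code> pair, and it
        assumes partition 1 is a <code>freebsd-boot</code> partition, which is
        not the layout shown here. The helper name <code>bootcode_cmd</code>
        and the <code>uefi</code>/<code>bios</code> labels are made up for the
        example.
      </p>

      <pre>
<code>
#!/bin/sh
# Dry-run sketch: print the bootcode command for one disk.
# Hypothetical helper; "uefi" and "bios" are labels chosen for this example.
bootcode_cmd() {
    mode=$1
    disk=$2
    case "$mode" in
        uefi)
            # Same form as the command in this post: EFI image to partition 1.
            printf 'gpart bootcode -p /boot/boot1.efifat -i 1 %s\n' "$disk"
            ;;
        bios)
            # BIOS form: protective MBR plus gptzfsboot. Assumes partition 1
            # is a freebsd-boot partition, unlike the layout in this post.
            printf 'gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 %s\n' "$disk"
            ;;
        *)
            printf 'unknown mode: %s\n' "$mode"
            return 1
            ;;
    esac
}

# Print the UEFI command for each of the four additional disks.
for disk in mfisyspd2 mfisyspd3 mfisyspd4 mfisyspd5; do
    bootcode_cmd uefi "$disk"
done
</code>
</pre>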

      <p>
        Don't forget that the command is different for UEFI and a traditional
        BIOS. And finally, I can add my new VDEVs:
      </p>

      <pre>
<code>
root@macon:~ # zpool add zroot mirror mfisyspd2p3 mfisyspd3p3
root@macon:~ # zpool add zroot mirror mfisyspd4p3 mfisyspd5p3
</code>
</pre>
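
      <p>
        Adding a VDEV to a pool is not something you can easily undo, so it can
        be worth previewing the result first. <code>zpool add</code> accepts
        <code>-n</code>, which displays the configuration that would be used
        without actually adding anything. The sketch below pairs up disks and
        prints one such preview command per mirror. It assumes the pool name
        and <code>p3</code> partitions from this post, and
        <code>mirror_add_cmds</code> is a hypothetical helper name.
      </p>

      <pre>
<code>
#!/bin/sh
# Dry-run sketch: pair up disks and print one "zpool add" line per mirror.
# The -n flag asks zpool to show the resulting layout without changing the pool.
POOL=zroot

# Hypothetical helper: takes an even number of disk names, two per mirror.
mirror_add_cmds() {
    if [ $(( $# % 2 )) -ne 0 ]; then
        echo "need an even number of disks"
        return 1
    fi
    while [ $# -gt 0 ]; do
        printf 'zpool add -n %s mirror %sp3 %sp3\n' "$POOL" "$1" "$2"
        shift 2
    done
}

mirror_add_cmds mfisyspd2 mfisyspd3 mfisyspd4 mfisyspd5
</code>
</pre>

      <p>
        If the previewed layout shows the new disks grouped under their own
        <code>mirror</code> entries, drop the <code>-n</code> and run the
        commands for real.
      </p>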

      <p>And now my pool looks like this:</p>

      <pre>
<code>
atc@macon:~ % zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

	NAME             STATE     READ WRITE CKSUM
	zroot            ONLINE       0     0     0
	  mirror-0       ONLINE       0     0     0
	    mfisyspd0p3  ONLINE       0     0     0
	    mfisyspd1p3  ONLINE       0     0     0
	  mirror-1       ONLINE       0     0     0
	    mfisyspd2p3  ONLINE       0     0     0
	    mfisyspd3p3  ONLINE       0     0     0
	  mirror-2       ONLINE       0     0     0
	    mfisyspd4p3  ONLINE       0     0     0
	    mfisyspd5p3  ONLINE       0     0     0

errors: No known data errors
</code>
</pre>

      <p>
        Boom. A growable, bootable zroot ZPool. Is it easier than just
        configuring the partitions and root on ZFS by hand? Probably not for a
        BSD veteran. But since I'm a BSD layman, this is something I can live
        with pretty easily. At least until this becomes an option in
        <code>bsdinstall</code> maybe? At least now I can add as many more
        mirrors as I can fit into my system. And it's just as easy to replace
        them. This is better for me than my previous RAIDZ, where I would have
        to destroy and re-create the pool in order to add more disks to the
        VDEV. Now I just create another little mirror and grow the pool and all
        of my filesystems just see more storage. And of course, having ZFS for
        all of my data makes it super easy to create filesystems on the fly,
        compress or quota them, and take snapshots (including the live ZROOT!)
        and send those snapshots over the network. Pretty awesome.
      </p>

      <p>
        * I'm not going to explain why here, but
        <a
          href="http://www.openoid.net/zfs-you-should-use-mirror-vdevs-not-raidz/"
          >this is a pretty well thought out article</a
        >
        that should give you an idea about the pros and cons of RAIDZ versus
        mirror VDEVs so you can draw your own conclusions.
      </p>
    </article>
  </body>
</html>