path: root/posts/2021-12-15-rescuing-freebsd-the-unix-way.php
author: 53hornet <atc@53hor.net>, 2021-12-16 22:08:00 -0500
committer: 53hornet <atc@53hor.net>, 2021-12-16 22:08:00 -0500
commit: dd04ad281d9123296be1a904cf647f8e7a232d3f
tree: f00d56f908dcd33ddf29c8a31bba8e831496fac9 /posts/2021-12-15-rescuing-freebsd-the-unix-way.php
parent: c3b77872285c05ed42359434d5c306304b8dfaeb
feat: rescue post, colors, centered titles
Diffstat (limited to 'posts/2021-12-15-rescuing-freebsd-the-unix-way.php')
-rw-r--r-- posts/2021-12-15-rescuing-freebsd-the-unix-way.php 121
1 files changed, 121 insertions, 0 deletions
diff --git a/posts/2021-12-15-rescuing-freebsd-the-unix-way.php b/posts/2021-12-15-rescuing-freebsd-the-unix-way.php
new file mode 100644
index 0000000..0ffa9c5
--- /dev/null
+++ b/posts/2021-12-15-rescuing-freebsd-the-unix-way.php
@@ -0,0 +1,121 @@
+<h1>Rescuing FreeBSD, the UNIX Way!</h1>
+
+<div class="description">
+ <p>
+ I did it, I finally effed up my FreeBSD install.
+ </p>
+
+ <p>
+ It all started with trying to build a custom kernel. <a href="https://docs.freebsd.org/en/books/handbook/mirrors/#git">I used Git to check out the latest stable source code</a>, but realized I'd used the wrong branch name. So, I figured I'd interrupt the clone and just remove the <code>/usr/src</code> directory (the standard place to put FreeBSD's source code) and start over. So I changed into <code>/usr/src</code> and ran <code>rm -r *</code>. Only I wasn't in <code>/usr/src</code>, I was in <code>/usr</code>.
+ </p>
+</div>
+
+<p>
+ By the time I realized what was happening and cancelled the operation, I had already wiped out all of <code>/usr/bin</code> and an unknown number of other files and directories. My window manager (i3) fell on its face, and the status bar threw the "I can't run any of these commands" message. Turns out there's a lot of useful stuff I expected to live in <code>/bin</code> that actually lives in <code>/usr/bin</code>! <code>ssh(1)</code> was gone, <code>tar(1)</code> was gone. Even <code>doas(1)</code>! I really, truly, thoroughly effed up my install.
+</p>
+
+<p>
+ Oh God, what about my home directory? Turns out it was still there, along with most (if not all) of my files. I'm not enough of an idiot to not have a home snapshot, so I logged in as root (<code>doas</code> was gone), and <code>zfs rollback</code> came to the rescue. Next, my primary concern was getting that home directory out of there, because I didn't feel like restoring it from my cloud backup if the rest of my recovery went bad. But I didn't have <code>ssh(1)</code> or <code>scp(1)</code> to copy it to my server. I did have ZFS's send and receive functionality, but I figured I'd take the easy way out and use my unscathed <code>rclone</code> to copy it over SSH to freedom. Pretty sure my data shamed me and gloated as it reached its lifeboat.
+</p>
+
+<p>
+ So now I could start to fearlessly think about un-effing my install. This is where most people (previously including myself) would suck it up, start from scratch with a USB installer, try to remember all of the customization steps they took to bring the system back to its current working state, and restore user data. But I'm not the man I once was. I've been playing this game long enough. I don't go crawling back to the dusty install media when something as trivial as critical system files going missing or getting corrupted happens. I know what I did wrong and I can think of several creative ways to fix it. Say it with me:
+ <img src="https://nextcloud.53hor.net/index.php/s/Gj8GZxLdegkJgG5/preview" />
+</p>
+
+First of all, my entire system (sans <code>/usr/bin</code>) is still somewhat operational. I have access to a root shell (and my X session with a browser), so I'm in pretty good shape. I am lacking some very basic core utilities, but I might be able to get them back without even rebooting.
+
+I don't have any system-wide snapshots to restore from (believe me, I do now). I do have another running FreeBSD 13.0-RELEASE system on my network though: my server. <code>rclone</code> worked to move data over there in an emergency, so I'll use that to copy my core utilities back where they belong. And it worked. Restarting i3 was like that scene in Jurassic Park: "We're back in business!" Now came the hard part: un-effing everything I <em>didn't</em> know was missing or broken.
+
+I can reinstall my entire package tree pretty easily, so I'm not that worried about anything missing from <code>/usr/local</code>. I'm a <code>pkg leaf | xargs pkg install -f</code> away from completely restoring those. Maybe I have one or two config files in <code>/usr/local/etc</code> that I can live without. I know <code>/usr/home</code> is safe and restored. So all that's left is stuff like <code>lib</code>, <code>sbin</code>, <code>include</code>, <code>lib32</code>, <code>share</code>, and a few others that aren't unique to my system (packages notwithstanding).
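+
+When it's unclear which packages actually lost files, <code>pkg(8)</code> can compare what's on disk against its recorded checksums before anything gets reinstalled. The following is only a sketch of an alternative, not the command used above: <code>pkg check -sa</code> should flag installed files that are missing or no longer match, and <code>pkg upgrade -f</code> force-reinstalls the whole installed set.
+
+<pre>
+<code>
+# pkg check -sa
+# pkg upgrade -f
+</code>
+</pre>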
+
+Here's where the YOLO part begins. I was already in the middle of building my system from source to track FreeBSD 13.0-STABLE, instead of RELEASE. So instead of using a rescue CD to copy just the right files back or completely reinstalling my system, I'll just <code>make buildworld</code> and upgrade/replace my system in place to track STABLE. The install/upgrade/switching process is well documented, and there are already difftools/mergetools responsible for making sure that all of the new artifacts go exactly where they're supposed to (on top of your old or broken existing ones).
+
+<a href="https://docs.freebsd.org/en/books/handbook/cutting-edge/#makeworld">https://docs.freebsd.org/en/books/handbook/cutting-edge/#makeworld</a>
+
+So let's follow the Handbook, shall we? I'll start where I left off when everything imploded: clean out <code>/usr/src</code> and clone the source tree.
+
+<pre>
+<code>
+# rm -r /usr/src/*
+# git clone -b stable/13 https://git.FreeBSD.org/src.git /usr/src
+</code>
+</pre>
+
+Then, from inside <code>/usr/src</code>, compile the world (userland) and the kernel. I'm going to use the GENERIC kernel for now so I can just get back up and running.
+
+<pre>
+<code>
+# make -j 4 buildworld buildkernel
+</code>
+</pre>
+
+It took literally hours. Poor little dual-core i5. It was already well on its way to completing when I realized I could have done this on my 8-core/16-thread server. "Oh, so you're reckless <em>and</em> stupid." Don't worry, I redeem myself later. Now I can actually install the new kernel and reboot into it. This is required before installing the world (userland).
+
+<pre>
+<code>
+# make installkernel
+...
+# reboot
+</code>
+</pre>
+
+After rebooting I can check out my new version.
+
+<pre>
+<code>
+# freebsd-version
+13.0-RELEASE
+</code>
+</pre>
+
+Hmm, <code>freebsd-version</code> reports the installed userland by default, and I haven't replaced that yet. Let's try <code>uname</code>:
+
+<pre>
+<code>
+# uname -r
+13.0-STABLE
+</code>
+</pre>
+
+Success. Kernel is rebuilt, reinstalled, and tracking STABLE. Now it's time to install/upgrade everything else. Side note: this is one of the cool things about FreeBSD. It's a complete operating system, not just a kernel or a userland. All of the pieces were made to fit together instead of being glued together into a distro.
+
+<pre>
+<code>
+# make installworld
+</code>
+</pre>
+
+This took a little bit longer, but from the output I could see that all of my important, potentially-missing binaries were being installed. Libraries, core utilities, applications, daemons, and man pages all got put back in their proper places. My system is totally back to life and I'm confident that I'm running an un-maimed FreeBSD 13.0-STABLE.
+
+Now comes the sanity checking part of this job. I'm actually running a newer system now than I was before the upgrade. One of the components that was upgraded was ZFS, and with every major ZFS upgrade, I'm going to reinstall my ZROOT bootloader:
+
+<pre>
+<code>
+# gpart bootcode -p /boot/gptzfsboot -i 1 ada0
+...
+# reboot
+</code>
+</pre>
+
+This may be unnecessary, but it ensures that my root-on-ZFS will load correctly after a reboot with the new ZFS. And to be fair, it is mentioned in the <a href="https://cgit.freebsd.org/src/tree/UPDATING?h=stable/13"><code>UPDATING</code></a> guide in the source.
+
+After a reboot, I've got one more sanity check. FreeBSD comes with <code>etcupdate(8)</code>, which you can use to manage merging upgrades with local changes to your <code>/etc</code> system config. If you run <code>etcupdate diff</code> you can see a diff of all of your customized system config. This is so good, I can't believe something like this doesn't exist on your typical Linux distro. Maybe it does and I just never realized it, but I'm betting they're all just different enough to not be able to share something like this. Anyway, after reviewing the diff, I applied any changes/merges by running <code>etcupdate</code>.
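+
+A minimal sketch of that review-then-merge flow, assuming the source tree lives in <code>/usr/src</code> (where <code>etcupdate</code> looks by default). The <code>status</code> and <code>resolve</code> steps only matter if the merge reports conflicts:
+
+<pre>
+<code>
+# etcupdate diff | less
+# etcupdate
+# etcupdate status
+# etcupdate resolve
+</code>
+</pre>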
+
+Now for one last bit of housekeeping, and this comes straight from the Handbook. After an upgrade, the world installation leaves behind old libraries and files that the new system doesn't need but old applications or ports built against an older target might still require. To list them, you can use these <code>make</code> targets from <code>/usr/src</code>:
+
+<pre>
+<code>
+# make check-old check-old-libs
+</code>
+</pre>
+
+After reviewing the list and ensuring you don't need those files, you can clean them up with:
+
+<pre>
+<code>
+# make BATCH_DELETE_OLD_FILES=yes delete-old delete-old-libs
+</code>
+</pre>
+
+Redemption. I went from attempting to customize my kernel to annihilating <code>/usr</code> to restoring my entire system by building from FreeBSD's source tree via <code>git</code> and <code>make</code>. And I got an upgrade in the process too! Moving forward, I'm running fairly frequent automatic full-system snapshots. They should make it a lot easier to recover from accidental deletions of system files. I'm also going to take the time to learn more about the rescue disk process using the FreeBSD installer image. All told, not too bad for something that could have gone a lot worse.
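+
+For the system-wide snapshots, here is a minimal sketch. The pool name <code>zroot</code> (the installer's default) and the snapshot name are assumptions, so substitute your own. The first command recursively snapshots every dataset in the pool, and the second lists the snapshots that exist:
+
+<pre>
+<code>
+# zfs snapshot -r zroot@before-kernel-build
+# zfs list -t snapshot
+</code>
+</pre>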