Rescuing FreeBSD, the UNIX Way (And a Free Upgrade Too!)

I did it, I finally effed up my FreeBSD install.

It all started with trying to build a custom kernel. I used Git to check out the latest stable source code, but realized I had used the wrong branch name. So I figured I'd interrupt the clone, remove the contents of /usr/src (the standard place to put FreeBSD's source code), and start over. I changed into /usr/src and ran rm -r *. Only I wasn't in /usr/src, I was in /usr.

By the time I realized what was happening and cancelled the operation, I had already wiped out all of /usr/bin and an unknown number of other files and directories. My window manager (i3) fell on its face, and the status bar threw the "I can't run any of these commands" message. Turns out there's a lot of useful stuff I expected to live in /bin that actually lives in /usr/bin! ssh(1) was gone, tar(1) was gone. Even doas(1)! I really, truly, thoroughly effed up my install.

Oh God, what about my home directory? Turns out it was still there, along with most (if not all) of my files. I'm not enough of an idiot to not have a home snapshot, so I logged in as root (doas was gone), and zfs rollback came to the rescue. Now my primary concern was getting that home directory out of there, because I didn't feel like restoring it from my cloud backup if the rest of my recovery went bad. But I didn't have ssh(1) or scp(1) to clone it to my server. I did have ZFS' send and receive functionality, but I figured I'd take the easy way out and use my unscathed rclone to SCP it to freedom. Pretty sure my data shamed and gloated at me as it reached its lifeboat.
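
For anyone who wants to picture the rollback step, here is a minimal sketch rather than the exact commands used above. It assumes the home directory lives on a dataset named zroot/usr/home (a common installer default, not something stated in this post), and latest_snapshot is a hypothetical helper that picks the newest snapshot of that dataset from a list sorted by creation time.

```shell
#!/bin/sh
# Sketch only: find the newest snapshot of one dataset so it can be rolled back.
# "zroot/usr/home" is an assumed dataset name; substitute your own.

# latest_snapshot DATASET
# Reads snapshot names (oldest first) on stdin, for example from
#   zfs list -H -t snapshot -o name -s creation
# and prints the newest one that belongs exactly to DATASET.
# Returns non-zero if DATASET has no snapshot in the input.
latest_snapshot() {
    awk -v ds="$1" '
        index($0, ds "@") == 1 { last = $0 }   # snapshots of child datasets do not match
        END { if (last == "") exit 1; print last }
    '
}

# Typical use on a live system, as root:
#   snap=$(zfs list -H -t snapshot -o name -s creation | latest_snapshot zroot/usr/home)
#   zfs rollback "$snap"
```

Keeping the selection logic separate from the zfs calls means it can be tried on canned input before it touches a real pool.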

So now I could start to fearlessly think about un-effing my install. This is where most people (previously including myself) would suck it up, start from scratch with a USB installer, try to remember all of the customization steps they took to bring the system back to its current working state, and restore user data. But I'm not the man I once was. I've been playing this game long enough. I don't go crawling back to the dusty install media over something as trivial as critical system files going missing or getting corrupted. I know what I did wrong and I can think of several creative ways to fix it. Say it with me: it's a UNIX system.

First of all, my entire system (sans /usr/bin) is still somewhat operational. I have access to a root shell (and my X session with a browser) so I'm in pretty good shape. I am lacking some very basic core utilities but I might be able to get them back without even rebooting. I don't have any system-wide snapshots to restore from but I do have another running FreeBSD 13.0-RELEASE system on my network: my server. rclone worked to move data over there in an emergency, so I'll use that to copy my coreutils back where they belong. And it worked. Back in business.

Now came the hard part. Un-effing everything I didn't know was missing or broken. Who knows what else got removed during that operation. I can reinstall my entire package tree pretty easily, so I'm not that worried about anything missing from /usr/local. Maybe I have one or two config files in /usr/local/etc that I can live without. I know /usr/home is safe and restored. So all that's left is stuff like lib, sbin, include, lib32, share and a few others that aren't very unique to my system (packages notwithstanding).
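
One way to get a rough answer to "what else is missing" is to compare the damaged tree against a known-good one. The sketch below is an illustration, not something done in this recovery: missing_from is a hypothetical helper, and the reference path in the usage comment (a copy of the server's /usr made available at /mnt/ref/usr) is an assumption.

```shell
#!/bin/sh
# Sketch only: list files that exist in a reference tree but not in a target tree.

# missing_from REF TARGET
# Prints the relative path of every regular file or symlink found under REF
# that has no counterpart under TARGET. Contents are not compared.
missing_from() {
    ref=$1
    target=$2
    (cd "$ref" && find . \( -type f -o -type l \)) | sort | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -e "$target/$rel" ] && [ ! -L "$target/$rel" ]; then
            printf '%s\n' "$rel"
        fi
    done
}

# Example (paths are assumptions):
#   missing_from /mnt/ref/usr /usr
```

This only checks for presence, so a truncated or corrupted file would still slip through. It is a quick triage list, not a verification tool.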

Here's where the YOLO part begins. I was already in the middle of building my system from source to track 13.0-STABLE, instead of RELEASE. So instead of using a rescue CD to copy just the right files back or completely reinstalling my system, I'll just upgrade my system in-place to track STABLE. The install/upgrade/switching process is well-documented, and there are already mergetools responsible for making sure that all of the new artifacts go exactly where they're supposed to (over top of your old or broken existing ones).

Time for the Handbook. I'll start where I left off when everything imploded. Clean out /usr/src and clone the source tree, but with an absolute path this time.


# rm -r /usr/src/*
# git clone -b stable/13 https://git.FreeBSD.org/src.git /usr/src
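
A slightly more defensive take on that cleanup step might look like the sketch below. clean_dir is a hypothetical helper, and its short deny-list is only an illustration. The idea is to never depend on the current directory, to refuse anything that isn't an absolute path, and to clear hidden entries (such as a leftover .git) that a bare * would skip.

```shell
#!/bin/sh
# Sketch only: empty a directory by absolute path instead of trusting $PWD.

# clean_dir DIR
# Removes everything inside DIR, dotfiles included, but keeps DIR itself.
clean_dir() {
    dir=$1
    case $dir in
        /|/usr|/usr/)
            # Illustrative deny-list; extend it for your own system.
            echo "clean_dir: refusing to empty $dir" >&2
            return 1 ;;
        /*) ;;
        *)
            echo "clean_dir: need an absolute path, got '$dir'" >&2
            return 1 ;;
    esac
    [ -d "$dir" ] || { echo "clean_dir: $dir is not a directory" >&2; return 1; }
    # -mindepth 1 skips DIR itself; -delete works depth-first on the contents.
    find "$dir" -mindepth 1 -delete
}

# Example:
#   clean_dir /usr/src
```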

And then start compiling the world (userland, services, utilities) and the kernel. I'm going to use the GENERIC kernel for now so I can just get back up and running. This part takes a really really long time.


# cd /usr/src
# make -j 4 buildworld buildkernel
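
The -j value is a judgment call. As a rough sketch, the job count can be derived from the number of CPUs the system reports instead of being hard-coded. build_jobs is a hypothetical helper. It uses sysctl hw.ncpu where available (as on FreeBSD) and falls back to getconf elsewhere.

```shell
#!/bin/sh
# Sketch only: choose a make -j value from the reported CPU count.

# build_jobs
# Prints the number of CPUs, or 1 if it cannot be determined.
build_jobs() {
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
        n=1
    case $n in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric output
    esac
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

# Example:
#   cd /usr/src && make -j "$(build_jobs)" buildworld buildkernel
```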

It took literally half the day. Poor little dual-core i5. It was already well on its way to completing when I realized I could have done this on my 8-core/16-thread server. "Oh, so you're reckless and stupid." Don't worry, I redeem myself later. Now I can actually install the new kernel and reboot into it. This is required before installing the world (userland).


# make installkernel
...
# reboot

After rebooting I can check out my new version.


# freebsd-version
13.0-RELEASE

Hmm, probably can't use freebsd-version(1) because it's tied to the userland and freebsd-update(1). Let's try uname(1):


# uname -r
13.0-STABLE

Success. Kernel is rebuilt, reinstalled, and tracking STABLE. Now it's time to install/upgrade everything else. Side note: this is one of the cool things about FreeBSD. It's a complete operating system, not just a kernel or just a userland with a compiler and some utilities. All of the pieces were made to fit together instead of being glued together into a distro.


# make installworld

This took a little bit longer, but from the output I could see that all of my important, potentially-missing files were being restored. Libraries, core utilities, applications, daemons, config, and man pages all got put back in their proper place. My system is totally back to life and I'm confident that I'm running an un-maimed FreeBSD 13.0-STABLE:


% freebsd-version
13.0-STABLE
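
The two checks above can be folded into one comparison. This is a small sketch with a hypothetical helper, same_release, that takes the running kernel version and the userland version as plain strings and complains when they differ.

```shell
#!/bin/sh
# Sketch only: confirm that kernel and userland report the same version.

# same_release KERNEL USERLAND
# Succeeds when both version strings are identical; otherwise prints the
# mismatch on stderr and returns non-zero.
same_release() {
    if [ "$1" = "$2" ]; then
        echo "kernel and userland both at $1"
    else
        echo "mismatch: kernel $1, userland $2" >&2
        return 1
    fi
}

# On FreeBSD:
#   same_release "$(uname -r)" "$(freebsd-version -u)"
```

A mismatch is expected between installkernel and installworld, so this is only meaningful once both steps are done.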

Now comes the sanity checking part of this job. I'm actually running a newer system now than I was before the upgrade. One of the components that was upgraded was ZFS, and with every major ZFS upgrade, I'm going to reinstall my root-on-ZFS bootloader before I reboot:


# gpart bootcode -p /boot/gptzfsboot -i 1 ada0
...
# reboot

This may be unnecessary, but it ensures that my root-on-ZFS will load correctly after a reboot with the new ZFS. And to be fair, it is mentioned in the UPDATING file in the source tree.

After a reboot, I've got one more sanity check. FreeBSD comes with etcupdate(8), which you can use to merge upgraded system config files with your local changes in /etc. If you run etcupdate diff you can see a diff of all of your customizations. This is so good, I can't believe something like this doesn't exist on your typical Linux distro. Maybe it does and I just never realized it, but I'm betting they're all just different enough to not be able to share something like this. Anyway, after reviewing the diff, I applied any changes/merges by running etcupdate.

Now for one last bit of housekeeping, and this comes straight from the Handbook. After an upgrade, the world installation leaves behind old libraries and files that the new system doesn't need but old applications or ports built against an older target might still require. To find them, you can use the make targets in /usr/src:


# make check-old check-old-libs

After reviewing the list and ensuring you don't need those files, you can clean them up with:


# make BATCH_DELETE_OLD_FILES=yes delete-old delete-old-libs

And finally, I'll force install my entire package tree to make sure any third-party missing files are reinstalled:


# pkg leaf | xargs pkg install -f

Redemption. I went from attempting to customize my kernel to annihilating /usr to restoring my entire system by building from FreeBSD's source tree via git(1) and make(1). And I got a free upgrade out of it! Moving forward, I'm running reasonably frequent automatic full-system snapshots. That should make it a lot easier to recover from accidental deletions of system files. I'm also going to take the time to learn more about the rescue disk process using the FreeBSD installer image. All told, not too bad for a disaster-turned-learning-experience.
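
As a rough idea of what automatic full-system snapshots could look like, here is a sketch of the pruning half of such a job. Everything specific in it is an assumption: the pool name zroot, the auto- snapshot prefix, the retention count of 7, and the helper name snapshots_to_prune. The helper only decides which snapshots are old enough to drop; the zfs commands that would use it are shown as comments.

```shell
#!/bin/sh
# Sketch only: keep the newest N automatic snapshots and list the rest for removal.
# Pool name "zroot", prefix "auto-" and the count 7 are assumptions.

# snapshots_to_prune KEEP
# Reads snapshot names (oldest first) on stdin and prints all but the
# newest KEEP of them. Prints nothing if there are KEEP or fewer.
snapshots_to_prune() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# A periodic job, run as root, might then do something like:
#   zfs snapshot -r "zroot@auto-$(date +%Y%m%d-%H%M)"
#   zfs list -H -t snapshot -o name -s creation |
#       grep '^zroot@auto-' |
#       snapshots_to_prune 7 |
#       while IFS= read -r snap; do zfs destroy -r "$snap"; done
```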