<h1>Rescuing FreeBSD, the UNIX Way (And a Free Upgrade Too!)</h1>

<div class="description">
	<p>
		I did it, I finally effed up my FreeBSD install.
	</p>

	<p>
		It all started with trying to build a custom kernel. <a href="https://docs.freebsd.org/en/books/handbook/mirrors/#git">I used Git to check out the latest stable source code</a>, but realized I had used the wrong branch name. So I figured I'd interrupt the clone, remove the <code>/usr/src</code> directory (the standard place to put FreeBSD's source code), and start over. I changed into <code>/usr/src</code> and ran <code>rm -r *</code>. Only I wasn't in <code>/usr/src</code>, I was in <code>/usr</code>.
	</p>
</div>

<p>
	By the time I realized what was happening and cancelled the operation, I had already wiped out all of <code>/usr/bin</code> and an unknown number of other files and directories. My window manager (i3) fell on its face, and the status bar threw the "I can't run any of these commands" message. Turns out there's a lot of useful stuff I expected to live in <code>/bin</code> that actually lives in <code>/usr/bin</code>! <code>ssh(1)</code> was gone, <code>tar(1)</code> was gone. Even <code>doas(1)</code>! I really, truly, thoroughly effed up my install.
</p>

<p>
	Oh God, what about my home directory? Turns out it was still there, along with most (if not all) of my files. I'm not enough of an idiot to not have a home snapshot, so I logged in as root (<code>doas</code> was gone), and it was <code>zfs rollback</code> to the rescue. Now my primary concern was getting that home directory out of there, because I didn't feel like restoring it from my cloud backup if the rest of my recovery went bad. But I didn't have <code>ssh(1)</code> or <code>scp(1)</code> to clone it to my server. I did have ZFS' send and receive functionality, but I figured I'd take the easy way out and use my unscathed <code>rclone</code> to copy it to freedom over SFTP. Pretty sure my data shamed and gloated at me as it reached its lifeboat.
</p>
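
<p>
	For reference, the snapshot part of that rescue might look roughly like the sketch below. The dataset name <code>zroot/usr/home</code> is an assumption (check <code>zfs list</code> for your own layout), and the two helper functions are hypothetical wrappers around <code>zfs list</code> and <code>zfs rollback</code>, not something FreeBSD ships.
</p>

```shell
#!/bin/sh
# Assumed dataset name; substitute whatever `zfs list` shows for your home.
HOME_DATASET=zroot/usr/home

# List snapshots of the home dataset (and any children), oldest first.
list_home_snapshots() {
    zfs list -t snapshot -o name,creation -s creation -r "$HOME_DATASET"
}

# Roll the home dataset back to the named snapshot (run as root).
# Plain `zfs rollback` only accepts the most recent snapshot. Adding -r
# would also destroy every snapshot taken after the one named, so it is
# deliberately left out of this sketch.
rollback_home() {
    zfs rollback "$HOME_DATASET@$1"
}
```

<p>
	With those defined, <code>list_home_snapshots</code> shows what is available and <code>rollback_home some-snapshot</code> (a placeholder name) restores it.
</p>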

<p>
	So now I could start to fearlessly think about un-effing my install. This is where most people (previously including myself) would suck it up and pull out the install media, try to remember all of the customization steps they took to bring the system back to its current working state, and restore user data. But I'm not the man I once was. I've been playing this game long enough. I don't go crawling back to the dusty install media over something as trivial as critical system files going missing or getting corrupted. I know what I did wrong and I can think of several creative ways to fix it. Say it with me:
	<img src="https://nextcloud.53hor.net/index.php/s/Gj8GZxLdegkJgG5/download" alt="it's a unix system" title="Except ackshually because BSD was derived from research UNIX and has a long *nix heritage as well as a history of adhering to the Unix Philosophy and POSIX. Linux is not UNIX, but FreeBSD is as close as you're gonna get." />
</p>

<p>
	First of all, my entire system (sans <code>/usr/bin</code>) is still somewhat operational. I have access to a root shell (and my X session with a browser), so I'm in pretty good shape. I am lacking some very basic core utilities, but I might be able to get them back without even rebooting. I don't have any system-wide snapshots to restore from, but I do have another running FreeBSD 13.0-RELEASE system on my network: my server. <code>rclone</code> worked to move data over there in an emergency, so I'll use it to copy my coreutils back where they belong. And it worked.<sup><a href="#2">[2]</a></sup>

	<img src="https://nextcloud.53hor.net/index.php/s/rzaqYo3N2SSwQ2e/download" alt="back in business" title="Literally whenever I get something to work right" />
</p>


<p>
	Now came the hard part: un-effing everything I <em>didn't</em> know was missing or broken. Who knows what else got removed during that operation? I can reinstall my entire package tree pretty easily, so I'm not that worried about anything missing from <code>/usr/local</code>. Maybe I have one or two config files in <code>/usr/local/etc</code> that I can live without. I know <code>/usr/home</code> is safe and restored. So all that's left is stuff like <code>lib</code>, <code>sbin</code>, <code>include</code>, <code>lib32</code>, <code>share</code>, and a few others that aren't very unique to my system (packages notwithstanding).
</p>

<p>
	Here's where the YOLO part begins. I was already in the middle of building my system from source to track 13.0-STABLE instead of RELEASE. So instead of using a rescue CD to copy just the right files back or completely reinstalling my system, I'll just upgrade my system in place to track STABLE. The install/upgrade/switching process is well documented, and there are already merge tools responsible for making sure that all of the new artifacts go exactly where they're supposed to (on top of your old or broken existing ones).
</p>

<p>
	<a href="https://docs.freebsd.org/en/books/handbook/cutting-edge/#makeworld">Time for the Handbook</a>. I'll start where I left off when everything imploded. Clean out <code>/usr/src</code> and clone the source tree, but with an absolute path this time.
</p>

<pre>
<code>
# rm -r /usr/src/*
# git clone -b stable/13 https://git.FreeBSD.org/src.git /usr/src
</code>
</pre>
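
<p>
	A guard along these lines could have saved me in the first place. This is only a sketch: <code>clear_dir</code> is a hypothetical helper, not something FreeBSD ships. It insists on an absolute path, refuses to touch <code>/</code>, and also removes dotfiles such as a leftover <code>.git</code>, which the <code>*</code> glob skips.
</p>

```shell
#!/bin/sh
# Hypothetical helper: empty a directory, but only when given an absolute path.
clear_dir() {
    target=$1
    case $target in
        /*) ;;  # absolute path, carry on
        *)  echo "clear_dir: refusing relative path: $target" >&2; return 1 ;;
    esac
    # Resolve symlinks and ".." in a subshell so the caller's cwd is untouched.
    real=$(cd "$target" 2>/dev/null && pwd -P) || {
        echo "clear_dir: not a directory: $target" >&2
        return 1
    }
    if [ "$real" = / ]; then
        echo "clear_dir: refusing to empty /" >&2
        return 1
    fi
    # Remove everything below the directory, dotfiles included, but keep the
    # directory itself. -delete implies a depth-first walk.
    find "$real" -mindepth 1 -delete
}
```

<p>
	Usage would be <code>clear_dir /usr/src</code>, and it no longer matters which directory the shell happens to be sitting in.
</p>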

<p>
	And then start compiling the world (userland, services, utilities) and the kernel. I'm going to use the GENERIC kernel for now so I can just get back up and running. This part takes a really, really long time.
</p>

<pre>
<code>
# make -j 4 buildworld buildkernel
</code>
</pre>
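
<p>
	The <code>4</code> above is just my thread count. A small sketch for picking the job count automatically: <code>ncpu</code> is a hypothetical helper that asks <code>sysctl</code> for <code>hw.ncpu</code> on FreeBSD and falls back to <code>getconf</code> elsewhere.
</p>

```shell
#!/bin/sh
# Hypothetical helper: print the number of CPUs available for parallel builds.
ncpu() {
    # FreeBSD exposes the count as the hw.ncpu sysctl.
    n=$(sysctl -n hw.ncpu 2>/dev/null) || n=
    # Fall back to getconf on other systems, and to 1 if all else fails.
    [ -n "$n" ] || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    echo "$n"
}
```

<p>
	The build line then becomes <code>make -j "$(ncpu)" buildworld buildkernel</code>, run from <code>/usr/src</code>.
</p>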

<p>
	It took literally half the day. Poor little dual-core i5. It was already well on its way to completing when I realized I could have done this on my 8-core/16-thread server<sup><a href="#1">[1]</a></sup>. "Oh, so you're reckless <em>and</em> stupid." Don't worry, I redeem myself later. Now I can actually install the new kernel and reboot into it. This is required before installing the world (userland).
</p>

<pre>
<code>
# make installkernel
...
# reboot
</code>
</pre>

<p>
	After rebooting I can check out my new version.
</p>

<pre>
<code>
# freebsd-version
13.0-RELEASE
</code>
</pre>

<p>
	Hmm, probably can't use <code>freebsd-version(1)</code> because it's tied to the userland and <code>freebsd-update(8)</code>. Let's try <code>uname(1)</code>:
</p>

<pre>
<code>
# uname -r
13.0-STABLE
</code>
</pre>
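
<p>
	<code>freebsd-version(1)</code> can also report the kernel if you ask it to: <code>-k</code> is the installed kernel, <code>-r</code> the running kernel, and <code>-u</code> the userland. A small sketch that prints all three side by side (<code>show_versions</code> is a hypothetical name):
</p>

```shell
#!/bin/sh
# Hypothetical helper: print kernel and userland versions side by side.
show_versions() {
    command -v freebsd-version >/dev/null 2>&1 || {
        echo "show_versions: freebsd-version not found (not FreeBSD?)" >&2
        return 1
    }
    echo "installed kernel: $(freebsd-version -k)"
    echo "running kernel:   $(freebsd-version -r)"
    echo "userland:         $(freebsd-version -u)"
}
```

<p>
	Between <code>installkernel</code> and <code>installworld</code>, I'd expect the two kernel lines to say STABLE while the userland line still says RELEASE.
</p>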

<p>
	Success. Kernel is rebuilt, reinstalled, and tracking STABLE. Now it's time to install/upgrade everything else. Side note: this is one of the cool things about FreeBSD. It's a complete operating system, not just a kernel or just a userland with a compiler and some utilities. All of the pieces were made to fit together instead of being glued together into a distro.
</p>

<pre>
<code>
# make installworld
</code>
</pre>

<p>
	This took a little bit longer, but from the output I could see that all of my important, potentially-missing files were being restored. Libraries, core utilities, applications, daemons, config, and man pages all got put back in their proper place. My system is totally back to life and I'm confident that I'm running an un-maimed FreeBSD 13.0-STABLE:
</p>

<pre>
<code>
% freebsd-version
13.0-STABLE
</code>
</pre>

<p>
	Now comes the sanity-checking part of this job. I'm actually running a newer system now than I was before the upgrade. One of the components that was upgraded was ZFS, and after any major ZFS upgrade I reinstall my root-on-ZFS bootloader before I reboot:
</p>

<pre>
<code>
# gpart bootcode -p /boot/gptzfsboot -i 1 ada0
...
# reboot
</code>
</pre>

<p>
	This may be unnecessary, but it ensures that my root-on-ZFS will load correctly after a reboot with the new ZFS. And to be fair, it is mentioned in the <a href="https://cgit.freebsd.org/src/tree/UPDATING?h=stable/13"><code>UPDATING</code></a> file in the source tree.
</p>

<p>
	After a reboot, I've got one more sanity check. FreeBSD comes with <code>etcupdate(8)</code>, which you can use to merge upgrades with local changes to your <code>/etc</code> system config. If you run <code>etcupdate diff</code> you can see a diff of all of your customizations. This is so good, I can't believe something like this doesn't exist on your typical Linux distro. Maybe it does and I just never realized it, but I'm betting they're all just different enough to not be able to share something like this. Anyway, after reviewing the diff, I applied any changes/merges by running <code>etcupdate</code>.
</p>

<p>
	Now for one last bit of housekeeping, and this comes straight from the Handbook. After an upgrade, the world installation leaves behind old libraries and files that the new system doesn't need but that old applications or ports built against an older target might still require. To list them, you can use these <code>Makefile</code> targets in <code>/usr/src</code>:
</p>

<pre>
<code>
# make check-old check-old-libs
</code>
</pre>

<p>
	After reviewing the list and ensuring you don't need those files, you can clean them up with:
</p>

<pre>
<code>
# make BATCH_DELETE_OLD_FILES=yes delete-old delete-old-libs
</code>
</pre>

<p>
	And finally, I'll force-reinstall my package tree to make sure any missing third-party files are put back:
</p>

<pre>
<code>
# pkg leaf | xargs pkg install -f
</code>
</pre>

<p>
	Redemption. I went from attempting to customize my kernel to annihilating <code>/usr</code> to restoring my entire system by building from FreeBSD's source tree via <code>git(1)</code> and <code>make(1)</code>. And I got a free upgrade out of it! Moving forward, I'm running reasonably frequent automatic full-system snapshots. They should make it a lot easier to recover from accidental deletions of system files. I'm also going to take the time to learn more about the rescue disk process using the FreeBSD installer image. All told, not too bad for a disaster-turned-learning-experience.
</p>
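
<p>
	A minimal sketch of what such a snapshot job could look like. The pool name <code>zroot</code> and the <code>auto-</code> naming scheme are assumptions, and both functions are hypothetical. Dedicated snapshot tools exist that also handle pruning, which this does not.
</p>

```shell
#!/bin/sh
# Assumed pool name; `zpool list` shows the real one.
POOL=zroot

# Build a timestamped snapshot name such as zroot@auto-2021-12-15_0300.
snapshot_name() {
    echo "$POOL@auto-$(date +%Y-%m-%d_%H%M)"
}

# Snapshot every dataset in the pool in one go (-r is recursive); run as root.
snapshot_all() {
    zfs snapshot -r "$(snapshot_name)"
}
```

<p>
	Calling <code>snapshot_all</code> from a script in root's crontab would give the whole system, not just the home directory, something to roll back to.
</p>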

<hr>

<ol>
	<li id="1">
		I actually did move my FreeBSD source to my server to let it pull changes and do automatic builds. Turns out it can churn out the whole world and kernel in about 1.5 hours.
	</li>
	<li id="2">
		I now know you can use the statically linked <em>/rescue</em> utilities (such as <em>/rescue/tar</em>) to help with this.
	</li>
</ol>