GRUB, it is time we broke up. It’s not you, it’s me. Okay, it’s you. The last 15+ years have some great (read: painful) memories. But it is time to call it quits.
Red Hat Linux (not RHEL) deprecated LILO for version 9 (PDF; hat tip: Spot). This means that Fedora has used GRUB as its bootloader since the very first release: Fedora Core 1.
GRUB was designed for a world where bootloaders had to locate a Linux kernel on a filesystem. This meant it needed support for all the filesystems anyone might conceivably use. It was also built for a world where dual-booting meant having a bootloader implemented menu to choose between operating systems.
The UEFI world we live in today looks nothing like this. UEFI requires support for a standard filesystem. This filesystem, which for all intents and purposes duplicates the contents of /boot, is required on every Linux system which boots UEFI. So UEFI loads the bootloader from the UEFI partition and then the bootloader loads the kernel from the /boot partition.
Did you know that UEFI can just boot the kernel directly? It can!
The situation, however, is much worse than just duplicated effort. With the exception of Apple hardware, practically all UEFI implementations ship with Secure Boot and a TPM enabled by default. Only appropriately signed UEFI code will be run. This means we now introduce a [shim][shim] which is signed. This, in turn, loads GRUB from the UEFI partition.
This means that our boot process now looks like this:
- UEFI filesystem
- /boot filesystem
It gets worse. Microsoft OEMs are now enabling BitLocker by default. BitLocker seals (encrypts) the Windows partition to the TPM PCRs. This means that if the boot process changes (and you have no backup of the key), you can’t decrypt your data. So remember that great boot menu that GRUB provided so we can dual-boot with Windows? It can never work, cryptographically.
The user experience of this process is particularly painful. Users who manage to get Fedora installed will see a nice GRUB menu entry for Windows. But if they select it, they are immediately greeted with a terrifying message telling them that the boot configuration has changed and their encrypted data is inaccessible.
To recap, where Secure Boot is enabled (pretty much all Intel hardware), we must use the boot menu provided by UEFI. If we don’t, the PCRs of the TPM have unknown hashes and anything sealed to the boot state will fail to decrypt.
The good news is that Intel provides a reference implementation of UEFI, and it includes pretty much everything we’d ever need. This means that most vendors get it pretty much correct as well. OEMs are even using these facilities for their own (hidden) recovery partitions.
So why not just have UEFI boot the kernel directly? There are still some drawbacks to this approach.
First, it requires signing every build of the kernel. This is definitely undesirable since kernels are updated pretty regularly.
Second, every kernel upgrade would mean a write to UEFI NVRAM. There are some concerns about the longevity of the hardware under such frequent UEFI writes.
Third, it exposes kernels as a menu option in UEFI. This menu typically contains operating systems, not individual kernels, which results in a poor user experience. Most users don’t need to care about what kernel they boot. There should be a bootloader which loads the most recently installed kernel and falls back to older kernels if the new kernels fail to boot. All of this can be done without a menu (unless the user presses a key).
Fortunately, systemd already implements precisely such a bootloader. Previously, this bootloader was called gummiboot. But it has since been merged into the systemd repository as systemd-boot.
With systemd-boot, our boot process can look like this:
- UEFI filesystem
It would even be possible (though, not necessarily desirable) to sign systemd-boot directly and get rid of the shim.
In short, we need to stop trying to make GRUB work in our current context and switch to something designed specifically for the needs of our modern systems. We already ship this code in systemd. Further, systemd already ships a tool for managing the bootloader. We just need to enable it in Anaconda and test it.
Who’s with me!?
P.S. - It would be very helpful if we could get some good documentation on manually migrating from GRUB to systemd-boot. This would at least enable the testing of this setup by brave users.