Multidisk Filesystems: A Comparison

With the death of Kryder’s law[1], hard drive density has been crawling along compared to the exponential growth of yesterdecade. The largest 3.5" hard drives available for purchase are only 16TB, with drives up to 20TB slated for release later this year: a measly 1.3x per year from the 2TB drives of 2012. Data continues to grow, however, with the rise of Deep Learning on ever-larger datasets, Big Data, and growing archival efforts[2].

As such, it makes sense to look at solutions that use multiple hard drives for increased capacity, speed, and reliability. Traditionally, the choice has been between the various levels of RAID implemented in software or hardware, but today there are many filesystems built from the ground up to support multiple disks. This guide is meant to lay out the different choices in a simple, easy-to-digest format.

This post is still a work in progress and will be updated as the filesystem landscape changes.

At a Glance

|          | Replication | Parity | Resizing² | Snapshots | Compression | Tiering  | Integrity | Stable? |
|----------|-------------|--------|-----------|-----------|-------------|----------|-----------|---------|
| ZFS      | Yes         | 1/2/3  | No        | Yes       | Yes         | Limited  | Yes       | Yes     |
| mdadm    | Yes         | 1/2    | Yes       | Depends¹  | Depends¹    | Depends¹ | No        | Yes     |
| bcachefs | Yes         | WIP    | No?       | WIP       | Yes         | Yes      | Yes       | Beta    |
| btrfs    | Yes         | Buggy  | No?       | Yes       | Yes         | Yes      | Yes       | Yes     |

¹ When composed with other tools.
² Refers specifically to adding additional (identical) drives, one at a time, to an existing parity filesystem.

Details

ZFS

ZFS is arguably the most feature-rich single filesystem, and a stable one at that. It supports 1-3 disks of parity, snapshots, compression, and integrity checks. Many NAS systems, such as FreeNAS, use ZFS.

However, it does have several major weaknesses. ZFS parity vdevs cannot be expanded, forcing upgrades to either add another parity vdev (striped with the existing one, essentially RAID 0) or replace every drive in the pool at once, which can become too expensive for home consumers. An implementation of vdev expansion is in progress, but it has been in progress for several years now. ZFS also has very limited storage tiering: the main form of SSD caching is L2ARC, which places high demands on RAM and isn’t persistent across reboots (yet).
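
As a concrete sketch of what this looks like in practice, the commands below create a RAID-Z2 pool and show the two upgrade paths described above; the pool name `tank` and the device paths are placeholders (real setups should reference disks by `/dev/disk/by-id`).

```sh
# Create a 4-disk RAID-Z2 pool (two disks of parity); device names are examples.
zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd

# Compression, snapshots, and checksumming are built in.
zfs set compression=lz4 tank
zfs snapshot tank@before-upgrade

# The existing raidz2 vdev cannot gain disks. To grow the pool, either stripe
# a second vdev next to it...
zpool add tank raidz2 /dev/sde /dev/sdf /dev/sdg /dev/sdh

# ...or replace each disk with a larger one, resilvering after every swap.
zpool replace tank /dev/sda /dev/sdi

# Tiering is limited to caches such as an L2ARC read cache on an SSD.
zpool add tank cache /dev/nvme0n1
```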

mdadm

mdadm does RAID and just RAID. It works at the block level, so many other tools, such as bcache and LVM, can be composed with it to provide additional features. This does add more moving parts, though, increasing the chance of failure. Additionally, filesystems on top of mdadm have no way of coordinating with it, so if bitrot occurs, a filesystem with integrity checking will not be able to repair the damaged data using mdadm’s redundancy.
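
For illustration, here is a minimal sketch of a RAID 6 array that is later grown by one disk; the device names and the choice of ext4 are arbitrary, and in practice tools like LVM or bcache would sit between the array and the filesystem to supply the missing features.

```sh
# Create a 4-disk RAID 6 array (two disks of parity); device names are examples.
mdadm --create /dev/md0 --level=6 --raid-devices=4 \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd

# mdadm only provides a block device; the filesystem goes on top of it.
mkfs.ext4 /dev/md0

# Growing the array one (identical) disk at a time is supported.
mdadm --add /dev/md0 /dev/sde
mdadm --grow /dev/md0 --raid-devices=5   # triggers a reshape

# Once the reshape finishes, grow the filesystem to match.
resize2fs /dev/md0
```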

bcachefs

Bcachefs is the new challenger. Still under development, it is nonetheless shaping up to be a very promising next-gen filesystem, with plans for almost all of the major features. Development is active, although nearly all of it seems to be done by Kent Overstreet himself.

btrfs

Btrfs is another feature-rich filesystem. However, btrfs parity (raid5/6) is buggy, and there have been many reports of btrfs eating data. With proper backups (which you should be keeping anyway!) and by avoiding the troublesome features, though, this shouldn’t be a big issue for non-production systems.
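
As a sketch of a setup that sidesteps the troublesome parity (raid5/6) code, the commands below mirror data and metadata across two disks and stick to the stabler features from the table; the device paths and mount point are placeholders.

```sh
# Mirror both data and metadata across two disks (avoiding raid5/6 parity).
mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
mount -o compress=zstd /dev/sda /mnt

# Read-only snapshot of the top-level subvolume.
btrfs subvolume snapshot -r /mnt /mnt/snap-$(date +%F)

# Adding a disk is easy; a balance spreads existing data onto it
# (newer btrfs-progs may ask for --full-balance to skip a warning).
btrfs device add /dev/sdc /mnt
btrfs balance start /mnt
```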


  1. Kryder’s law is essentially the Moore’s law of storage density, stipulating exponential growth in the areal storage density of magnetic disks.

  2. Not to mention Linux ISO collectors, of whom there are a surprisingly large number.

...