ZFS on Linux

ZFS is a fantastic filesystem developed by Sun. Compared to other filesystems, it’s quite interesting as it combines both a filesystem and a logical volume manager. This allows you to get great flexibility, features and performance. It supports things like integrated snapshots, native NFSv4 ACL support and clever data integrity checking.

I’m now running a HP ProLiant MicroServer N36L which is a small NAS unit containing a 4-bay SATA enclosure. It has a low-performance AMD CPU, and comes with 1GB RAM and a 250GB harddisk. I’ve upgraded mine to 4GB of RAM and 4 x 2TB Seagate Barracuda drives.

The benefit of these units is that they’re a standard x86 machine, allowing you to easily install any OS you like. They’re also really cheap and often have cash-back promotions.

I bought mine when I was in the UK and brought it back with me to Australia. I waited until I got back to upgrade it, to save me the trouble of shipping the extra harddisks.

In this post, I’ll document how to easily install ZFS on Debian Wheezy and some basic ZFS commands you’ll need to get started.

Installation

UPDATE: ZFS on Linux now has their own Debian Wheezy repository! http://zfsonlinux.org/debian.html

Install the ZFS packages

# apt-get install debian-zfs

This should use DKMS to build some new modules specific to your running kernel and install all the required packages.

Pull the new module into the kernel
# modprobe zfs

If all went well, you should see that spl and zfs have been loaded into the kernel.
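
If you’d rather script that check than eyeball it, something like the following should do. This is only a sketch: modules_loaded is a helper name I’ve made up, and it assumes it is fed /proc/modules-style text on standard input.

```shell
#!/bin/sh
# modules_loaded: hypothetical helper, not part of ZFS on Linux.
# Reads /proc/modules-style text on stdin and succeeds only if every
# module name given as an argument appears in the first column.
modules_loaded() {
    list=$(awk '{ print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$list" | grep -qxF -- "$mod" || return 1
    done
}

# Usage on a real system:
#   modules_loaded spl zfs < /proc/modules && echo "ZFS modules are loaded"
```

A plain lsmod piped through grep gives you the same information interactively.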

 

Prepare disks

ZFS works best if you give it full access to your disks. I’m not going to run ZFS on my root filesystem, so this makes things much simpler.

Find our ZFS disks. We use the disk IDs instead of the standard /dev/sdX naming because they stay the same across reboots, even if the kernel detects the drives in a different order.
# ls -l /dev/disk/by-id/ata-*
lrwxrwxrwx 1 root root 9 Jan 21 19:18 /dev/disk/by-id/ata-ST2000DM001-1CH164_Z1E1GYH5 -> ../../sdd
lrwxrwxrwx 1 root root 9 Jan 21 08:55 /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1E2ACRM -> ../../sda
lrwxrwxrwx 1 root root 9 Jan 21 08:55 /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1F1SHN4 -> ../../sdb

Create partition tables on the disks so we can use them in a zpool:
# parted /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1E2ACRM mklabel gpt
# parted /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1F1SHN4 mklabel gpt
# parted /dev/disk/by-id/ata-ST2000DM001-1CH164_Z1E1GYH5 mklabel gpt
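
With more than a few disks, typing those out gets tedious. Here is a minimal sketch of a loop that only prints the parted commands so you can review them first. label_cmds is a made-up helper name, and the disk IDs are the ones from my listing above.

```shell
#!/bin/sh
# label_cmds: hypothetical helper. Prints one "parted ... mklabel gpt"
# command per disk ID; it does not touch any disk itself.
label_cmds() {
    for id in "$@"; do
        printf 'parted /dev/disk/by-id/%s mklabel gpt\n' "$id"
    done
}

label_cmds \
    ata-ST2000DM001-9YN164_Z1E2ACRM \
    ata-ST2000DM001-9YN164_Z1F1SHN4 \
    ata-ST2000DM001-1CH164_Z1E1GYH5

# Once the printed commands look right, run them as root by piping
# the same call into sh:
#   label_cmds <disk ids...> | sh
```

Printing first and piping to sh afterwards is deliberate: mklabel wipes the existing partition table, so it’s worth a second look before anything runs.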

 

Create a new pool

ZFS uses the concept of pools in a similar way to how LVM would handle volume groups.

Create a pool called mypool, with the initial member being a RAIDZ composed of the three drives.
# zpool create -m none -o ashift=12 mypool raidz /dev/disk/by-id/ata-ST2000DM001-1CH164_Z1E1GYH5 /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1E2ACRM /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1F1SHN4

RAIDZ is a little like RAID-5. I’m using RAID-Z1, meaning that from a 3-disk pool, I can lose one disk while maintaining the data access.

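NOTE: Unlike many RAID setups, once you build your RAIDZ, you cannot add new individual disks to it. To grow the pool you either add another whole vdev (for example a second RAIDZ) or replace every disk with a larger one.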

The -m none means that we don’t want to specify a mount point for this pool yet.

The -o ashift=12 forces ZFS to use 4K sectors instead of 512-byte sectors. Many new drives use 4K sectors, but lie to the OS about it for ‘compatibility’ reasons. My first ZFS filesystem used 512-byte sectors in the beginning, and I had shocking performance (~10Mb/s write).

See http://zfsonlinux.org/faq.html#HowDoesZFSonLinuxHandlesAdvacedFormatDrives for more information about it.
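
The ashift value is just the base-2 exponent of the sector size, so 12 means 2^12 = 4096 bytes. A quick sketch of that arithmetic follows. ashift_for is a name I’ve invented for illustration, and sda in the comments is only an example device.

```shell
#!/bin/sh
# ashift_for: hypothetical helper that prints log2 of a sector size in bytes.
ashift_for() {
    size=$1
    bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}

ashift_for 512     # prints 9
ashift_for 4096    # prints 12

# To see what a drive reports to the kernel (sda is just an example):
#   cat /sys/block/sda/queue/logical_block_size
#   cat /sys/block/sda/queue/physical_block_size
```

Bear in mind that a drive which lies about its sectors may report 512 for both values, which is exactly why I set ashift by hand.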

# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
mypool  5.44T  1.26T  4.18T  23%  1.00x  ONLINE  -
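
Note that for a RAIDZ, the SIZE shown by zpool list is the raw capacity of all the disks, parity included. A rough back-of-the-envelope check is below. It assumes each ‘2TB’ drive holds exactly 2 x 10^12 bytes, and tib is a throwaway helper name of my own.

```shell
#!/bin/sh
# tib: hypothetical helper that converts a byte count to TiB, two decimals.
tib() {
    awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / (1024 ^ 4) }'
}

disks=3
disk_bytes=2000000000000    # assumed size of one "2TB" drive

tib $((disks * disk_bytes))          # raw size across all three disks
tib $(((disks - 1) * disk_bytes))    # rough space left after one disk of parity
```

The first figure comes out at about 5.46, which lines up with the 5.44T above (I’d guess the small difference is labels and reserved space). The second, about 3.64, is closer to what you can actually store, and is roughly what zfs list reports for the pool.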

Disable atime for a small I/O boost
# zfs set atime=off mypool

Deduplication is probably not worth the CPU overhead on my NAS.
# zfs set dedup=off mypool

Our pool is now ready for use.

 

Create some filesystems

Create our documents filesystem, then mount it and share it over NFS
# zfs create mypool/documents
# zfs set mountpoint=/mnt/documents mypool/documents
# zfs set sharenfs=on mypool/documents

Create our photos filesystem, then mount it and share it over NFS
# zfs create mypool/photos
# zfs set mountpoint=/mnt/photos mypool/photos
# zfs set sharenfs=on mypool/photos

Photos are important, so keep two copies of them around
# zfs set copies=2 mypool/photos

Documents are really important, so we’ll keep three copies of them on disk
# zfs set copies=3 mypool/documents

Documents are mostly text, so we’ll compress them.
# zfs set compression=on mypool/documents
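
To double-check that the properties took, you can ask ZFS to read them back. The wrapper below is only a sketch. show_props is a hypothetical name, and the ZFS variable is something I’ve added so the function can be dry-run with ZFS=echo on a machine without the pool.

```shell
#!/bin/sh
# show_props: hypothetical helper that reads back the properties set above.
# ZFS defaults to the real zfs binary; set ZFS=echo to print the arguments
# instead of running anything.
show_props() {
    "${ZFS:-zfs}" get -H -o property,value \
        copies,compression,mountpoint,sharenfs "$1"
}

# Usage on a system with the pool imported:
#   show_props mypool/documents
#   show_props mypool/photos
```

One thing to keep in mind: copies only applies to data written after the property is set, so set it before you copy your files in.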

Scrub

ZFS pools should be scrubbed at least once a week. A scrub reads every block in the pool, verifies it against its checksum, and repairs any data integrity errors it finds using the pool’s redundancy.
# zpool scrub <pool>

To do automatic scrubbing once a week, set the following line in your root crontab
# crontab -e
...
30 19 * * 5 zpool scrub <pool>
...
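
A scrub is only useful if you look at the result afterwards. zpool status -x prints ‘all pools are healthy’ when nothing is wrong, so a small check can be built around that. The following is a sketch under that assumption. check_pools is a made-up name, and the ZPOOL variable exists only so the function can be tested without a real pool.

```shell
#!/bin/sh
# check_pools: hypothetical helper for a cron job or login script.
# ZPOOL defaults to the real zpool binary and can be overridden for testing.
check_pools() {
    status=$("${ZPOOL:-zpool}" status -x 2>&1)
    if [ "$status" = "all pools are healthy" ]; then
        echo "OK"
    else
        echo "PROBLEM"
        printf '%s\n' "$status"
        return 1
    fi
}

# Usage:
#   check_pools || echo "run 'zpool status' for details"
```

If you run it from cron, anything it prints will normally be mailed to root, which makes a cheap alert.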

Coming soon is a follow-up to this post with some disk fail/recovery steps.

Posted in Linux by Andy Botting at October 7th, 2013.

9 Responses to “ZFS on Linux”

  1. Shane says:

    Very interesting! I love the fine grain replication control!

    So is zfs considered stable/production ready?

  2. Andy says:

    Sounds like there are a few people running it in production.

    The ZFSonLinux author is holding back on saying it’s production ready, but I’ve only heard good things about it. Certainly more production ready than something like BTRFS.

    It’s got years of Solaris enterprise customer testing – so the only thing that would let it down is the Linux-specific pieces of the implementation. The actual filesystem is very solid.

  3. antubis says:

    i like the zfsonlinux project, but i see no possibility to set the zfs acls on a linux system, not to mention exporting the zfs acls via nfs(3/4) or smb/cifs…

  4. Andy says:

    @antubis: You’re right. ACLs are a big omission currently. I think the author has plans to support it, but obviously it’ll be some time away.

    IMHO, SMB isn’t a big deal. I just run my own separate config for it, and problem solved.

  5. craig says:

    FYI, if the disks have no partition table you can skip the parted step – ‘zpool create’ will create gpt partition tables if required.

    and ‘zpool create -f’ will create them even if there is an existing msdos partition table.

    also, when specifying drives to zpool create (and add, replace, remove, etc) you don’t need to type in the full path…it will search in the /dev/disk/by-*/ directories.

    e.g. you can type this:

    # zpool create -m none -o ashift=12 mypool raidz ata-ST2000DM001-1CH164_Z1E1GYH5 ata-ST2000DM001-9YN164_Z1E2ACRM ata-ST2000DM001-9YN164_Z1F1SHN4

    instead of this:
    # zpool create -m none -o ashift=12 mypool raidz /dev/disk/by-id/ata-ST2000DM001-1CH164_Z1E1GYH5 /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1E2ACRM /dev/disk/by-id/ata-ST2000DM001-9YN164_Z1F1SHN4

  6. Andy Botting says:

    Thanks Craig – very useful comments.

  7. SGI says:

    Please show the publishing date/time of the article, so future generations will be aware that this article was written in the past.

  8. Dave says:

    Craig & Andy,

    I know ZFS can take drives straight from the shelves (no formatting/partitioning required). However, which is the preferred way? To partition, or not?

    I’ve read around where it is best to partition as it ensures the sector count aligns within the set of drives. Ideally, you’d want to add some buffer room in the beginning and at the end of each disk. When a drive needs replacing, you’d partition the new drive, so that it is going into the pool aligned.

    Is this correct?

    Thanks Dave

  9. Andy says:

    Hi Dave,

    I think it’s preferable to just let ZFS manage the partitioning as well, but I guess it just comes down to your taste.

    ZFS was designed for large corporate installations, so I think the intended use case is that you’d have a large set of the same type of drive, and if you want to expand or replace a drive, you’d get a matching one on warranty.
