Block-level File-system Deduplication with Aligned Volumes Tutorial (Bacula 9.0.8 and above)

Preliminary remarks:

  • This feature is now available in both Bacula Community (9.0.8 or later) and Enterprise.
  • Do not enable Bacula software compression with the Aligned format; compressed data deduplicates poorly.
  • You will need a small SSD area to store the deduplication engine's index.
  • With this method, Bacula creates separate volumes: one for the metadata of the backed-up files and another for the file data itself.

Data deduplication is a dictionary-based data reduction approach that can reduce the size of backup or archive datasets by a factor of 4 to 40. It is becoming an essential backup system component because it reduces storage space requirements, and also a critical one, since the performance of the whole backup operation depends on storage throughput.
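To make the idea concrete, the sketch below estimates how well a single file would deduplicate at a fixed block size: it cuts the file into blocks, hashes each block, and counts the distinct hashes. It is only an illustration of the principle, not how ZFS, ddumbfs, or Bacula implement deduplication. The function name block_dedup_stats and the 128K default are assumptions made for this example.

```shell
#!/bin/sh
# block_dedup_stats FILE [BLOCK_SIZE_BYTES]
# Prints "<total blocks> <unique blocks>" for FILE cut into fixed-size blocks.
# Note: the file is copied piece by piece into a temporary directory.
block_dedup_stats() {
    file=$1
    bs=${2:-131072}                 # 128K; an assumed default for this sketch
    workdir=$(mktemp -d) || return 1
    # Cut the file into fixed-size pieces: blk.aaaaaa, blk.aaaaab, ...
    split -b "$bs" -a 6 "$file" "$workdir/blk."
    total=$(find "$workdir" -type f | wc -l | tr -d ' ')
    # Hash every piece and count how many distinct hashes appear.
    unique=$(find "$workdir" -type f -exec sha256sum {} + \
        | awk '{print $1}' | sort -u | wc -l | tr -d ' ')
    rm -rf "$workdir"
    echo "$total $unique"
}

# Usage (hypothetical path):
#   block_dedup_stats /var/tmp/sample.img 131072
```

The closer the second number is to the first, the less a block-level deduplicating file system can save on that file.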

According to Figure 1, the new Aligned format is a good way to reduce storage costs in Bacula Community, and it is much more efficient than ZBackup (an alternative tar-based dedup tool) in terms of backup and restore speed. The Aligned format has a minor impact on backup and restore duration, but it is an acceptable trade-off.



Figure 1 – Old Community version without Aligned Volumes versus the new Aligned format (picture by Heitor Faria).


More than ever, disk backups are becoming a feasible replacement for tape libraries, since deduplication cannot currently be deployed efficiently on sequential magnetic tapes. Only disks have this advantage.

1. ZFS FileSystem

Several deduplicating file systems are currently available, such as lessfs, opendedup, and ZFS. Hardware with deduplication capabilities can also be used with Bacula's new Aligned format. Here we deploy ZFS, and then ddumbfs as an alternative.

a) RedHat/CentOS Install (https://github.com/zfsonlinux/zfs/wiki/RHEL-and-CentOS):

yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm

# The quoted heredoc keeps $basearch literal, so that yum (not the shell) expands it.
cat > /etc/yum.repos.d/zfs.repo <<'EOF'
[zfs-kmod]
name=ZFS on Linux for EL7 - kmod
baseurl=http://download.zfsonlinux.org/epel/7.5/kmod/$basearch/
enabled=1
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
EOF

yum install zfs
modprobe zfs

b) Debian/Ubuntu Install:

sudo -i
apt-get -y install zfsutils-linux

Initializing ZFS

ZFS initialization requires one or more physical disks. In the example below, /zfs/mnt should be the path configured in the Archive Device directive of bacula-sd.conf. ZFS compression may also be enabled.

sudo zpool create -f zfs /dev/sdb 
zfs create zfs/mnt
zpool status zfs
df -h
zfs set dedup=on zfs/mnt
zfs set compression=on zfs/mnt
chown bacula /zfs/mnt
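Once the dataset exists, it can be useful to confirm that the properties took effect and to watch the pool's deduplication ratio as backups accumulate. The sketch below wraps two standard commands, zfs get and zpool get. The helper name zfs_dedup_report is hypothetical, and the default pool and dataset names are simply the ones used in the example above.

```shell
#!/bin/sh
# zfs_dedup_report [POOL] [DATASET]
# Shows the dedup/compression settings of DATASET and the dedup ratio of POOL.
zfs_dedup_report() {
    pool=${1:-zfs}
    dataset=${2:-zfs/mnt}
    if ! command -v zpool >/dev/null 2>&1; then
        echo "zpool not found; is ZFS installed?" >&2
        return 1
    fi
    # Both properties should report "on" after the commands above.
    zfs get dedup,compression "$dataset"
    # dedupratio is a pool-level property; it starts at 1.00x on an empty pool.
    zpool get dedupratio "$pool"
}

# Usage:
#   zfs_dedup_report zfs zfs/mnt
```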



2. The Dedup FileSystem (ALTERNATIVE)

Ddumbfs was chosen for this laboratory for being both open source and focused on faster operations thanks to its very simple index design, which is very important for shorter backup windows.

2.1 Installing ddumbfs Dependencies

To compile ddumbfs you need the usual build tools (make and gcc), the headers for the fuse and mhash libraries, and pkg-config.

Here are the corresponding packages for RedHat- and Debian-based distributions (some of them may need to be built from source):

  • RedHat/CentOS: fuse fuse-libs mhash fuse-devel mhash-devel pkgconfig gcc make
  • Debian/Ubuntu: libfuse2 libmhash2 libfuse-dev libmhash-dev pkg-config fuse-utils build-essential

a) RedHat/CentOS Packages:

sudo -i
yum -y install epel-release.noarch
yum -y install fuse fuse-libs mhash fuse-devel mhash-devel pkgconfig gcc make automake

b) Debian/Ubuntu Packages:

sudo -i

apt-get -y install fuse libfuse2 libmhash2 libfuse-dev libmhash-dev pkg-config build-essential autotools-dev

2.2 Building Ddumbfs from source

wget -qO- http://www.magiksys.net/download/ddumbfs/ddumbfs-1.1.tar.gz | tar -xzvf - -C /usr/src
cd /usr/src/ddumbfs-*
./configure
make
make install

2.3 Initializing Ddumbfs

Create two directories. The first should be an SSD mount point that hosts the ddumbfs index engine. The second should be the mount point where your Bacula Storage Volumes will be written, typically a large disk array.

mkdir /mnt/ddumbfs.data

mkdir /mnt/ddumbfs.mnt

Initialize the deduplication engine. This example creates a 999G volume; change the size to fit your disk:

mkddumbfs -B 128k -s 999G /mnt/ddumbfs.data
ddumbfs /mnt/ddumbfs.mnt -o parent=/mnt/ddumbfs.data

Add a new line like this to /etc/fstab, to make ddumbfs persistent after boot:

-oparent=/mnt/ddumbfs.data   /mnt/ddumbfs.mnt   fuse.ddumbfs   defaults  0  0

Restart the machine to confirm that ddumbfs is mounted at boot time.
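After the reboot, a quick check can confirm that the mount point is active before Bacula writes to it. This is a minimal sketch that looks the directory up in /proc/mounts. The helper name is_mounted is an assumption, and the path in the usage comment is the one used in this tutorial.

```shell
#!/bin/sh
# is_mounted DIR
# Succeeds if DIR is listed as a mount point in /proc/mounts (Linux only).
# Paths containing spaces are not handled by this sketch.
is_mounted() {
    awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' /proc/mounts
}

# Usage:
#   if is_mounted /mnt/ddumbfs.mnt; then
#       echo "ddumbfs is mounted"
#   else
#       echo "ddumbfs is NOT mounted; check /etc/fstab" >&2
#   fi
```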


3. Bacula Aligned Volumes Configuration

You need to install the Aligned Drivers package, available through bacula.org’s personal package repository (Bacula Binary Package Download; registration required).

yum install bacula-aligned.x86_64

Restart the Storage Daemon to apply the changes.

The following is an example of a new device in bacula-sd.conf. Device Type must be Aligned; Maximum Concurrent Jobs should always be 1; block size values vary according to the deduplication file system used:

Device {
  Name = Aligned-Disk
  Device Type = Aligned  # Must be aligned
  Media Type = File1
  Archive Device = /zfs/mnt    # Or /mnt/ddumbfs.mnt if using the ddumbfs mount point.
  LabelMedia = yes
  Random Access = yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
  Maximum Concurrent Jobs = 1    # Always 1 for Aligned
  Minimum Block Size=0K
  Maximum Block Size=128K
  File Alignment=128K
  Padding Size=512
  Minimum Aligned Size=4096
}
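Before restarting the Storage Daemon, the edited file can be syntax-checked with the daemon's -t option, which reads the configuration and exits. The sketch below assumes that bacula-sd is on the PATH and that the configuration lives in /opt/bacula/etc/bacula-sd.conf; both are assumptions, so adjust them for your installation. The helper name check_sd_config is hypothetical.

```shell
#!/bin/sh
# check_sd_config [CONFIG_FILE]
# Runs bacula-sd in test mode (-t) against CONFIG_FILE and reports the result.
check_sd_config() {
    conf=${1:-/opt/bacula/etc/bacula-sd.conf}   # assumed default location
    if ! command -v bacula-sd >/dev/null 2>&1; then
        echo "bacula-sd not found on PATH" >&2
        return 1
    fi
    if bacula-sd -t -c "$conf"; then
        echo "Configuration OK: $conf"
    else
        echo "Configuration errors in: $conf" >&2
        return 1
    fi
}

# Usage:
#   check_sd_config /opt/bacula/etc/bacula-sd.conf
```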

Detailed information:

For the filesystems ZFS, lessfs, and ddumbfs, the following values produce excellent results:
Block Size=128K
File Alignment=128K
Padding Size=512
Minimum Aligned Size=4096

For NetApp filesystems, the following are preferable:
Block Size=64K
File Alignment=4K
Padding Size=4K
Minimum Aligned Size=4K

In these settings, the value follows the equal sign, and the K suffix means multiply by 1024 bytes.
Block Size is the size of blocks to be written into the Aligned Volume.
File Alignment is the alignment of the first block of each original file stored in the Aligned Volume.
Padding Size is the alignment to which the last block of an original file is filled with zeros if it is not full.
Minimum Aligned Size is the file size below which the file will be placed in the Metadata Volume rather than the Aligned Volume.

[Ref.: Sibbald, Kern – https://www.google.com/patents/US20160055169]
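The effect of these parameters can be sketched with simple arithmetic. The example below is a simplified reading of the definitions above, using the ZFS values; it ignores block headers and any other detail of Bacula's real on-disk layout. The helper names round_up and place_file are hypothetical.

```shell
#!/bin/sh
# Values taken from the ZFS/lessfs/ddumbfs example above.
FILE_ALIGNMENT=$((128 * 1024))
PADDING_SIZE=512
MIN_ALIGNED_SIZE=4096

# round_up N ALIGN: smallest multiple of ALIGN that is >= N.
round_up() {
    echo $(( ( $1 + $2 - 1 ) / $2 * $2 ))
}

# place_file OFFSET SIZE
# OFFSET is the current end of the Aligned Volume, SIZE the file size in bytes.
# Prints where the file would go under this simplified model.
place_file() {
    offset=$1
    size=$2
    if [ "$size" -lt "$MIN_ALIGNED_SIZE" ]; then
        echo "metadata"              # small files stay in the Metadata Volume
        return 0
    fi
    start=$(round_up "$offset" "$FILE_ALIGNMENT")   # first block is aligned
    padded=$(round_up "$size" "$PADDING_SIZE")      # last block is zero-filled
    echo "aligned start=$start end=$((start + padded))"
}

# Usage:
#   place_file 0 1000        # below Minimum Aligned Size
#   place_file 100 300000    # starts at the next 128K boundary
```

Because every file starts on the same boundary that the file system uses for its own blocks, identical files produce identical blocks no matter where they land in the volume.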

Finally, attach the newly created bacula-sd Device to your Director. Edit your bacula-dir.conf:

Storage {
  Name = Disk-Backup
  Address = hfaria-desk-i5 
  SDPort = 9103
  Password = "5PWzqJzEokv3z9U_NwBd6bJ30ib1x4TMW"
  Device = Aligned-Disk
  Media Type = File1
}

Run a few full backup jobs. After the first full job, subsequent ones should barely increase the deduplicated storage size. The following command displays the space actually occupied:

df -h

The list jobs command in bconsole displays the size the backup jobs would have occupied without deduplication.
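Dividing the total job bytes reported by Bacula by the bytes actually used on disk gives the effective reduction factor. The helper below is a minimal sketch of that division; the name dedup_factor is hypothetical, and both numbers are entered by hand from the two outputs above.

```shell
#!/bin/sh
# dedup_factor LOGICAL_BYTES PHYSICAL_BYTES
# Prints LOGICAL/PHYSICAL with two decimals, or "n/a" if PHYSICAL is not positive.
dedup_factor() {
    awk -v l="$1" -v p="$2" 'BEGIN {
        if (p <= 0) { print "n/a"; exit 1 }
        printf "%.2f\n", l / p
    }'
}

# Usage with made-up numbers (sum of job bytes, then bytes used on disk):
#   dedup_factor 429496729600 107374182400
```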

Enjoy!


 
