Bacula Enterprise Global Deduplication Driver Quick Guide

This Quick Guide provides the steps required to implement Global Endpoint and Storage deduplication backups with Bacula Enterprise Edition.

IT organizations are constantly being challenged to deliver high-quality solutions with a reduced total cost of ownership. One of those challenges is the growing amount of data to be backed up, together with limited time to run backup jobs (backup window). Bacula Enterprise offers several ways to tackle these challenges, one of them being Global Endpoint Deduplication, which minimizes network transfer and Bacula Volume size using deduplication technology.

Deduplication can significantly reduce the disk space needed to store your data. In the best cases, depending on how well the backed-up data deduplicates, it may reduce the backup disk space needed by up to 99%.

The Bacula Global and Storage Deduplication plugin already compresses data after deduplication, so you should avoid backing up data that is already compressed, which usually yields a poor deduplication ratio.

Deduplication can significantly reduce the network bandwidth required, because both ends can exchange references instead of the actual data. This works when the destination already has a copy of the original chunks, and it applies to restores as well as to backups.

Handling references instead of the data can speed up most of the processing inside the Storage Daemon. For example, Bacula features like copy/migrate and Virtual Full can be up to 1,000 times faster.

Recommendations

To do efficient and fast deduplication, the Storage Daemon will need additional CPU power (to compute hash codes and do compression), as well as additional RAM (for fast hash code lookups).

For effective performance, the deduplication index should be stored on SSDs, because the index receives many random accesses and many updates. As a rule of thumb, about 10 GB of index space is required for each 1 TB of backups.

For deduplication storage, prefer file systems that checksum everything, such as ZFS, combined with hardware RAID. If you are not able to use ZFS, we advise you to use XFS. The reason is that with deduplication each block of data is stored only once: if your disk develops a bad block, instead of damaging one file, it may damage all of the files (dozens or hundreds) that contain that same block of data.
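The sizing rule above is simple arithmetic. As a minimal sketch, assuming the ratio of roughly 10 GB of index per 1 TB of backups scales linearly (the function name and the linear scaling are illustrative assumptions, not part of any Bacula tool):

```python
def estimate_index_size_gb(backup_size_tb: float, gb_per_tb: float = 10.0) -> float:
    """Rule-of-thumb index size in GB for a given amount of backups in TB.

    gb_per_tb defaults to the 10 GB per 1 TB figure quoted in this guide.
    """
    if backup_size_tb < 0:
        raise ValueError("backup size cannot be negative")
    return backup_size_tb * gb_per_tb


# Print the estimate for a few hypothetical backup sizes.
for tb in (1, 5, 20):
    print(f"{tb} TB of backups -> about {estimate_index_size_gb(tb):.0f} GB of index")
```

Treat the result as a starting point for sizing the SSD volume, and leave headroom for growth.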

Deduplication is not implemented for tape devices. It works only with disk-based backups.

Global Deduplication is performed in software by Bacula. If you want to use deduplication provided by hardware or by the file system, refer to the Bacula Aligned Volumes Driver product whitepaper.

Installation

The Dedup Driver installation package is available for RHEL, CentOS, Debian, Ubuntu, SUSE and most of the other Linux distributions supported by Bacula Enterprise. It should be installed on the same machine as the Bacula Storage Daemon. E.g.:

rpm -ivh bacula-enterprise-dedup-plugin-8.10.1-1.el7.x86_64.rpm

Restart bacula-sd and verify that the driver is loaded with the status storage command.

Storage Daemon Configuration (bacula-sd.conf)

As shown in Figure 1, the Dedup Directory and Dedup Index Directory directives must be configured for use by Global Dedup. In BWeb, go to the Configuration Module, click the Storage Daemon button (middle of the screen) and edit the Storage Daemon Resource:

Figure 1. Configuring the Storage Daemon resource (bacula-sd.conf) in BWeb

If you edit the configuration as text instead, your bacula-sd.conf must include the following directives:

Storage {
  Name = my-sd
  Working Directory = /opt/bacula/working
  Pid Directory = /opt/bacula/working
  Subsys Directory = /opt/bacula/working
  Plugin Directory = /opt/bacula/plugins
  Dedup Directory = /mnt/bacula/dedup/containers
  Dedup Index Directory = /mnt/SSD/dedup/index
}

Dedup Directory = <directory-path>. The containers holding the deduplicated backup data are stored in the Dedup Directory. This directory is common to all Dedup devices configured on a Storage Daemon and should have a large amount of free space. We advise you to use LVM on Linux systems to ensure that you can extend the space in this directory. If you change the Dedup Directory directive, its files must be moved to the new directory.

Dedup Index Directory = <directory-path>. The indexes are stored in the Dedup Index Directory. They receive many random updates and benefit from fast drives such as SSDs.

By default, the Bacula Storage Daemon runs as the bacula operating system user. Make sure this user has permission on these directories, whether they are mounted file systems or local disks:

chown bacula /mnt/bacula/dedup/containers
chown bacula /mnt/SSD/dedup/index

Still in bacula-sd.conf, it is also necessary to create a new Autochanger and the Devices that will use the deduplication driver:

Autochanger {
  Name = "Dedup"
  ChangerCommand = "/dev/null"
  ChangerDevice = "/dev/null"
  Device = "DedupDisk1","DedupDisk2"
}

Device {
  Name = "DedupDisk1"
  Archive Device = /mnt/bacula/dedup/volumes
  Media Type = DedupVolume1
  Device Type = Dedup
  LabelMedia = yes
  Random Access = Yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
  Maximum Concurrent Jobs = 5
}

Device {
  Name = "DedupDisk2"
  Archive Device = /mnt/bacula/dedup/volumes
  Media Type = DedupVolume1
  Device Type = Dedup
  LabelMedia = yes
  Random Access = Yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
  Maximum Concurrent Jobs = 5
}

An Autochanger with multiple Devices is suggested in order to avoid bottlenecks between concurrent backup jobs and to make full use of the Storage Daemon writing capacity. When more than 5 concurrent jobs are sent by the Director, the additional jobs go to the next available Device (DedupDisk2).

Archive Device is the folder that holds the Volumes in the traditional Bacula format, so it is still possible to use bscan, bextract and other Bacula emergency tools. With Dedup, the Volumes do not actually contain the backup data, which is written to the containers in the Dedup Directory.

Media Type should be a name that is unique among all Devices and Storage Daemons attached to the same Director. Different Autochangers should use different Media Types.
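The overflow behavior described above can be pictured with a small model. This is only an illustrative sketch of "fill one Device up to its limit, then move to the next"; it is not Bacula's actual device selection code, and the function name and the idea of a waiting count are assumptions made for the example:

```python
def assign_jobs(job_count, devices, max_per_device=5):
    """Place jobs on the first Device that still has a free slot.

    Returns (jobs per device, number of jobs left waiting).
    """
    assignment = {name: 0 for name in devices}
    waiting = 0
    for _ in range(job_count):
        for name in devices:
            if assignment[name] < max_per_device:
                assignment[name] += 1
                break
        else:
            # Every Device is at its limit; the job has to wait.
            waiting += 1
    return assignment, waiting


# Seven concurrent jobs with the two Devices from the configuration above.
print(assign_jobs(7, ["DedupDisk1", "DedupDisk2"]))
```

In this model, two Devices limited to 5 jobs each can absorb at most 10 concurrent jobs, which is one way to reason about how many Devices to define.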

Restart the Storage Daemon to apply changes.

Director Configuration (bacula-dir.conf)

Create a new Storage resource that points to the Autochanger, in order to attach the Bacula SD:

Storage {
  Name = Dedup
  Allow Compression = No
  Address = 192.168.0.85
  Password = xxx
  Device = Dedup
  Media Type = DedupVolume1
 ...
}

Allow Compression should be set to No to disable Bacula software compression, since the Global Dedup Engine already compresses data after deduplication.

The Device and Media Type names should match the ones set in the Autochanger and Device resources in bacula-sd.conf.

In each Bacula FileSet, you can choose between Global (both sides) and Storage deduplication. Global deduplication also involves the backup client and has the advantage of minimizing network traffic. The storage option only performs deduplication on the Storage Daemon machine.

Include {
  Options {
    Dedup = bothsides  # or storage
  }
  File = /etc
}

Optional File Daemons Configuration (bacula-fd.conf)

Bacula File Daemons can be configured to hold some deduplication information in order to speed up restores, especially over low-bandwidth links. This is activated with the Enable Client Rehydration directive (default = no).

FileDaemon {
  ...
  Enable Client Rehydration = yes
}

New Bacula Director Pools

Once a new Storage is attached, it is a good idea to create new associated backup Pools, so that Dedup Volumes are not mixed with any others. E.g.:

Pool {
  Name = Daily-Dedup
  Type = Backup
  Storage = Dedup
  ...
}

Backup Job Configuration

Backup jobs that use deduplication need no special configuration. Just schedule regular backup jobs to the newly created backup Pools, using FileSets with the bothsides or storage deduplication option set.

Deduplication Test and Status

Here is an example of the dedup usage command output:

* dedup storage=Dedup usage
Dedupengine status:
  DDE: hash_count=1275 ref_count=1276 ref_size=78.09 MB
    ref_ratio=1.00 size_ratio=1.13 dde_errors=0
  Config: bnum=1179641 bmin=33554393 bmax=335544320 mlock_strategy=1
    mlocked=9MB mlock_max=0MB
  Containers: chunk_allocated=3469 chunk_used=1275
    disk_space_allocated=101.2 MB disk_space_used=68.87 MB
    containers_errors=0
  Vacuum: last_run="06-Nov-14 13:28" duration=1s ref_count=1276
    ref_size=78.09 MB vacuum_errors=0 orphan_addr=16
  Stats: read_chunk=4285 query_hash=7591 new_hash=3469 calc_hash=3470
    [1] filesize=40.88KB/499.6KB usage=36/484/524288   7% ***...............
    [2] filesize=40.13KB/589.0KB usage=18/286/524288   6% **5...............
    [3] filesize=25.47KB/655.2KB usage=7/212/524288    3% *4................

The size_ratio is the overall deduplication gain for all backup Jobs. The more backups are performed and retained, the more this ratio should improve.

The ref_size is the size of the backup Jobs before deduplication, and disk_space_used is the actual size of the dedup containers.

It is a good idea to enable hole punching, which can save more disk space in the long term by reclaiming expired blocks in the dedup engine. E.g.:
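In the sample output above, dividing ref_size by disk_space_used gives the reported size_ratio. The following sketch checks that on an excerpt of the sample; the parsing helper, its name and the decimal unit multipliers are assumptions made for this example, not part of Bacula:

```python
import re

# Excerpt of the sample "dedup usage" output shown above.
SAMPLE = """\
  DDE: hash_count=1275 ref_count=1276 ref_size=78.09 MB
    ref_ratio=1.00 size_ratio=1.13 dde_errors=0
  Containers: chunk_allocated=3469 chunk_used=1275
    disk_space_allocated=101.2 MB disk_space_used=68.87 MB
"""

# Assumption: decimal multipliers. The ratio is unaffected as long as
# both values use the same unit convention.
UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def read_size(text: str, key: str) -> float:
    """Return the first 'key=<number> <unit>' value found, in bytes."""
    match = re.search(rf"\b{re.escape(key)}=([\d.]+)\s*(KB|MB|GB|TB)", text)
    if match is None:
        raise ValueError(f"{key} not found in output")
    return float(match.group(1)) * UNITS[match.group(2)]


ref_size = read_size(SAMPLE, "ref_size")
disk_used = read_size(SAMPLE, "disk_space_used")
ratio = ref_size / disk_used
print(f"size_ratio ~ {ratio:.2f}")  # prints: size_ratio ~ 1.13
```

A script like this could be fed the captured console output periodically to track how the ratio evolves as more backups are retained.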

* dedup vacuum holepunching storage=<DeviceName>

Read the referenced Bacula Systems whitepaper for more information on the dedup usage output and for more command options, such as vacuum and engine scrub (verification) process.

As shown in Figure 2, similar information is also available in BWeb, such as the space consumed by the deduplicated backup data (Disk Used), the size this data would have if it were not reduced by Dedup (Standard File or Tape Equivalent Storage Space), the Dedup Engine Version, the Holes Size and much more. The use of the containers and the arrangement of the occupied blocks are displayed graphically in the widget at the bottom of this screen.

Figure 2. BWeb Statistics menu: Bacula Enterprise Global Deduplication usage

Running a Backup Job

The regular Bacula Job log should print some information about the deduplication. E.g.:

...
  FD Bytes Written:       137,402,873,108 (137.4 GB)
  SD Bytes Written:       258,766,712 (258.7 MB)
  Rate:                   28330.5 KB/s
  Software Compression:   99.8% 531.0:1
...

FD Bytes Written is the amount of non-deduplicated backup data read from the Client.

SD Bytes Written is the amount of deduplicated data written to the Bacula Storage.

If a compression format is set in the FileSet (e.g. LZO, even though Bacula does not use it when writing to the Dedup driver), the Software Compression line reports the deduplication ratio.
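In the sample Job log above, the Software Compression figures are consistent with comparing FD Bytes Written to SD Bytes Written. The sketch below reproduces them; the function name is hypothetical, and the formula is inferred from the sample numbers rather than taken from Bacula's source:

```python
def dedup_summary(fd_bytes: int, sd_bytes: int) -> tuple[float, float]:
    """Return (reduction ratio, percentage of bytes saved)."""
    if fd_bytes <= 0 or sd_bytes <= 0:
        raise ValueError("byte counts must be positive")
    ratio = fd_bytes / sd_bytes
    saved_percent = (1 - sd_bytes / fd_bytes) * 100
    return ratio, saved_percent


# Values copied from the sample Job log above.
ratio, saved = dedup_summary(137_402_873_108, 258_766_712)
print(f"{saved:.1f}% {ratio:.1f}:1")  # prints: 99.8% 531.0:1
```

The same calculation can be applied to any Job log to compare how well different FileSets deduplicate.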

Reference

Global Endpoint Deduplication – Bacula Enterprise Edition. http://baculasystems.com

 

 
