Bacula bpipe Backup Script for Hadoop HDFS

The universal bpipe plugin lets Bacula take any data stream written to standard output and store it as a backup, including files from a Hadoop HDFS cluster, with maximum performance.
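As a rough illustration of what bpipe expects, each plugin line has four colon-separated fields: the plugin name, a pseudo path under which the stream is catalogued, a command whose standard output is backed up, and a command that receives the stream on standard input at restore time. The sketch below only assembles such a line; `make_bpipe_line` and the sample paths are hypothetical names used for illustration, not part of Bacula.

```shell
#!/bin/bash
# Hypothetical helper: assembles one bpipe plugin line from its three
# variable fields (pseudo path, backup command, restore command).
make_bpipe_line() {
  local pseudo_path=$1 backup_cmd=$2 restore_cmd=$3
  printf 'bpipe:%s:%s:%s\n' "$pseudo_path" "$backup_cmd" "$restore_cmd"
}

# Sample values (made up): back up one HDFS file by streaming it with
# "hdfs dfs -cat" and restore it by writing stdin back with "hdfs dfs -put".
make_bpipe_line "/var/data/file.csv" \
  "hdfs dfs -cat /data/file.csv" \
  "hdfs dfs -put -f /dev/stdin /data/file.csv"
```

Because the fields are separated by colons, the commands themselves should not contain colons.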

#!/bin/bash
# This script feeds HDFS file copies to the Bacula bpipe plugin (FIFO), using one hdfs cat command per file when backing up and one put command per file when restoring.
# Subsequent backups copy only the files changed in HDFS since the last recorded backup time (/etc/last_backup).
# Remark: hdfs /tmp and .tmp. folders are excluded by the grep -v.
# By Heitor Faria;
# Marco Reis;
# Julio Neves and
# Rodrigo Hagstrom.
# Tested with Hadoop 2.7.1; August, 2017.
# It must be called from the Include sub-resource of the FileSet used by the job that
# backs up a Hadoop node running a Bacula Client, like this (e.g.):
# Plugin = "\|/etc/"


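# Assumption: the original definition of $hdfs is missing from this listing;
# default to the hdfs binary found on PATH unless the variable is already set.
hdfs="${hdfs:-hdfs}"

# Create the timestamp file on the first run (-f tests for a regular file; -p would test for a named pipe).
if [[ ! -f /etc/last_backup ]]; then
  echo "00-00-00;00:00" > /etc/last_backup
fi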

Date=$(cut -f 1 -d ";" /etc/last_backup)
Hour=$(cut -f 2 -d ";" /etc/last_backup)

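# List regular files ($2 is "-" for directories) modified at or after the last backup.
# Date and time are compared as one string so that a file changed on a later day
# at an earlier hour is not skipped. Paths containing whitespace are not supported.
for filename in $($hdfs dfs -ls -R / | awk -v last="$Date $Hour" '$2!="-" && ($6 " " $7)>=last {print $8}' | grep -v -e /tmp/ -e .tmp.)
do
  echo "bpipe:/var$filename:$hdfs dfs -cat $filename:$hdfs dfs -put -f /dev/stdin $filename"
done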

date '+%Y-%m-%d;%H:%M' > /etc/last_backup
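
The selection step can be tried without a cluster by feeding it a made-up listing. The sketch below is an illustration under stated assumptions, not the authors' code: `list_changed` is a hypothetical function name, the sample lines follow the eight-column layout of `hdfs dfs -ls -R` (permissions, replication, owner, group, size, date, time, path), and `hdfs` is assumed to be on PATH when the generated commands eventually run. It compares date and time as one combined string so that a file modified on a later day but at an earlier hour still counts as changed.

```shell
#!/bin/bash
# Hypothetical helper: reads "hdfs dfs -ls -R" style lines on stdin and prints
# one bpipe line per regular file modified at or after the given date and time.
# Assumes paths contain no whitespace.
list_changed() {
  local last="$1 $2"   # e.g. "2017-08-01" and "12:00"
  awk -v last="$last" '$2 != "-" && ($6 " " $7) >= last {print $8}' |
    grep -v -e /tmp/ -e '\.tmp\.' |
    while IFS= read -r f; do
      printf 'bpipe:/var%s:hdfs dfs -cat %s:hdfs dfs -put -f /dev/stdin %s\n' "$f" "$f" "$f"
    done
}

# Made-up sample listing: one directory, one old file, one new file, one /tmp file.
sample='drwxr-xr-x   - hdfs supergroup          0 2017-08-02 09:00 /data
-rw-r--r--   3 hdfs supergroup       1024 2017-07-30 23:10 /data/old.csv
-rw-r--r--   3 hdfs supergroup       2048 2017-08-02 08:15 /data/new.csv
-rw-r--r--   3 hdfs supergroup        512 2017-08-02 09:00 /tmp/scratch.bin'

# Only /data/new.csv is selected: the directory has "-" as replication,
# old.csv predates the timestamp, and scratch.bin lives under /tmp/.
printf '%s\n' "$sample" | list_changed 2017-08-01 12:00
```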


