Bacula and BBR Protocol for Internet Backups, Packet Losses, Disconnections, etc.

Backups over degraded networks (e.g., with packet loss) and over the Internet, where connections pass through various NATs, firewalls, and routers, often suffer significant losses in TCP performance and resilience. Errors like the following can occur in Bacula:

2023-04-20 21:03:38 ocspbacprdap02-sd JobId 11052: Fatal error: append.c:175 Error reading data header from FD. n=-2 msglen=20 ERR=I/O Error
2023-04-20 21:03:38 ocspbacprdap02-sd JobId 11052: Error: bsock.c:395 Wrote 23 bytes to client:, but only 0 accepted.


02-Aug 09:13 backupserver-dir JobId 110334: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer

Using the BBR congestion control algorithm on the Linux machines that run Bacula's Director, Storage Daemon, and File Daemon significantly improves resilience to these errors. Response time and network performance also improve, since disconnections and packet loss have much less impact on transfer rates.

What is BBR?

BBR is an acronym for “Bottleneck Bandwidth and RTT” (Round-Trip Time). The BBR congestion control algorithm calculates the sending rate based on the delivery rate estimated from ACKs. BBR was contributed to Linux kernel version 4.9 by Google in 2016.
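In model terms, BBR keeps running estimates of the bottleneck bandwidth (BtlBw) and the round-trip propagation time (RTprop) and sizes its sending behavior around their product, the bandwidth-delay product (BDP). A sketch of the core relationships as described by the BBR authors:

BDP = BtlBw × RTprop
pacing_rate = pacing_gain × BtlBw
cwnd = cwnd_gain × BDP

The pacing_gain and cwnd_gain factors are the same quantities reported in the ss output shown later in this article.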

BBR significantly increased throughput and reduced latency for connections on Google’s internal networks, as well as for YouTube web servers.

BBR requires only changes on the sender’s side, with no need for changes in the network or on the receiver’s side. Therefore, it can be deployed incrementally on the current Internet or in data centers.
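Because only the sender needs BBR, it can even be enabled selectively rather than system-wide. As a minimal sketch, iproute2 can pin the algorithm to a single route with the congctl attribute (the gateway 192.0.2.1 and interface eth0 below are placeholder values):

# Use BBR only for traffic following this route (requires the tcp_bbr module)
sudo ip route replace default via 192.0.2.1 dev eth0 congctl bbr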

How to Enable BBR

The following shell commands, run as root, enable BBR and make the change persistent across reboots:

# Load the BBR module now and ensure it loads on every boot
modprobe tcp_bbr
echo "tcp_bbr" > /etc/modules-load.d/bbr.conf

# Make BBR the default congestion control and fq the default qdisc
echo "net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq" >> /etc/sysctl.conf

# Apply the settings and confirm the active algorithm
sysctl -p
sysctl net.ipv4.tcp_congestion_control

The last command should display the BBR protocol, as follows:

root@hfaria-P65:~# sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr

If another protocol is displayed, restart the server.
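You can also confirm that the running kernel offers BBR at all by listing the available congestion control algorithms; the output below is illustrative:

$ sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic bbr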

How to Test Network Performance?

iperf3 is a utility for conducting network throughput tests.

$ sudo apt-get install -y iperf3

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libiperf0 libsctp1
The following NEW packages will be installed:
  iperf3 libiperf0 libsctp1

iperf3 can use the -C (or --congestion) option to choose the congestion control algorithm. In our tests, we can specify BBR as follows:

-C, --congestion algo
      Set the congestion control algorithm (Linux and FreeBSD only).  An  older  --linux-congestion  synonym
      for this flag is accepted but is deprecated.

iperf3 -C bbr -c <server>  # replace <server> with your test target

BBR operates only on the sender’s side, so you don’t need to worry about whether the receiver supports it. Note that BBR is much more effective when using FQ (fair queuing) to pace packets to no more than 90% of the line rate.
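For an A/B comparison between the default algorithm and BBR, a minimal sketch (<server> is a placeholder for your test target, and iperf3 must be installed on both ends):

# On the receiving host (BBR is not required there):
iperf3 -s

# On the sending host, run the same 30-second test with CUBIC and then BBR:
iperf3 -c <server> -C cubic -t 30
iperf3 -c <server> -C bbr -t 30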

How Can I Monitor BBR TCP Connections on Linux?

You can use the ss utility (another tool for investigating sockets) to monitor BBR’s state variables, including pacing rate, cwnd, bandwidth estimate, min_rtt estimate, and more.

Example output of ss -tin:

$ ss -tin
State       Recv-Q       Send-Q              Local Address:Port                 Peer Address:Port        Process
ESTAB       0            36                   
     bbr wscale:6,7 rto:292 rtt:91.891/20.196 ato:40 mss:1448 pmtu:9000 rcvmss:1448 advmss:8948 cwnd:48 bytes_sent:95301
   bytes_retrans:136 bytes_acked:95129 bytes_received:20641 segs_out:813 segs_in:1091 data_segs_out:792 data_segs_in:481
   bbr:(bw:1911880bps,mrtt:73.825,pacing_gain:2.88672,cwnd_gain:2.88672) send 6050995bps lastsnd:4 lastrcv:8 lastack:8
   pacing_rate 5463880bps delivery_rate 1911928bps delivered:791 app_limited busy:44124ms unacked:1 retrans:0/2
   dsack_dups:1 rcv_space:56576 rcv_ssthresh:56576 minrtt:73.825

The following fields may appear:

ts     show string "ts" if the timestamp option is set

sack   show string "sack" if the sack option is set

ecn    show string "ecn" if the explicit congestion notification option is set

ecnseen
       show string "ecnseen" if the saw ecn flag is found in received packets

fastopen
       show string "fastopen" if the fastopen option is set

cong_alg
       the congestion algorithm name, the default congestion algorithm is "cubic"

wscale:<snd_wscale>:<rcv_wscale>
       if window scale option is used, this field shows the send scale factor and receive scale factor

rto:<icsk_rto>
       tcp re-transmission timeout value, the unit is millisecond

backoff:<icsk_backoff>
       used for exponential backoff re-transmission, the actual re-transmission timeout value is
       icsk_rto << icsk_backoff

rtt:<rtt>/<rttvar>
       rtt is the average round trip time, rttvar is the mean deviation of rtt, their units are millisecond

ato:<ato>
       ack timeout, unit is millisecond, used for delay ack mode

mss:<mss>
       max segment size

cwnd:<cwnd>
       congestion window size

pmtu:<pmtu>
       path MTU value

ssthresh:<ssthresh>
       tcp congestion window slow start threshold

bytes_acked:<bytes_acked>
       bytes acked

bytes_received:<bytes_received>
       bytes received

segs_out:<segs_out>
       segments sent out

segs_in:<segs_in>
       segments received

send <send_bps>bps
       egress bps

lastsnd:<lastsnd>
       how long time since the last packet sent, the unit is millisecond

lastrcv:<lastrcv>
       how long time since the last packet received, the unit is millisecond

lastack:<lastack>
       how long time since the last ack received, the unit is millisecond

pacing_rate <pacing_rate>bps/<max_pacing_rate>bps
       the pacing rate and max pacing rate

rcv_space:<rcv_space>
       a helper variable for TCP internal auto tuning socket receive buffer
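Since Bacula daemons use fixed ports by default (9101 Director, 9102 File Daemon, 9103 Storage Daemon), you can filter ss to watch only backup traffic. A minimal sketch, assuming the default Storage Daemon port and a one-second refresh:

# Show TCP internals only for connections on Bacula's default SD port
watch -n1 "ss -tin '( sport = :9103 or dport = :9103 )'"

Adjust the port if your Storage Daemon listens elsewhere.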

Examples of TCP Throughput Improvement

From Google

Google Research and YouTube implemented BBR and achieved improvements in TCP performance.

Here are performance result examples to illustrate the difference between BBR and CUBIC:

  • Resilience to random loss (e.g., due to shallow buffers): Consider a netperf TCP_STREAM test lasting 30 seconds on a path emulated with a 10 Gbps bottleneck, 100 ms RTT, and 1% packet loss rate. CUBIC achieves 3.27 Mbps, while BBR reaches 9150 Mbps (2798 times higher). A similar lossy path can be emulated with tc netem, as sketched after this list.
  • Low latency with common inflated buffers on last-mile links today: Consider a netperf TCP_STREAM test lasting 120 seconds on a path emulated with a 10 Mbps bottleneck, 40 ms RTT, and a buffer of 1000 packets. Both fully utilize the bottleneck bandwidth, but BBR can do so with an average RTT 25 times lower (43 ms instead of 1.09 seconds).
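If you want to reproduce a lossy path like the one in the first test on your own lab machines, Linux can emulate loss and delay with tc netem. A minimal sketch, assuming eth0 as the outgoing interface (a placeholder name) and root privileges:

# Emulate 1% random packet loss and 100 ms of delay on eth0
tc qdisc add dev eth0 root netem loss 1% delay 100ms

# ... run the iperf3 comparison from the previous section ...

# Remove the emulation when done
tc qdisc del dev eth0 root

Note that installing netem as the root qdisc temporarily replaces fq on that interface, so this is best done on a separate middle or receiving test box.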

From AWS CloudFront

During March and April 2019, AWS CloudFront implemented BBR. According to the AWS blog post “BBR TCP Congestion Control with Amazon CloudFront”:

BBR usage on CloudFront has been favorable globally, with performance gains of up to 22% in aggregate throughput across various networks and regions.

From Shadowsocks

I have a Shadowsocks server running on a Raspberry Pi. Without BBR, the client’s download speed is about 450 KB/s. With BBR, the client’s download speed improves to 3.6 MB/s, which is 8 times faster than the default.

BBR v2

There is ongoing work on BBR v2, which is still in the alpha phase at the time of writing.


Troubleshooting

If sysctl -p fails with the following error:

sysctl: setting key "net.core.default_qdisc": No such file or directory

The reason is that the tcp_bbr kernel module has not been loaded yet. To load tcp_bbr, execute the following command:

sudo modprobe tcp_bbr

To check if tcp_bbr is loaded, use lsmod. For example, in the following command, you should see the tcp_bbr line:

$ lsmod | grep tcp_bbr
tcp_bbr                20480  3

If the sudo modprobe tcp_bbr command doesn’t work, restart the system.

