BackupBot – Deep Dive into ProactiveAI
ProactiveAI is the next generation of Retrospect’s Proactive scheduling engine. With ProactiveAI, backup scripts optimize the backup window for the entire environment, using a decision tree algorithm and linear regression to ensure every source is protected as often as possible. BackupBot, which combines ProactiveAI, Storage Predictions, and 1-Click Backup, is Retrospect’s first step toward leveraging that information for better data protection.
Artificial intelligence might seem exotic and complicated, but if you break it down, many machine learning algorithms that fall under that category are straightforward to understand. ProactiveAI is no different. Under the hood, ProactiveAI’s algorithm is a list of choices (a decision tree) that takes into account past events (linear regression).
ProactiveAI walks through the following decision tree algorithm to prioritize what to back up next:
Verify backup window: ProactiveAI only runs when it’s allowed to. To restrict the backup window, go to the script’s schedule.
Verify an execution unit is available: ProactiveAI only runs when an execution unit is available.
Ignore last backup time: Retrospect can back up every hour, every day, every Sunday, or any other schedule. As soon as ProactiveAI sees a new backup window (i.e. a new day), it will attempt to back up the sources. In contrast, previous versions of Retrospect would respect the time at which the last backup occurred. See "Backup Window" for more details.
Ignore unavailable sources: If a source is unavailable, Retrospect will not attempt to reach it again until every potentially available source has been contacted. This list includes Wake-on-LAN sources. See "Wake-on-LAN" for more details.
Prioritize by next day: For all available or potentially available sources, Retrospect divides them into buckets for what day they are scheduled to be backed up next.
Using a future date might seem strange, but it can be in the past as well. This sorting algorithm ensures Retrospect prioritizes initial backups and then overdue backups. Think of it as last backup day combined with the script’s schedule. As an example, Script A with weekly backups and Script B with daily backups would calculate the next backup date differently.
Prioritize by last time checked: When Retrospect reaches out to a source, it marks that time in its configuration. ProactiveAI uses this time to ensure it doesn’t re-check sources that it already checked but couldn’t find, so that the script can get through the entire list of sources before circling back.
Prioritize by the last backup’s duration: Now that Retrospect is down to sources within the same day of priority, ProactiveAI sorts them using a linear regression algorithm based on the last backup’s duration. Sources with faster previous backups will be backed up sooner than sources with slower previous backups.
As a real-life example, incremental backups of email services are fast, so those would be prioritized over a longer server backup. Because of this sorting, Retrospect protects more sources throughout the day. If a long server backup does not happen on a given day, it is automatically given higher priority afterward, because its next backup date is then a day in the past.
Our Engineering team experimented with more data points in the linear regression, but the resulting sort order was too prone to hysteresis. In other words, when Retrospect included more past data, including anomalous backup durations, future prioritization continued to be affected for longer than we thought was useful.
Default to prior order: If there is no duration, ProactiveAI uses the prior order. For instance, if it’s the first set of backups, they will occur as sources are available.
Connect to the next source: Retrospect will attempt to back up the selected source. If it’s not available, Retrospect marks that time and moves on. If Retrospect times out and the client and script have Wake-on-LAN (WAL) set, Retrospect sends a WAL packet, waits three minutes, then tries to connect again. If that connection times out, Retrospect marks the source as unavailable and moves on.
Record next backup date: After a successful backup, Retrospect marks the next backup date for the source and moves on. As discussed earlier, this future date varies based on the script’s schedule.
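The prioritization steps above can be sketched as a single sort. This is a minimal illustration, not Retrospect's actual implementation: the Source class, its field names, and the prioritize and record_success functions are all hypothetical. It assumes that a source that has never been backed up is treated as due on the earliest possible day, that a "last checked" time only demotes a source when the most recent contact attempt failed, that a source with no recorded duration is treated as having a zero duration, and that the next backup day is simply the backup day plus the schedule interval in days. The last backup's duration is used directly as the predicted duration.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import List, Optional


@dataclass
class Source:
    name: str
    # Day the source is next due; date.min stands in for "never backed up".
    next_backup_day: date = date.min
    # Assumption: set only when the most recent contact attempt failed.
    unreachable_since: Optional[datetime] = None
    # Duration of the last successful backup, if there was one.
    last_duration: Optional[timedelta] = None


def prioritize(sources: List[Source]) -> List[Source]:
    """Order by due day, then by failed-contact time, then by last duration."""
    def key(s: Source):
        return (
            s.next_backup_day,                    # initial, then overdue, then due today
            s.unreachable_since or datetime.min,  # recently missed sources go last
            s.last_duration or timedelta(0),      # shorter previous backups first
        )
    # sorted() is stable, so sources that tie keep their prior order.
    return sorted(sources, key=key)


def record_success(source: Source, backup_day: date, duration: timedelta,
                   interval_days: int) -> None:
    """After a successful backup, store the duration and the next due day."""
    source.last_duration = duration
    source.unreachable_since = None
    source.next_backup_day = backup_day + timedelta(days=interval_days)


if __name__ == "__main__":
    today = date(2018, 3, 14)
    sources = [
        Source("file-server", today, last_duration=timedelta(hours=3)),
        Source("mail", today, last_duration=timedelta(minutes=5)),
        Source("overdue-server", today - timedelta(days=1),
               last_duration=timedelta(hours=4)),
        Source("new-laptop"),  # never backed up
    ]
    print([s.name for s in prioritize(sources)])
    # ['new-laptop', 'overdue-server', 'mail', 'file-server']
```

With record_success, a weekly script (interval_days=7) and a daily script (interval_days=1) record different next backup days for the same backup, which is what places their sources in different day buckets on the next pass.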
Backup Window
Retrospect begins a backup as soon as a source becomes available. If Alice’s laptop was backed up at 2:30pm yesterday, ProactiveAI will attempt to back up her laptop as soon as it comes online today, even if that’s before 2:30pm.
This change corrects a long-standing issue with drift, and for existing customers, this new schedule represents a significant change from previous versions. In the past, Proactive used the "Last Backup Time" to determine when to next back up a source. If Alice’s laptop was backed up at 2:30pm yesterday, an older version of Proactive would wait until 2:30pm today to attempt the next backup, regardless of whether it was idle and Alice’s laptop was available.
Alice might have only opened her laptop at 2:30pm yesterday, but on all other days she is online at 9am. Without this change, every future backup would have been at 2:30pm or later until she missed a day. Instead, her laptop is protected as soon as it’s available for each backup window. For finer-grained scheduling, customers can use multiple ProactiveAI scripts with different schedules.
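The difference between the two behaviors can be illustrated with Alice's laptop. The sketch below is a simplified model, not Retrospect's code: both function names are hypothetical, and it assumes a daily schedule in which each calendar day is a new backup window.

```python
from datetime import datetime, timedelta


def due_by_last_backup_time(last_backup: datetime, now: datetime,
                            interval: timedelta = timedelta(days=1)) -> bool:
    """Older behavior as described: wait a full interval after the last backup."""
    return now >= last_backup + interval


def due_by_backup_window(last_backup: datetime, now: datetime) -> bool:
    """ProactiveAI behavior as described, daily schedule: due on any new day."""
    return now.date() > last_backup.date()


if __name__ == "__main__":
    last_backup = datetime(2018, 3, 13, 14, 30)   # yesterday at 2:30pm
    online_at = datetime(2018, 3, 14, 9, 0)       # today at 9am
    print(due_by_last_backup_time(last_backup, online_at))  # False
    print(due_by_backup_window(last_backup, online_at))     # True
```

Under the older rule, the 9am check fails and the backup drifts to 2:30pm or later; under the window rule, the laptop is due as soon as it appears on the new day.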
Wake-on-LAN
ProactiveAI is better optimized for handling Wake-on-LAN (WAL) sources. If the source has WAL enabled or the script has WAL enabled, ProactiveAI will include WAL packets in its operation. For each WAL source, Retrospect attempts a connection. If that times out after one minute, it sends a WAL packet, waits three minutes, and then attempts another connection. If that times out after one minute, ProactiveAI marks the source as unavailable, moves on, and will not attempt another connection until it has contacted each subsequent source.
In previous versions, Proactive would continue to attempt to wake up unresponsive or absent machines. For environments that had many laptops or otherwise unavailable machines, this workflow meant that Retrospect would spend a disproportionate amount of time looking for machines instead of backing up available machines.
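The Wake-on-LAN connection sequence described above can be sketched as follows. This is an illustration under stated assumptions rather than Retrospect's implementation: try_source and its parameters are hypothetical, the connect and send_wake_packet callables are supplied by the caller, and wol_enabled stands for whatever combination of source and script settings enables WAL. The timings are the ones given in the text.

```python
import time
from typing import Callable

CONNECT_TIMEOUT_S = 60   # a connection attempt "times out after one minute"
WAKE_WAIT_S = 180        # Retrospect "waits three minutes" after the WAL packet


def try_source(connect: Callable[[float], bool],
               send_wake_packet: Callable[[], None],
               wol_enabled: bool,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the source was reached, False if it should be marked unavailable.

    connect(timeout) must return True on success and False on timeout.
    """
    if connect(CONNECT_TIMEOUT_S):
        return True
    if not wol_enabled:
        return False          # no wake attempt; caller moves on to the next source
    send_wake_packet()
    sleep(WAKE_WAIT_S)
    # Exactly one retry: a second timeout marks the source unavailable.
    return connect(CONNECT_TIMEOUT_S)
```

Bounding the sequence to one wake packet and one retry means an absent machine costs at most about five minutes per pass before the script moves on to the remaining sources.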
ProactiveAI includes detailed logging to help you understand the choices it’s making to optimize the backup window:
Engine Log Level 4: What ProactiveAI is doing
Engine Log Level 5: What ProactiveAI is considering
See Advanced Logging Options for details about enabling logging.
Last Update: 14 Mar 2018