
Reliability runbook

Upgrade, backup and disaster recovery for BigBlueButton

A snapshot is not a recovery plan. BigBlueButton spans replaceable software, persistent recordings, application databases, secrets, DNS and external services—each with a different restore method.

  1. Routing and secrets: DNS, certificates, API and OIDC credentials
  2. Application state: Greenlight/Scalelite PostgreSQL and configuration
  3. Recording state: raw, processed, published and shared storage
  4. Replaceable nodes: clean OS and reproducible BBB configuration

Back up by recovery dependency, then test restoration in that order.
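The dependency ordering of these layers can be made explicit and checked mechanically. The following Python sketch assumes a hypothetical dependency map between the four layers (the edges are an example to adapt, not a BigBlueButton requirement) and uses the standard library's `graphlib.TopologicalSorter` to derive one valid restore order. A circular dependency raises `graphlib.CycleError`, which is itself a useful finding for the runbook.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each layer lists the layers that must be
# restored before it. The edges are an assumption; adapt them to your estate.
DEPENDS_ON: dict[str, set[str]] = {
    "routing_and_secrets": set(),
    "application_state": {"routing_and_secrets"},
    "replaceable_nodes": {"routing_and_secrets"},
    "recording_state": {"application_state", "replaceable_nodes"},
}


def restore_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid restore order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for number, layer in enumerate(restore_order(DEPENDS_ON), start=1):
        print(f"{number}. {layer}")
```

Where two layers have no dependency between them (here application state and replaceable nodes), the sorter may list them in either order, which also signals that they can be restored in parallel.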

Executive brief

What matters

  1. Use persistent override files and automation so a clean node can be rebuilt without reconstructing history.

  2. Define recovery objectives separately for live classes, frontend data and historical recordings.

  3. Accept a backup only after a representative restore has been completed and timed.

01

Classify what must survive

Document host and application configuration, API and identity secrets, Greenlight or Scalelite databases, recording files and metadata, certificates, DNS, monitoring and automation. Identify which items can be reproduced from automation and which are unique and must be restored from backup. Encrypt backups and keep their credentials separate from production administrator accounts, so that compromising one does not expose or destroy the other.
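To keep the reproducible/unique split auditable, the inventory can be held as data and checked against the backup policy. This Python sketch uses a hypothetical `Asset` record, example entries and a hypothetical `PRODUCTION_ADMINS` set; it flags unique assets whose backups are unencrypted or whose backup credentials are held by production administrators. It illustrates the policy above and is not a BigBlueButton feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical inventory record for one thing that must survive."""
    name: str
    reproducible: bool       # can be rebuilt from automation or upstream packages
    backup_encrypted: bool   # backup copy is encrypted at rest
    backup_key_holder: str   # role that holds the backup credentials


# Example entries only; replace with your own assets and roles.
INVENTORY = [
    Asset("bbb-node-os-and-packages", True, False, "n/a"),
    Asset("greenlight-postgresql", False, True, "backup-operator"),
    Asset("recordings-published", False, True, "backup-operator"),
    Asset("api-and-oidc-secrets", False, True, "prod-admin"),
]

# Hypothetical set of roles that administer production.
PRODUCTION_ADMINS = {"prod-admin"}


def backup_findings(inventory: list[Asset]) -> list[str]:
    """List policy violations for unique (non-reproducible) assets."""
    findings = []
    for asset in inventory:
        if asset.reproducible:
            continue  # rebuilt from automation rather than restored from backup
        if not asset.backup_encrypted:
            findings.append(f"{asset.name}: backup is not encrypted")
        if asset.backup_key_holder in PRODUCTION_ADMINS:
            findings.append(f"{asset.name}: backup credentials held by production admins")
    return findings


if __name__ == "__main__":
    for finding in backup_findings(INVENTORY):
        print(finding)
```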

02

Upgrade through a controlled path

Read the upgrade documentation for both the source and the target version. Major BigBlueButton upgrades may require a clean server and a recording migration. Before starting, take application-aware backups, drain new meetings from the node and keep customisations in supported override mechanisms so they survive the upgrade. Afterwards, test joins from an external network, and define in advance the condition that triggers a rollback.
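The upgrade preconditions can be expressed as an explicit gate that must report no blockers before work begins. In this Python sketch, `UpgradePlan`, its fields and `upgrade_blockers` are hypothetical names chosen to mirror the steps above; whether a clean server is required has to come from the target version's own upgrade documentation, not from code.

```python
from dataclasses import dataclass


@dataclass
class UpgradePlan:
    """Hypothetical pre-upgrade checklist; field names are illustrative, not a BBB API."""
    source_version: str
    target_version: str
    docs_reviewed: bool = False             # notes for both source and target versions
    clean_server_required: bool = False     # as stated by the target version's docs
    recording_migration_planned: bool = False
    app_backup_taken: bool = False          # application-aware, not only a VM snapshot
    new_meetings_drained: bool = False
    overrides_persisted: bool = False       # customisations live in supported overrides
    external_test_planned: bool = False
    rollback_condition: str = ""            # e.g. "roll back if external joins fail"


def upgrade_blockers(plan: UpgradePlan) -> list[str]:
    """Return reasons not to start the upgrade; an empty list means proceed."""
    checks = [
        (plan.docs_reviewed, "review docs for source and target versions"),
        (plan.app_backup_taken, "take application-aware backups"),
        (plan.new_meetings_drained, "drain new meetings from the node"),
        (plan.overrides_persisted, "move customisations into supported overrides"),
        (plan.external_test_planned, "plan a join test from an external network"),
        (bool(plan.rollback_condition.strip()), "define a rollback condition"),
    ]
    blockers = [reason for ok, reason in checks if not ok]
    if plan.clean_server_required and not plan.recording_migration_planned:
        blockers.append("plan recording migration to the clean server")
    return blockers
```

Keeping the gate as data rather than a wiki checklist makes it easy to attach the filled-in plan to the change record for each upgrade.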

03

Design recovery by objective

Live-class recovery may mean routing new meetings to healthy pool nodes; it does not revive a meeting lost with its backend. Frontend recovery requires restoring the database and configuration. Recording recovery can usually tolerate a longer objective but involves much larger volumes of data. State a recovery time objective (RTO) and a recovery point objective (RPO) for each, rather than one vague “24-hour backup” promise.
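Per-service objectives can be recorded as data and checked against the age of the newest restorable copy. In this Python sketch the targets are placeholder values, and `Objective`, `OBJECTIVES` and `rpo_breaches` are hypothetical names. Live classes carry no RPO because live media state is not backed up, only rerouted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Objective:
    service: str
    rto: timedelta            # maximum tolerated time to restore service
    rpo: Optional[timedelta]  # maximum tolerated data loss; None = not backed up


# Placeholder targets; set real values from business requirements.
OBJECTIVES = [
    Objective("live-classes", rto=timedelta(minutes=15), rpo=None),
    Objective("frontend", rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    Objective("recordings", rto=timedelta(days=2), rpo=timedelta(hours=24)),
]


def rpo_breaches(last_backup: dict[str, datetime], now: datetime) -> list[str]:
    """Services whose newest restorable copy is missing or older than their RPO."""
    breaches = []
    for objective in OBJECTIVES:
        if objective.rpo is None:
            continue  # recovery means rerouting new meetings, not restoring data
        taken = last_backup.get(objective.service)
        if taken is None or now - taken > objective.rpo:
            breaches.append(objective.service)
    return breaches
```

The RTO values are not checked here; they are meant to be compared with the measured duration of a restore exercise.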

04

Exercise the runbook

Restore Greenlight to an isolated environment, rebuild a BBB node, retrieve selected recordings and rotate a credential as if it had been compromised. Verify that users can join from an external network and that historical recordings play back. Record the duration, missing dependencies and manual decisions, then update contacts and procedures after every exercise.
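Timing and outcomes are easier to compare across exercises when they are captured the same way each time. This Python sketch uses a hypothetical `DrillLog` (not a BigBlueButton tool) that times each step with a context manager, records whether it finished without raising, and accepts the backup only if every step succeeded within an assumed RTO.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class DrillLog:
    """Hypothetical record of one restore exercise."""
    steps: list[tuple[str, float, bool]] = field(default_factory=list)  # (name, seconds, succeeded)
    missing_dependencies: list[str] = field(default_factory=list)
    manual_decisions: list[str] = field(default_factory=list)

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        """Time one drill step and record whether it finished without raising."""
        start = time.monotonic()
        succeeded = False
        try:
            yield
            succeeded = True
        finally:
            self.steps.append((name, time.monotonic() - start, succeeded))

    def total_seconds(self) -> float:
        return sum(seconds for _, seconds, _ in self.steps)

    def accepted(self, rto_seconds: float) -> bool:
        """Accept only if steps ran, all succeeded, and the total met the RTO."""
        return (
            bool(self.steps)
            and all(ok for _, _, ok in self.steps)
            and self.total_seconds() <= rto_seconds
        )


if __name__ == "__main__":
    log = DrillLog()
    with log.step("restore Greenlight database to isolated environment"):
        pass  # run the documented restore procedure here
    log.manual_decisions.append("chose the most recent nightly dump")
    print(log.accepted(rto_seconds=4 * 3600))
```

A failing step still lands in the log before its exception propagates, so an aborted drill leaves a record of how far it got.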

Evidence base

Sources and further reading

We prefer project documentation and first-party product guidance. Community links are included where they reveal recurring operational questions rather than establish product guarantees.

  1. BigBlueButton installation and upgrade guidance
  2. BigBlueButton customisation and recording transfer
  3. BigBlueButton monitoring
  4. Scalelite architecture

Practical answers

Questions teams ask

Are VM snapshots enough for BigBlueButton?

No. They can help with short-term rollback but do not replace application-aware database backups, recording protection, off-site copies and tested clean rebuilds.

Can a Scalelite pool prevent every outage?

No. It can direct new meetings away from unhealthy nodes, but a meeting already running on a backend that fails is interrupted.
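The distinction between new and running meetings can be illustrated with a simplified router. This Python sketch is not Scalelite's implementation: `Node`, `pick_node` and `route` are hypothetical, and `load` stands in for whatever metric the balancer actually uses. New meetings go to the least-loaded node that is online and enabled; a meeting already pinned to a node that goes offline is not migrated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    url: str
    online: bool   # passed the most recent health poll
    enabled: bool  # not administratively drained
    load: float    # lower is better; the metric itself is an assumption


def pick_node(nodes: list[Node]) -> Optional[Node]:
    """Least-loaded node that is online and enabled, or None if there is none."""
    candidates = [n for n in nodes if n.online and n.enabled]
    return min(candidates, key=lambda n: n.load, default=None)


def route(meeting_id: str, assignments: dict[str, str], nodes: list[Node]) -> Optional[str]:
    """Return the node URL for a meeting, or None if it cannot be served."""
    by_url = {n.url: n for n in nodes}
    if meeting_id in assignments:
        node = by_url.get(assignments[meeting_id])
        # A running meeting lives on its node; if that node is down, it is interrupted.
        return node.url if node is not None and node.online else None
    chosen = pick_node(nodes)
    if chosen is None:
        return None
    assignments[meeting_id] = chosen.url  # pin the new meeting to the chosen node
    return chosen.url
```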

What should be restored first?

Follow your business objectives and the dependency order your runbook defines: typically routing and identity first, then frontend state, then clean media capacity and finally historical recording service.