High Availability PostgreSQL

Teams on Render can enable High Availability (HA) for PostgreSQL databases on a Pro instance type or higher.

When you enable HA, Render maintains a separate standby instance of your database that asynchronously replicates the state of your primary instance:

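Diagram: within Render, your other services connect to the primary instance, and the standby instance replicates from it. Both instances are healthy.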

The standby runs in the same region as the primary. To maximize availability in the event of a major disruption, the two instances are always geographically separate within their region (on the order of tens of kilometers).

If a critical issue causes your primary instance to become unavailable for 30 seconds, Render detects this and automatically fails over to the standby to keep you up and running:

Diagram: the primary instance is down, and your other services now connect to the healthy standby instance.

This process takes a few seconds, after which the standby instance becomes the new primary (now hosted at the same URL as the original primary). When the degraded instance becomes healthy again, it becomes the new standby.

Your standby instance always uses the same instance type as your primary instance and is billed accordingly.

Prerequisites

For your database to support high availability, it must:

  • Belong to a team account
  • Use the Pro instance type or higher
  • Use PostgreSQL version 13 or later
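To confirm the version requirement from inside the database, you can run `SHOW server_version_num;` in psql. It returns the version as an integer string, for example `130004` for PostgreSQL 13.4. The helper below is a minimal sketch that compares that value against the PostgreSQL 13 minimum. The function name `meets_ha_version_requirement` is hypothetical, and the helper doesn't check the team-account or instance-type requirements, which aren't visible from inside the database.

```python
# Minimal sketch: checks only the PostgreSQL version prerequisite for HA.
HA_MIN_SERVER_VERSION_NUM = 130000  # PostgreSQL 13.0, encoded as server_version_num


def meets_ha_version_requirement(server_version_num: str) -> bool:
    """Return True if the server is PostgreSQL 13 or later.

    `server_version_num` is the text returned by `SHOW server_version_num`,
    for example "130004" for 13.4 or "120015" for 12.15.
    """
    return int(server_version_num.strip()) >= HA_MIN_SERVER_VERSION_NUM


print(meets_ha_version_requirement("160002"))  # True (PostgreSQL 16.2)
print(meets_ha_version_requirement("120015"))  # False (PostgreSQL 12.15)
```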

Setup

Enabling HA requires a database restart! The restart makes your database temporarily unavailable (usually for less than five minutes), so schedule it for a time when a brief outage is acceptable.

  1. In the Render Dashboard, select your database and open its Info page.

  2. Scroll down to the High Availability section and toggle the switch:

    Enabling HA PostgreSQL

  3. A confirmation dialog appears. Review the details and then click Enable HA.

That’s it! Your database will restart with HA enabled.

Failover

Failover refers to the process of swapping out your primary database instance for your standby instance.

Render performs failover automatically when your primary instance becomes unavailable, and you can also trigger a manual failover for testing purposes. In either case, failover takes just a few seconds, after which your other services can reconnect to your database.

Automatic failover

Render automatically triggers a failover to your database’s standby instance whenever your primary instance becomes unavailable for 30 seconds.

Your primary instance might become unavailable because:

  • The node running the instance becomes unresponsive or goes down.
  • A network disruption prevents communication with the instance.
  • The PostgreSQL process itself crashes.
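One way to picture the 30-second rule is as a clock that starts at the first failed health check and resets on any success. The sketch below models that reading. It assumes that "unavailable for 30 seconds" means 30 consecutive seconds of failed checks, and the `FailoverDetector` class is hypothetical. It is not Render's implementation: this page doesn't describe Render's health-check mechanism or interval.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from this page: failover starts after 30 seconds of unavailability.
FAILOVER_THRESHOLD_S = 30.0


@dataclass
class FailoverDetector:
    """Hypothetical model of the 30-second rule (not Render's implementation).

    Assumes "unavailable for 30 seconds" means 30 consecutive seconds of
    failed health checks; any successful check resets the clock.
    """

    threshold_s: float = FAILOVER_THRESHOLD_S
    unhealthy_since: Optional[float] = None

    def observe(self, now: float, healthy: bool) -> bool:
        """Record one health check at time `now` (in seconds).

        Returns True once the primary has been continuously unhealthy
        for at least `threshold_s` seconds, meaning failover should start.
        """
        if healthy:
            self.unhealthy_since = None  # a success ends the outage window
            return False
        if self.unhealthy_since is None:
            self.unhealthy_since = now  # the first failure starts the window
        return now - self.unhealthy_since >= self.threshold_s


detector = FailoverDetector()
for t, ok in [(0, True), (10, False), (25, False), (40, False)]:
    print(t, detector.observe(t, ok))
# 0 False
# 10 False
# 25 False
# 40 True
```

Under this model, a primary that flaps (fails for 20 seconds, recovers, then fails again) restarts the 30-second count each time it recovers.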

Automatic failover might fail to preserve a small number of the most recent writes to your degraded primary instance. For details, see Limitations of HA.
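The cause is asynchronous replication: the primary confirms each commit to your app before the standby has received it, so the standby trails the primary by a short replication lag. The sketch below is a simplified model that assumes a constant lag (real lag varies). `Write` and `partition_on_failover` are hypothetical names, and the 0.5-second lag in the example is illustrative only. Writes committed within the lag window before the failure were acknowledged to your app but never reach the standby.

```python
from typing import List, NamedTuple, Tuple


class Write(NamedTuple):
    id: str
    committed_at: float  # seconds, on the primary's clock


def partition_on_failover(
    writes: List[Write], failed_at: float, lag_s: float
) -> Tuple[List[str], List[str]]:
    """Split acknowledged writes into (kept by the standby, lost).

    Assumes every write reaches the standby exactly `lag_s` seconds after
    the primary commits it. Writes after `failed_at` were never
    acknowledged, so they appear in neither list.
    """
    cutoff = failed_at - lag_s
    kept = [w.id for w in writes if w.committed_at <= cutoff]
    lost = [w.id for w in writes if cutoff < w.committed_at <= failed_at]
    return kept, lost


writes = [Write("a", 0.0), Write("b", 5.0), Write("c", 9.5), Write("d", 9.9)]
kept, lost = partition_on_failover(writes, failed_at=10.0, lag_s=0.5)
print(kept, lost)  # ['a', 'b', 'c'] ['d']
```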

Manual failover

Manual failover is intended for testing and compliance purposes. You don’t need it to recover from an outage: automatic failover already handles cases where your primary instance becomes unavailable.

You can manually trigger a failover to your database’s standby instance from the Render Dashboard. You might do this to test your apps’ reconnection behavior, or to demonstrate failover capabilities for compliance purposes.

Go to your database’s Info page and click Trigger Failover under the High Availability section:

Triggering a manual failover

Performing a manual failover with a healthy primary instance almost never causes any loss of data. It’s possible (but unlikely) that changes to your primary instance in the last few seconds before the failover will be lost.
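To test reconnection behavior during a manual failover, you might run a small probe that checks the database at a fixed interval and reports the longest stretch of failed checks. The sketch below assumes you supply a `check` function that opens a fresh connection and runs a trivial query, raising on failure. `measure_outage` is a hypothetical helper, and its result is only as precise as `interval_s`.

```python
import time
from typing import Callable, Optional


def measure_outage(
    check: Callable[[], None],
    duration_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Call `check` every `interval_s` seconds for `duration_s` seconds.

    Returns the longest run of consecutive failed checks, in seconds
    (0.0 if every check succeeded).
    """
    start = clock()
    down_since: Optional[float] = None
    longest = 0.0
    while clock() - start < duration_s:
        now = clock()
        try:
            check()
            if down_since is not None:  # first success after an outage
                longest = max(longest, now - down_since)
                down_since = None
        except Exception:  # any failure counts as "down" in this sketch
            if down_since is None:
                down_since = now
        sleep(interval_s)
    if down_since is not None:  # still down when the probe ended
        longest = max(longest, clock() - down_since)
    return longest
```

For example, with psycopg 3, `check` could open `psycopg.connect(url, connect_timeout=2)` in a `with` block and call `conn.execute("SELECT 1")`. Start the probe, then click Trigger Failover while it runs.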

Reconnecting after a failover

Whenever a failover occurs (automatic or manual), all active connections to your primary instance are terminated. Clients need to reconnect to the new primary instance, which becomes reachable at the exact same database URL. To enable reconnection, make sure your clients include retry functionality in their connection logic.
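A common pattern for this retry logic is exponential backoff with jitter: wait a little longer after each failed attempt, with randomized waits so that many clients don’t reconnect in lockstep. The sketch below is a driver-agnostic assumption, not a Render-provided API. `connect_with_retry` is a hypothetical helper, and its defaults are illustrative rather than tuned for Render.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 12,
    base_delay_s: float = 0.5,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `max_attempts` is exhausted.

    Between failures, waits a random time up to an exponentially growing
    bound (0.5 s, 1 s, 2 s, ..., capped at `max_delay_s`).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception:  # narrow to your driver's connection error in practice
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            bound = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, bound))  # "full jitter" backoff
    raise AssertionError("unreachable")
```

For example, `connect_with_retry(lambda: psycopg.connect(os.environ["DATABASE_URL"]))`, where `DATABASE_URL` is a hypothetical environment variable holding your database URL. With the defaults, the wait bounds add up to roughly 75 seconds (the actual total is random and usually lower), which leaves room for the 30-second detection window plus the failover itself. Catching `Exception` keeps the sketch generic; in practice, narrow it to your driver’s connection error (for psycopg, `psycopg.OperationalError`). If a transaction was in progress when the connection dropped, treat it as failed and retry it on the new connection.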

Limitations of HA

  • Render runs your primary and standby instances on geographically separated nodes in the same region. In the unlikely event that both nodes are affected by an incident, your database will experience downtime.
  • When an automatic failover occurs, a small number of the most recent writes to your degraded primary instance might not be represented in your standby instance. These changes are lost.
    • This is because data is replicated asynchronously, and the primary might not have pushed the most recent writes to the standby before the degradation occurred.
    • In almost all cases, no more than a few seconds of changes are lost.
  • A manual failover almost never results in lost changes, but it’s possible that changes to your primary instance in the last few seconds before the failover will be lost.
  • Failover isn’t possible if your standby instance isn’t available. This might occur for one of the following reasons:
    • The standby is affected by the same severe incident as the primary.
    • The standby is affected by an unrelated, simultaneous incident.
    • Render is performing routine maintenance on the standby.
    • An incident occurs shortly after a previous failover, before the degraded instance has become healthy again.
    • An incident occurs shortly after you initialize your primary database (before the standby is also initialized).
  • You can’t connect to an HA database’s standby instance or use it to scale queries. For that use case, create a read replica instead.
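If you add a read replica to offload reads, one approach is to choose the connection URL per unit of work: read-only work goes to the replica when one is configured, and everything else goes to the primary. The sketch below assumes your app has the two URLs in environment variables named `DATABASE_URL` and `READ_REPLICA_URL` (hypothetical names) and that callers know whether their work is read-only. `pick_database_url` is a hypothetical helper.

```python
import os
from typing import Mapping


def pick_database_url(read_only: bool, env: Mapping[str, str] = os.environ) -> str:
    """Choose a connection URL for one unit of work.

    DATABASE_URL and READ_REPLICA_URL are hypothetical environment variable
    names. The HA standby never appears here: it accepts no client connections.
    """
    replica_url = env.get("READ_REPLICA_URL")
    if read_only and replica_url:
        return replica_url  # read-only work can go to the read replica
    return env["DATABASE_URL"]  # writes (and reads without a replica) go to the primary


# Placeholder URLs for illustration only.
env = {
    "DATABASE_URL": "postgresql://primary.example",
    "READ_REPLICA_URL": "postgresql://replica.example",
}
print(pick_database_url(read_only=True, env=env))   # postgresql://replica.example
print(pick_database_url(read_only=False, env=env))  # postgresql://primary.example
```

Routing on an explicit flag is safer than inspecting SQL text, which can misclassify statements such as `SELECT ... FOR UPDATE`. A replica can lag slightly behind the primary, so reads that must see a just-committed write should go to the primary.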