Scaling Ceph RGW

The Power of Keepalived for High Availability

July 3, 2026 by Tinihub Inc.

The Ceph Object Gateway (RGW) is essentially a stateless HTTP front end to the cluster, so you can scale horizontally by adding more RGW instances. That leaves one critical challenge: how do you present a single, reliable endpoint to your clients?

If your clients are hardcoded to a single RGW IP address and that instance goes down, your storage becomes inaccessible. This is where Keepalived becomes an essential component of your infrastructure.

The Architecture: Keepalived + RGW

Keepalived implements the Virtual Router Redundancy Protocol (VRRP). It allows you to configure a Virtual IP (VIP) that floats between your physical RGW nodes.

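If the primary node hosting the VIP fails, the standby node stops receiving its VRRP advertisements and takes over the VIP automatically. With the default one-second advertisement interval, failover takes a few seconds (roughly three missed advertisements); sub-second failover requires tuning the interval.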

Benefits: With vs. Without Keepalived

Feature	Without Keepalived	With Keepalived
Availability	Single point of failure.	High availability via failover.
Client Configuration	Hardcoded to specific nodes.	Points to a single, stable VIP.
Maintenance	Requires client-side changes.	Transparent; move VIP to perform updates.
Complexity	Low initially, high during outage.	Moderate setup, high operational resilience.

Implementation Example

In this setup, we assume two nodes (node-1 and node-2) both running RGW. We want a shared VIP: 192.168.1.100.

Keepalived Configuration (keepalived.conf)

Place this file in /etc/keepalived/keepalived.conf on both nodes.
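
A minimal sketch of what this file might look like on node-1, using the VIP 192.168.1.100 from the setup above. Everything else is an assumption to adapt to your environment: the interface name (eth0), the /24 prefix, the virtual_router_id, the priorities, the password, and the health check, which assumes the gateway runs as a process named radosgw and that pgrep lives at /usr/bin/pgrep.

```
# /etc/keepalived/keepalived.conf on node-1 (illustrative sketch, not a drop-in file)

global_defs {
    enable_script_security      # refuse scripts that non-root users could modify
    script_user root            # user that runs the health check
}

# Health check: succeeds only while a process named exactly "radosgw" exists.
# The process name and the pgrep path are assumptions.
vrrp_script chk_rgw {
    script "/usr/bin/pgrep -x radosgw"
    interval 2                  # run the check every 2 seconds
    fall 2                      # 2 consecutive failures mark the check as failed
    rise 2                      # 2 consecutive successes mark it as healthy again
}

vrrp_instance VI_RGW {
    state MASTER                # use BACKUP on node-2
    interface eth0              # assumption: NIC on the VIP's subnet
    virtual_router_id 51        # must be identical on both nodes
    priority 150                # use a lower value (e.g. 100) on node-2
    advert_int 1                # VRRP advertisement interval in seconds

    authentication {
        auth_type PASS
        auth_pass changeme      # placeholder; must match on both nodes (max 8 chars)
    }

    virtual_ipaddress {
        192.168.1.100/24        # the shared VIP; the /24 prefix is an assumption
    }

    track_script {
        chk_rgw                 # a failed check puts this node in FAULT and releases the VIP
    }
}
```

On node-2 the same file would apply with state BACKUP and a lower priority. A process check is the simplest option; it does not prove that RGW is answering requests, so a check that issues an HTTP request against the gateway port may be a better fit in production.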

Note: Adjust the priority on each node (the node with the higher value becomes master) and set the interface name to the NIC that sits on the VIP's subnet.

Why this is the "Gold Standard" for Ceph RGW

Seamless Failover: Because the VIP persists, clients (S3 browsers, SDKs, backup tools) do not need to be reconfigured or updated when a node needs a reboot or suffers a kernel panic.
Health Awareness: With a vrrp_script health check, Keepalived doesn't just check whether the server is alive; it checks whether the RGW service is actually running. If RGW crashes but the OS stays up, the VIP still migrates, so traffic stays directed toward a working gateway.
Cost-Effective: Unlike hardware load balancers (F5/Citrix) which can be incredibly expensive, Keepalived is open-source, lightweight, and runs directly on your existing RGW Linux nodes.

Final Considerations

Load Balancing: A VRRP-managed VIP provides high availability, but it does not spread requests across your RGWs; all traffic goes to whichever node currently holds the VIP. If you have 10+ RGW nodes, consider putting HAProxy or Nginx in front of your RGWs, and have Keepalived manage the VIP for those load balancers instead.
Networking: Ensure that your network environment allows gratuitous ARP, which is how Keepalived communicates the VIP movement to the rest of the network.
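
Both considerations can be expressed in the Keepalived configuration itself. The fragment below is a sketch for a load balancer node, assuming HAProxy runs as a process named haproxy, the interface is eth0, and the VIP is the same 192.168.1.100/24. The instance name, router ID, priority, and gratuitous ARP timings are illustrative values, not recommendations.

```
# Sketch for a load balancer node: the VIP follows HAProxy instead of RGW.

# Assumption: HAProxy runs as a process named "haproxy".
vrrp_script chk_haproxy {
    script "/usr/bin/pgrep -x haproxy"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_LB {
    state BACKUP
    interface eth0              # assumption: NIC on the VIP's subnet
    virtual_router_id 52        # must match on all load balancer nodes
    priority 100                # give the preferred node a higher value
    advert_int 1

    # Gratuitous ARP tuning (illustrative values):
    garp_master_delay 1         # seconds before the second batch of GARPs after takeover
    garp_master_repeat 5        # GARP packets sent per batch
    garp_master_refresh 60      # re-announce the VIP every 60 seconds while master

    virtual_ipaddress {
        192.168.1.100/24
    }

    track_script {
        chk_haproxy
    }
}
```

Periodic refresh can help when switches or upstream routers keep stale ARP entries. It does not help in environments that filter gratuitous ARP outright, such as some cloud networks.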

Are you looking to integrate this with a specific load balancer like HAProxy, or are you aiming for a simple two-node active-passive setup?
