Managing Storage v0.0.15

This guide explains how to manage storage on your Klio server, prevent disk-full scenarios, and recover when storage is exhausted.

The Klio server uses persistent storage for backup data, WAL archives, cache, and the work queue. When the data PVC approaches capacity, backup and WAL archival operations may fail.

How Disk Space Is Freed

Deleting a backup does not immediately free disk space. Klio is built on top of Kopia, which uses a two-phase approach:

  1. Deletion marks the backup as deleted (a logical delete); its data stays on disk
  2. Maintenance physically removes the deleted data, but only data that is older than 24 hours and not referenced by any remaining backup

This design prevents accidental data loss from concurrent operations. However, it means that space is not freed until maintenance runs and the 24-hour safety window has passed.

Maintenance runs automatically in the background. Klio does not currently provide its own command to trigger it on demand, but you can start it manually with Kopia by opening a shell on the Klio server pod and running:

kopia maintenance set --owner=me \
  --config-file=/tmp/$KOPIACONFIG_TIER1_CONF \
  --disable-file-logging
kopia maintenance run \
  --config-file=/tmp/$KOPIACONFIG_TIER1_CONF \
  --full \
  --disable-file-logging
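
As a non-interactive alternative, the same two commands can be wrapped in a single kubectl exec call. The sketch below is a minimal illustration, not a Klio-provided tool: the function name run_tier1_maintenance is hypothetical, and it assumes the container image provides sh and that KOPIACONFIG_TIER1_CONF is set in the environment of processes started through kubectl exec. The pod name follows the my-server-klio-0 example used elsewhere in this guide.

```shell
# Hypothetical helper: run Tier 1 Kopia maintenance inside the Klio server pod.
# Assumes `sh` exists in the container and KOPIACONFIG_TIER1_CONF is set there.
run_tier1_maintenance() {
  pod="$1"
  # Single quotes keep $KOPIACONFIG_TIER1_CONF unexpanded locally,
  # so it is resolved by the shell running inside the pod.
  kubectl exec "$pod" -- sh -c '
    kopia maintenance set --owner=me \
      --config-file=/tmp/$KOPIACONFIG_TIER1_CONF \
      --disable-file-logging &&
    kopia maintenance run \
      --config-file=/tmp/$KOPIACONFIG_TIER1_CONF \
      --full \
      --disable-file-logging
  '
}

# Usage: run_tier1_maintenance my-server-klio-0
```

The quoting matters here: with double quotes, the variable would be expanded on your workstation, where it is most likely empty.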

When the Disk Is Full

When the Klio data PVC is completely full:

  • All backup operations block (new backups, deletions, maintenance)
  • WAL streaming to Klio stops
  • PostgreSQL accumulates WAL files on its own PVC
  • No backup corruption occurs, even when backups fail mid-operation
  • No orphan or incomplete snapshots are left behind
  • Existing backups remain intact and restorable

All operations resume automatically when space is freed.

Warning

While the Klio disk is full, WAL files build up on the PostgreSQL PVC. If this condition persists, it can lead to disk pressure on the database side as well. Resolve Klio storage issues promptly to avoid cascading failures.

Resolving Storage Issues

Expand the PVC

The simplest option is to expand the PVC. The Klio operator supports expansion of PersistentVolumeClaims (PVCs) for all storage components: data, cache (Tier 1 and Tier 2), and queue.

Prerequisites

PVC expansion requires a StorageClass with allowVolumeExpansion: true.

User Responsibility

Before attempting to resize PVCs, you must verify that your StorageClass supports volume expansion. The operator will attempt the resize operation directly—if the StorageClass does not support expansion, the Kubernetes API will reject the request and the operator will log an error.

To check if your StorageClass supports expansion:

kubectl get storageclass <your-storage-class> -o jsonpath='{.allowVolumeExpansion}'

If the output is not true, you need to do one of the following:

  1. Update the StorageClass to enable volume expansion (if the underlying storage provisioner supports it)
  2. Use a different StorageClass that supports volume expansion
  3. Migrate to a new PVC (see Limitations for options)
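
If you do not know which StorageClass a PVC uses, the lookup and the expansion check can be combined. The following is a minimal sketch under stated assumptions: the function name pvc_supports_expansion is hypothetical, the PVC is assumed to live in the current namespace, and the PVC name in the usage line follows the data-my-server-klio-0 example from this guide.

```shell
# Hypothetical helper: report whether the StorageClass behind a PVC
# has allowVolumeExpansion set to true.
pvc_supports_expansion() {
  pvc="$1"
  # Read the StorageClass name from the PVC spec.
  sc=$(kubectl get pvc "$pvc" -o jsonpath='{.spec.storageClassName}')
  if [ -z "$sc" ]; then
    echo "$pvc: no storageClassName found" >&2
    return 2
  fi
  # StorageClasses are cluster-scoped, so no namespace is needed here.
  allowed=$(kubectl get storageclass "$sc" -o jsonpath='{.allowVolumeExpansion}')
  if [ "$allowed" = "true" ]; then
    echo "$pvc: StorageClass $sc allows expansion"
  else
    echo "$pvc: StorageClass $sc does NOT allow expansion" >&2
    return 1
  fi
}

# Usage: pvc_supports_expansion data-my-server-klio-0
#
# If the provisioner supports expansion but the flag is unset, it can be enabled with:
#   kubectl patch storageclass <your-storage-class> -p '{"allowVolumeExpansion": true}'
```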

Expanding PVC Size

To expand a PVC, update the corresponding pvcTemplate.resources.requests.storage field in the Server spec with a larger value:

apiVersion: klio.enterprisedb.io/v1alpha1
kind: Server
metadata:
  name: my-server
spec:
  tier1:
    data:
      pvcTemplate:
        resources:
          requests:
            storage: 200Gi  # Increased from 100Gi
    cache:
      pvcTemplate:
        resources:
          requests:
            storage: 20Gi   # Increased from 10Gi
  queue:
    pvcTemplate:
      resources:
        requests:
          storage: 20Gi     # Increased from 10Gi

Apply the updated Server resource:

kubectl apply -f klio-server.yaml
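
If you prefer not to edit the manifest, the same field can be changed in place with a merge patch. This is a sketch rather than a documented Klio workflow: the function name expand_tier1_data is hypothetical, the resource name server is taken from the kubectl describe server command used in this guide, and the field path mirrors the Server spec shown above. Custom resources accept JSON merge patches, hence --type merge.

```shell
# Hypothetical helper: set spec.tier1.data.pvcTemplate.resources.requests.storage
# on a Klio Server resource using a JSON merge patch.
expand_tier1_data() {
  server="$1"
  size="$2"
  # Build the patch document; only the storage request is changed.
  patch=$(printf '{"spec":{"tier1":{"data":{"pvcTemplate":{"resources":{"requests":{"storage":"%s"}}}}}}}' "$size")
  kubectl patch server "$server" --type merge -p "$patch"
}

# Usage: expand_tier1_data my-server 200Gi
```

If the Server manifest is kept in version control, update it as well so that a later kubectl apply does not attempt to revert the size.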

What Happens During Resize

When you update the Server spec with larger PVC sizes, the following occurs:

  1. PVC expansion: The operator patches PVCs directly to the new size. This modifies the PVC resources but does not update the StatefulSet—the StatefulSet's VolumeClaimTemplates remain unchanged at this point.
  2. Temporary misalignment: After the PVC patch, there is a brief period where the PVCs have the new size but the StatefulSet VolumeClaimTemplates still reflect the old size.
  3. StatefulSet recreation: The operator detects that the expected StatefulSet (with new VolumeClaimTemplates) differs from the current one. Since VolumeClaimTemplates are immutable in Kubernetes, the StatefulSet is deleted and recreated to align with the new spec.
  4. Pod restart: The Klio server pod restarts and mounts the already-expanded PVCs.

Why explicit PVC patching is necessary

VolumeClaimTemplates only define specs for new PVCs—they do not resize existing ones. Without explicit PVC patching by the operator, the StatefulSet would be recreated but the PVCs would remain at their original size, creating a permanent mismatch between the Server spec and actual storage.

StatefulSet and PVC Alignment

After the full resize operation completes:

  • The PVCs have the new expanded size
  • The StatefulSet VolumeClaimTemplates match the new size (after recreation)
  • The Server spec is consistent with both

This ensures no drift between the desired state and actual resources.
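
To confirm the alignment yourself, you can list the sizes declared in the StatefulSet's VolumeClaimTemplates and compare them with the PVCs. The sketch below is illustrative: the function name sts_template_sizes is hypothetical, and the StatefulSet name my-server-klio is an assumption inferred from the pod name my-server-klio-0 (StatefulSet pods are named after the StatefulSet plus an ordinal).

```shell
# Hypothetical helper: print "<template-name><TAB><requested-size>" for each
# volumeClaimTemplate of a StatefulSet.
sts_template_sizes() {
  sts="$1"
  kubectl get statefulset "$sts" \
    -o jsonpath='{range .spec.volumeClaimTemplates[*]}{.metadata.name}{"\t"}{.spec.resources.requests.storage}{"\n"}{end}'
}

# Usage (StatefulSet name is an assumption): sts_template_sizes my-server-klio
```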

Monitoring Resize Progress

The operator emits a PVCExpanded Kubernetes event on the Server resource when a PVC is successfully expanded. You can view these events with:

kubectl describe server my-server

Check the PVC status to monitor the resize operation:

kubectl get pvc -l klio.enterprisedb.io/klio-server=my-server

The PVC will show the new requested size in spec.resources.requests.storage. The actual capacity is reflected in status.capacity.storage once the resize completes.

For detailed status, including any resize conditions:

kubectl describe pvc data-my-server-klio-0
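
The comparison between the requested size and the actual capacity can also be scripted. This is a minimal sketch with a hypothetical function name, pvc_resize_status. It compares the two values as plain strings, which is an assumption: a provisioner may report capacity in a different unit or round it up, in which case the values differ even though the resize has finished.

```shell
# Hypothetical helper: compare the requested size of a PVC with its
# reported capacity. String comparison only; units are not normalized.
pvc_resize_status() {
  pvc="$1"
  requested=$(kubectl get pvc "$pvc" -o jsonpath='{.spec.resources.requests.storage}')
  actual=$(kubectl get pvc "$pvc" -o jsonpath='{.status.capacity.storage}')
  if [ "$requested" = "$actual" ]; then
    echo "$pvc: resize complete ($actual)"
  else
    echo "$pvc: resize pending (requested $requested, current $actual)"
  fi
}

# Usage: pvc_resize_status data-my-server-klio-0
```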

Limitations

  • Expansion only: Kubernetes does not support shrinking PVCs. Attempts to reduce the storage size are ignored and logged as a warning.
  • StorageClass support: The StorageClass must have allowVolumeExpansion: true. If the StorageClass does not support expansion, the resize will fail and an error will be logged.
  • Pod restart required: Because StatefulSet VolumeClaimTemplates are immutable, PVC expansion causes a brief pod restart.
  • Filesystem resize: After the volume is expanded, the filesystem must also be resized. Most modern storage providers handle this automatically.

No Automatic Fallback

If your StorageClass does not support volume expansion, there is no automatic fallback. The operator will not delete and recreate PVCs to achieve a larger size, as this would result in permanent data loss. The only options in this case are:

  1. Migrate to a StorageClass that supports volume expansion
  2. Create a new Klio server with larger PVCs and restore from backup
  3. Manually migrate data (requires downtime and careful planning)

Delete Backups and Run Maintenance

Warning

Maintenance requires some free disk space to run. If the disk is completely full, maintenance itself may fail. In that case, expand the PVC first or contact EDB support.

If PVC expansion is not available, free space by deleting old backups and running maintenance manually.

  1. Delete old backups:

    kubectl exec -it my-server-klio-0 -- \
      klio admin delete-backup <oldest-backup-name> \
        --cluster my-cluster --tier1

    To delete from both tiers, add --tier2.

    Note

    The actual space freed depends on how much data is shared with other backups through deduplication. Deleting a backup only reclaims space for data blocks that are not referenced by any remaining backup.

    Warning

    Deleting a backup removes the ability to restore to that point in time. Ensure you have adequate backups remaining before deletion.

  2. Run maintenance to reclaim space from deleted backups by opening a shell on the Klio server pod and running:

    kopia maintenance set --owner=me \
      --config-file=/tmp/$KOPIACONFIG_TIER1_CONF \
      --disable-file-logging
    kopia maintenance run \
      --config-file=/tmp/$KOPIACONFIG_TIER1_CONF \
      --full \
      --disable-file-logging

Contact EDB Support

If the above options are not viable or maintenance fails due to insufficient disk space, contact EDB support for assistance.

Best Practices

  1. Configure retention policies: The most effective way to control storage growth is through properly configured retention policies, which automatically delete old backups and WAL files no longer needed for recovery. See Retention Policies for configuration details.

  2. Monitor storage usage: Klio does not provide built-in storage alerts. Set up monitoring and alerting on your PVC usage to detect capacity issues before they cause failures.

  3. Size Tier 1 storage appropriately: Account for your backup frequency, database size, change rate, and retention requirements when provisioning the data PVC. Include a buffer for the 24-hour window during which deleted backup data is not yet eligible for garbage collection.

  4. Use Tier 2 for long-term retention: Object storage (S3, etc.) is more cost-effective and scales easily for long-term backup retention. Keep Tier 1 lean for fast recovery of recent backups.

  5. Use expandable StorageClasses: When possible, use StorageClasses with allowVolumeExpansion: true to enable online PVC expansion as a recovery option.
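
As a starting point for the monitoring recommendation above, the following sketch checks how full a volume is from inside the pod and compares it with a threshold. It is an illustration, not a replacement for proper alerting. The function name check_klio_disk is hypothetical, the mount path is a parameter because this guide does not state where the data PVC is mounted, and the container image is assumed to include df.

```shell
# Hypothetical helper: warn when a mounted volume in a pod exceeds a usage threshold.
# Arguments: pod name, mount path (assumption: supplied by you), threshold in percent.
check_klio_disk() {
  pod="$1"
  mount="$2"
  threshold="${3:-80}"
  # df -P prints one line per filesystem; column 5 is the usage percentage.
  pct=$(kubectl exec "$pod" -- df -P "$mount" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ -z "$pct" ]; then
    echo "could not read usage for $mount on $pod" >&2
    return 2
  fi
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $mount on $pod is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $mount on $pod is ${pct}% full"
}

# Usage (mount path is a placeholder): check_klio_disk my-server-klio-0 <data-mount-path> 80
```

The non-zero exit status on a warning makes the function usable from a scheduled job that alerts on failure.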