
Cannot switch over a cluster through the API or patronictl even though there is no replication lag #1152

@corama


While testing a switchover on a test cluster, we ran into an issue: the Patroni node refused to perform the switchover and logged the following messages:

2025-08-15 07:03:07,502 INFO: Member patroni-5344-node-1 exceeds maximum replication lag
2025-08-15 07:03:07,502 WARNING: switchover: no healthy members found, switchover is not possible
2025-08-15 07:03:07,502 INFO: Cleaning up failover key

There is actually no write load on the target cluster at all, because we are still in the testing phase.
Here is the Patroni configuration:

$ cat postgres.yml
bootstrap:
  dcs:
    loop_wait: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      archive_mode: true
      hot_standby: 'on'
      log_destination: csvlog
      log_filename: postgresql-%Y-%m-%d_%H%M%S.log
      logging_collector: 'on'
      max_locks_per_transaction: 512
      parameters:
        archive_mode: 'on'
        archive_timeout: 1800s
        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.05
        checkpoint_completion_target: 0.9
        hot_standby: 'on'
        log_autovacuum_min_duration: 0
        log_checkpoints: 'on'
        log_connections: 'on'
        log_disconnections: 'on'
        log_filename: postgresql-%Y-%m-%d_%H%M%S.log
        log_lock_waits: 'on'
        log_min_duration_statement: 500
        log_statement: ddl
        log_temp_files: 0
        max_connections: 136
        max_replication_slots: 10
        max_wal_senders: 10
        tcp_keepalives_idle: 900
        tcp_keepalives_interval: 100
        track_functions: all
        wal_compression: 'on'
        wal_level: hot_standby
        wal_log_hints: 'on'
      restart_after_crash: true
      unix_socket_directories: .
      use_pg_rewind: true
      use_slots: true
      wal_level: replica
    retry_timeout: 15
    ttl: 200
  initdb:
  - encoding: UTF8
  - locale: en_US.UTF-8
  - data-checksums
......

And cluster status is:

[Screenshot of cluster status]

We tried increasing the maximum_lag_on_failover setting, but it did not help.
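For reference, here is a minimal sketch of the comparison the "exceeds maximum replication lag" message appears to describe. It assumes the lag is the byte difference between the leader's last known WAL position and the member's WAL position, and that maximum_lag_on_failover is expressed in bytes. This is not Patroni's actual code; the function names lsn_to_int and exceeds_max_lag are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN string such as '0/3000060' to an absolute byte offset."""
    hi, lo = lsn.split("/")
    # An LSN is printed as two hex halves of a 64-bit WAL position.
    return (int(hi, 16) << 32) + int(lo, 16)


def exceeds_max_lag(leader_lsn: str, member_lsn: str, maximum_lag_on_failover: int) -> bool:
    """Hypothetical check: True if the member is too far behind to be a switchover candidate."""
    lag = lsn_to_int(leader_lsn) - lsn_to_int(member_lsn)
    return lag > maximum_lag_on_failover


# Identical positions on an idle cluster: lag is 0, well under 1 MiB.
print(exceeds_max_lag("0/3000060", "0/3000060", 1048576))
# 2 MiB behind: rejected with the configured 1048576-byte limit.
print(exceeds_max_lag("0/3200000", "0/3000000", 1048576))
```

With the configured value of 1048576 (1 MiB), a member would only be rejected if it were more than 1 MiB of WAL behind the leader, so on an idle cluster the computed lag would be expected to stay near zero.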

Any suggestions?
Thanks in advance.
