BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while

From: postgresql(dot)org(at)gclough(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #13770: Extending recovery_min_apply_delay on Standby causes it to be unavailable for a while
Date: 2015-11-11 15:50:06
Message-ID: 20151111155006.2644.84564@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 13770
Logged by: Greg Clough
Email address: postgresql(dot)org(at)gclough(dot)com
PostgreSQL version: 9.4.5
Operating system: Oracle Linux v6.7
Description:

If you extend the recovery_min_apply_delay on a standby and then restart it,
you cannot connect to it for read-only transactions until the most recent
transaction time >= delay time. This is problematic if you extend from say
1 hour to 24 hours, as you have to wait almost a day before you can connect
to your Standby again.

I can't find anything in the documentation to describe this as intended
behaviour, and my expectation was that when I extended the delay that the
Standby database would come up and be available for read-only transactions,
but would simply delay any new transactions until they were older than
recovery_min_apply_delay.

I'm curious if this was a design decision to stop people getting confused
when they access a database before recovery_min_apply_delay has expired and
get data that's too recent, or if it was an unintentional consequence of the
way it was implemented.

Environment:

Oracle Linux v6.7
PostgreSQL v9.4.5 (Installed via yum)

postgres(at)gclough[5432] ~$ uname -a
Linux gclough 3.8.13-98.4.1.el6uek.x86_64 #2 SMP Wed Sep 23 18:46:01 PDT
2015 x86_64 x86_64 x86_64 GNU/Linux

postgres(at)gclough[5432] ~$ psql --version
psql (PostgreSQL) 9.4.5

Test Case:

1. Create a Primary database on port 5432

export PGDATA=/u01/postgres/9.4/5432/data
export PGPORT=5432
initdb

wal_level = hot_standby # minimal, archive, hot_standby, or
logical
max_wal_senders = 1 # max number of walsender processes

2. Replicate it to a Standby database on port 5433

export PGDATA=/u01/postgres/9.4/5433/data
export PGPORT=5433

port = 5433 # (change requires restart)
wal_level = hot_standby # minimal, archive, hot_standby, or
logical
max_wal_senders = 1 # max number of walsender processes
hot_standby = on # "on" allows queries during
recovery

3. On the Standby (Port 5433), introduce a 2min time delay with
"recovery_min_apply_delay" in the recovery.conf:

standby_mode = 'on'
primary_conninfo = 'port=5432 host=localhost user=postgres
application_name=port_5433'
recovery_min_apply_delay = 2min

4. Check the current lag on the Standby (Port 5433):

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:32:32 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7000000 | 0/7000000 | | t
(1 row)

5. Create a small test table on the Primary (Port 5432):

CREATE TABLE delay_test (id NUMERIC);
INSERT INTO delay_test VALUES (1);

6. Check the application of WAL on the Standby (Port 5433), and note the
delay:

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:33:06 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015DC8 | 0/7015890 | | t
(1 row)

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:34:00 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015E00 | 0/7015890 | | t
(1 row)

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:01 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015E00 | 0/7015E00 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

7. Reset the time delay on the Standby from 2 to 5 minutes

recovery_min_apply_delay = 5min

pg_ctl restart -m fast

8. Try to connect to the Standby, and get a failure:

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:35:26 GMT 2015
psql: FATAL: the database system is starting up

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:36:01 GMT 2015
psql: FATAL: the database system is starting up

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:37:00 GMT 2015
psql: FATAL: the database system is starting up

9. Wait 3 minutes... then it works:

postgres(at)gclough[5433] /u01/postgres/9.4/5433/data$ date; psql -c 'select
pg_last_xlog_receive_location() "receive_xlog",
pg_last_xlog_replay_location() "replay_xlog",
pg_last_xact_replay_timestamp(), pg_is_in_recovery() "recovering"'
Wed Nov 11 15:38:00 GMT 2015
receive_xlog | replay_xlog | pg_last_xact_replay_timestamp | recovering
--------------+-------------+-------------------------------+------------
0/7015ED8 | 0/7015ED8 | 2015-11-11 15:32:53.950376+00 | t
(1 row)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2015-11-11 16:18:35 Re: BUG #13768: JSONB concat
Previous Message jhirsch 2015-11-11 15:48:02 Re: BUG #13622: unsafe use of relative rpath libpq.5.dylib