Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

From: Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Date: 2018-02-28 14:24:50
Message-ID: 292210e99e1ca70742abfbf9e55d7805@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Thank you for your valuable comments. I've made a few adjustments.

The main goal of my changes is to let long read-only transactions run on
replica if hot_standby_feedback is turned on.

Patch1 - hsfeedback_av_truncate.patch is made to stop
ResolveRecoveryConflictWithLock occurs on replica, after autovacuum lazy
truncates heap on master cutting some pages at the end. When
hot_standby_feedback is on, we know that the autovacuum does not remove
anything superfluous, which could be needed on standby, so there is no
need to rise any ResolveRecoveryConflict*.

1) Add to xl_standby_locks and xl_smgr_truncate isautovacuum flag, which
tells us that autovacuum generates them.

2) When autovacuum decides to trim the table (using lazy_truncate_heap),
it takes AccessExclusiveLock and sends this lock to the replica, but
replica should ignore AccessExclusiveLock if hot_standby_feedback=on.

3) When autovacuum truncate wal message is replayed on a replica, it
takes ExclusiveLock on a table, so as not to interfere with read-only
requests.

We have two cases of resolving ResolveRecoveryConflictWithLock if timers
(max_standby_streaming_delay and max_standby_archive_delay) have run
out:
backend is idle in transaction (waiting input) - in this case backend
will be sent SIGTERM
backend transaction is running query - in this case running transaction
will be aborted

How to test:
Make async replica, turn on feedback and reduce
max_standby_streaming_delay.
Make autovacuum more aggressive.
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test1:
Here we will do a load on the master and simulation of a long
transaction with repeated 1 second SEQSCANS on the replica (by calling
pg_sleep 1 second duration every 6 seconds).
MASTER REPLICA
hot_standby = on
max_standby_streaming_delay = 1s
hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 1 AS value
FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT pg_sleep(value) FROM test;
\watch 6

---Autovacuum truncate pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On Patched version lazy_vacuum_truncation passed without fatal errors.

Only some times Error occurs because this tests is too synthetic
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.
Because of rising ResolveRecoveryConflictWithSnapshot while
redo some visibility flags to avoid this conflict we can do test2 or
increase max_standby_streaming_delay.

Test2:
Here we will do a load on the master and simulation of a long
transaction on the replica (by taking LOCK on table)
MASTER REPLICA
hot_standby = on
max_standby_streaming_delay = 1s
hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1)
id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
LOCK TABLE test IN ACCESS SHARE MODE;
select * from test;
\watch 6

---Autovacuum truncate pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On Patched version lazy_vacuum_truncation passed without fatal errors.

Test3:
Here we do a load on the master and simulation of a long transaction
with repeated 1 second SEQSCANS on the replica (by calling pg_sleep 1
second duration every 6 seconds).
MASTER REPLICA
hot_standby = on
max_standby_streaming_delay = 4s
hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 200 AS value
FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT pg_sleep(value) FROM test;

---Autovacuum truncate pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On Patched version lazy_vacuum_truncation passed without fatal errors.

This way we can make transactions with SEQSCAN, INDEXSCAN or BITMAPSCAN

Patch2 - hsfeedback_noninvalide_xmin.patch
When walsender is initialized, its xmin in PROCARRAY is set to
GetOldestXmin() in order to prevent autovacuum running on master from
truncating relation and removing some pages that are required by
replica. This might happen if master's autovacuum and replica's query
started simultaneously. And the replica has not yet reported its xmin
value.

How to test:
Make async replica, turn on feedback, reduce max_standby_streaming_delay
and aggressive autovacuum.
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test:
Here we will start replica and begi repeatable read transaction on
table, then we stop replicas postmaster to prevent starting walreceiver
worker (on master startup) and sending master it`s transaction xmin over
hot_standby_feedback message.
MASTER REPLICA
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM
generate_series(1,10000000) id);
stop
start
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM test;
stop postmaster with gdb
start
DELETE FROM test WHERE id > 0;
wait till autovacuum delete and changed xmin
release postmaster with gdb
--- Result on replica
FATAL: terminating connection due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.

There is one feature of the behavior of standby, which let us to allow
the autovacuum to cut off the page table (at the end of relation) that
no one else needs (because there is only dead and removed tuples). So if
the standby SEQSCAN or another *SCAN mdread a page that is damaged or
has been deleted, it will receive a zero page, and not break the request
for ERROR.

Could you give me your ideas over these patches.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
hsfeedback_av_truncate.patch text/x-diff 4.9 KB
hsfeedback_noninvalide_xmin.patch text/x-diff 767 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Shaplov 2018-02-28 14:46:06 Re: [PATCH] Opclass parameters
Previous Message Grigory Smolkin 2018-02-28 13:50:17 Re: Reopen logfile on SIGHUP