From: Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Date: 2018-08-09 20:25:59
Message-ID: 4ae82b354af9a20624e1e1d05932087e@postgrespro.ru
Lists: pgsql-hackers
Hello, thank you for your review.
Alexander Korotkov wrote on 2018-06-20 20:54:
> Thinking about that more, I found that adding a vacuum mark as an extra
> argument to LockAcquireExtended is also wrong. It would still be hard
> to determine if we should log the lock in LogStandbySnapshot(). We'd
> have to add that flag to the shared memory table of locks. And that looks
> like a kludge.
>
> Therefore, I'd like to propose another approach: introduce a new lock
> level. So, AccessExclusiveLock will be split into two:
> AccessExclusiveLocalLock and AccessExclusiveLock. Unlike
> AccessExclusiveLock, AccessExclusiveLocalLock will not be logged to the
> standby, and will be used for heap truncation.
>
> I expect some resistance to my proposal, because fixing this "small
> bug" doesn't deserve a new lock level. But the current behavior of
> hot_standby_feedback is buggy. From the user's perspective, the
> hot_standby_feedback option simply doesn't work: it causes the master to
> bloat without protecting standby queries from cancellation. We need to
> fix that, but I don't see another proper way to do that except adding a
> new lock level...
Your proposal is very interesting; it made the patch smaller and cleaner.
So I have updated the patch and made changes in accordance with the proposed
concept of a special AccessExclusiveLocalLock.
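
To make the intended effect concrete: with the patch, vacuum's heap truncation
should no longer produce an AccessExclusiveLock record that is replayed on the
standby. A minimal, illustrative check of that (my own suggestion, not part of
the patch; it assumes the patched lock level behaves as described above and
uses the "test" table from the scenarios below):

    -- Run on the replica while autovacuum on the master truncates the test table.
    -- Unpatched: the replayed AccessExclusiveLock shows up here, held by the
    -- startup process, and eventually cancels the long-running standby query.
    -- Patched: the truncation lock is local-only and never logged, so no row
    -- should appear.
    SELECT locktype, mode, pid, granted
    FROM pg_locks
    WHERE relation = 'test'::regclass
      AND mode = 'AccessExclusiveLock';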
> I don't yet understand what the problem is here and why this patch
> should solve it. As I understand this patch, GetOldestXmin() will
> initialize the xmin of the walsender with the lowest xmin of the
> replication slots. But the xmin of replication slots will be taken into
> account anyway by GetSnapshotData() called by vacuum.
>
>> How to test:
>> Make an async replica, turn on feedback, reduce
>> max_standby_streaming_delay,
>> and make autovacuum aggressive.
>
> You forgot to mention that one should set up the replication using a
> replication slot. Naturally, if a replication slot doesn't exist, the
> master shouldn't keep dead tuples for a disconnected standby, because the
> master doesn't know whether the standby will reconnect or is gone forever.
I tried to make hot_standby_feedback work without replication slots,
and I found a small window in which vacuum on the
master can truncate heap pages that the standby still needs for a
just-started transaction.
How to test:
Set up an async replica, turn on feedback and reduce
max_standby_streaming_delay.
Make autovacuum more aggressive (a query for verifying that autovacuum
actually runs and truncates the table follows these settings):
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0
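
The test relies on autovacuum actually visiting the table and truncating its
empty tail pages. A quick way to confirm that on the master, using standard
statistics views (just a suggested check, not part of the patch):

    -- Run on the master: confirm autovacuum has processed the table and that
    -- the relation shrank back after the trailing empty pages were truncated.
    SELECT relname, last_autovacuum, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'test';

    SELECT pg_relation_size('test') AS table_size_bytes;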
Test1:
Here we will put a load on the master and simulate a long
transaction with repeated one-second seqscans on the replica (by calling
pg_sleep with a one-second duration every 6 seconds).
MASTER:
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(master.sql: update test set value = value;)

REPLICA (hot_standby = on, max_standby_streaming_delay = 1s, hot_standby_feedback = on):
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT pg_sleep(value) FROM test;
\watch 6
---Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
On the patched version, lazy_vacuum_truncation passed without fatal errors.
Only sometimes does an error occur, because this test is too synthetic:
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.
This conflict is raised by ResolveRecoveryConflictWithSnapshot while
redoing some visibility flags; to avoid it we can run Test2 or
increase max_standby_streaming_delay.
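
To tell the two kinds of recovery conflicts apart while the test is running,
the standby's conflict counters can be watched (again, just a suggested check,
not part of the patch):

    -- Run on the replica: confl_lock counts cancellations caused by replayed
    -- AccessExclusiveLocks (the case this patch is meant to remove), while
    -- confl_snapshot and confl_bufferpin cover the remaining conflicts
    -- mentioned above.
    SELECT datname, confl_lock, confl_snapshot, confl_bufferpin
    FROM pg_stat_database_conflicts
    WHERE datname = current_database();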
Test2:
Here we will put a load on the master and simulate a long
transaction on the replica (by taking a LOCK on the table).
MASTER:
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(master.sql: update test set value = value;)

REPLICA (hot_standby = on, max_standby_streaming_delay = 1s, hot_standby_feedback = on):
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
LOCK TABLE test IN ACCESS SHARE MODE;
select * from test;
\watch 6
---Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
On the patched version, lazy_vacuum_truncation passed without fatal errors.
I would like to hear your opinion on this implementation.
--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: hsfeedback_locallock.patch (text/x-diff, 20.3 KB)