Re: WIP: long transactions on hot standby feedback replica / proof of concept

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: long transactions on hot standby feedback replica / proof of concept
Date: 2017-09-04 21:33:51
Message-ID: CAPpHfdsMgm2Wu2SvEd=MUinS8EkN5XTLyOrOpCw-se26=wVMrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 4, 2017 at 2:04 PM, <i(dot)kartyshov(at)postgrespro(dot)ru> wrote:

> Our clients complain about this issue and therefore I want to raise the
> discussion and suggest several solutions to this problem:
>
> I. Why does PG use Fatal when Error is enough to release lock that rises
> lock conflict?
> "If (RecoveryConflictPending && DoingCommandRead)"
>
> II. Do we really need to truncate the table on hot standby exactly at the
> same time when truncate on master occurs?
>
> In my case conflict happens when the autovacuum truncates table tbl1 on
> master while backend on replica is performing a long transaction involving
> the same table tbl1. This happens because truncate takes an
> AccessExclusiveLock. To tackle this issue we have several options:
>
> 1. We can postpone the truncate on the master until all the replicas have
> finished their transactions (in this case, feedback requests to the master
> should be sent frequently)
> Patch 1
> vacuum_lazy_truncate.patch
>

I've following comments on this patch:
1) You shouldn't use ">=" to compare xids. You should use
TransactionIdFollowsOrEquals()
function which handles transaction id wraparound correctly.
2) As I understood, this patch makes heap truncate only when no concurrent
transactions are running on both master and replicas with
hot_standby_feedback enabled. For busy system, it would be literally
"never do heap truncate after vacuum".

2. Maybe there is an option somehow not to send AccessExclusiveLock and not
> to truncate table on the replica right away. We could try to wait a little
> and truncate tbl1 on replica again.
>
> Here is a patch that makes replica skip truncate WAL record if some
> transaction using the same table is already running on replica. And it also
> forces master to send truncate_wal to replica with actual table length
> every time autovacuum starts.
> Patch 2
> standby_truncate_skip_v1.patch
>
> In this case the transaction which is running for several hours won’t be
> interrupted because of truncate. Even if for some reason we haven’t
> truncated this table tbl1 right away, nothing terrible will happen. The
> next truncate wal record will reduce the file size by the actual length (if
> some inserts or updates have been performed on master).
>

Since you wrote this patch under on my request, let me clarify its idea
little more.

Currently, lazy_truncate_heap() is very careful on getting
AccessExclusiveLock to truncate heap. It doesn't want either block other
transaction or wait for this lock too long. If lock isn't acquired after
some number of tries lazy_truncate_heap() gives up. However, once
lazy_truncate_heap() acquires AccessExclusiveLock is acquired on master, it
would be replayed on replicas where it will conflict with read-only
transactions if any. That leads to unexpected behaviour when
hot_standby_feedback is on.

Idea of fixing that is to apply same logic of getting AccessExclusiveLock
on standby as on master: give up if somebody is holding conflicting lock
for long enough. That allows standby to have more free pages at the end of
heap than master have. That shouldn't lead to errors since those pages are
empty, but allows standby to waste some extra space. In order to mitigate
this deficiency, we're generating XLOG_SMGR_TRUNCATE records more frequent:
on finish of every vacuum. Therefore, if even standby gets some extra
space of empty pages, it would be corrected during further vacuum cycles.

@@ -397,7 +425,6 @@ lazy_vacuum_rel(Relation onerel, int options,
> VacuumParams *params,
> appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate:
> %.3f MB/s\n"),
> read_rate, write_rate);
> appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
> -
> ereport(LOG,
> (errmsg_internal("%s", buf.data)));
> pfree(buf.data);
> @@ -1820,7 +1847,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats
> *vacrelstats)
> vacrelstats->pages_removed += old_rel_pages - new_rel_pages;
> vacrelstats->rel_pages = new_rel_pages;
>
> - ereport(elevel,
> + ereport(WARNING,
> (errmsg("\"%s\": truncated %u to %u pages",
> RelationGetRelationName(onerel),
> old_rel_pages, new_rel_pages),

Ivan, what these changes are supposed to do?

diff --git a/src/backend/storage/lmgr/lock.c
> b/src/backend/storage/lmgr/lock.c
> index 2b26173..f04888e 100644
> --- a/src/backend/storage/lmgr/lock.c
> +++ b/src/backend/storage/lmgr/lock.c
> @@ -816,7 +816,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
> XLogStandbyInfoActive())
> {
> LogAccessExclusiveLockPrepare();
> - log_lock = true;
> +// log_lock = true;
> }
>
> /*

Perhaps, it's certainly bad idea to disable replaying AccessExclusiveLock
*every time*, we should skip taking AccessExclusiveLock only while writing
XLOG_SMGR_TRUNCATE record. Besides that, this code violates out coding
style.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2017-09-04 22:21:16 pg_basebackup behavior on non-existent slot
Previous Message Tom Lane 2017-09-04 21:27:18 Re: pgbench tap tests & minor fixes.