Re: WIP: long transactions on hot standby feedback replica / proof of concept

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: long transactions on hot standby feedback replica / proof of concept
Date: 2017-09-05 09:52:41
Message-ID: CAA4eK1JmeJ4c2hngu+Va__t9G10JPWWdDkTkLqZuCz=-aUD1-A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 5, 2017 at 3:03 AM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Mon, Sep 4, 2017 at 2:04 PM, <i(dot)kartyshov(at)postgrespro(dot)ru> wrote:
>>
>> Our clients complain about this issue and therefore I want to raise the
>> discussion and suggest several solutions to this problem:
>>
>> I. Why does PG use FATAL when ERROR is enough to release the lock that causes
>> the lock conflict?
>> "if (RecoveryConflictPending && DoingCommandRead)"
>>
>> II. Do we really need to truncate the table on the hot standby at exactly the
>> same time as the truncate on the master occurs?
>>
>> In my case, the conflict happens when autovacuum truncates table tbl1 on the
>> master while a backend on the replica is performing a long transaction
>> involving the same table tbl1. This happens because the truncate takes an
>> AccessExclusiveLock. To tackle this issue we have several options:
>>
>> 1. We can postpone the truncate on the master until all the replicas have
>> finished their transactions (in this case, feedback requests to the master
>> should be sent frequently)
>> Patch 1
>> vacuum_lazy_truncate.patch
>
>
> I have the following comments on this patch:
> 1) You shouldn't use ">=" to compare xids. You should use
> TransactionIdFollowsOrEquals() function which handles transaction id
> wraparound correctly.
> 2) As I understand it, this patch truncates the heap only when no concurrent
> transactions are running on either the master or the replicas with
> hot_standby_feedback enabled. For a busy system, it would literally mean
> "never do heap truncate after vacuum".
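To make comment 1) concrete: XIDs are 32-bit counters that wrap around, so a plain ">=" gives the wrong answer once the counter has wrapped. The following C sketch reimplements the modulo-2^32 comparison in the spirit of TransactionIdFollowsOrEquals(). The typedef, the macros and the function name xid_follows_or_equals are local stand-ins written for illustration, not PostgreSQL's own headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins modeled on PostgreSQL's XID definitions (illustrative only). */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Wraparound-aware "id1 >= id2": normal XIDs live on a circle of 2^32
 * values, and id1 follows (or equals) id2 if it is fewer than 2^31 steps
 * ahead of it on that circle.
 */
static bool
xid_follows_or_equals(TransactionId id1, TransactionId id2)
{
    /* Special (non-normal) XIDs are compared as plain integers. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 >= id2;

    /* Unsigned subtraction wraps modulo 2^32; a "small" result means id1 is ahead. */
    return (uint32_t) (id1 - id2) < UINT32_C(0x80000000);
}
```

For example, xid_follows_or_equals(10, 4294967000u) is true, because 10 lies fewer than 2^31 steps ahead of 4294967000 once the counter has wrapped, whereas the plain comparison 10 >= 4294967000u is false.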
>
>> 2. Maybe there is a way to avoid sending the AccessExclusiveLock and
>> not truncate the table on the replica right away. We could wait a
>> little and then try to truncate tbl1 on the replica again.
>>
>> Here is a patch that makes the replica skip the truncate WAL record if some
>> transaction using the same table is already running on the replica. It also
>> forces the master to send truncate_wal to the replica with the actual table
>> length every time autovacuum starts.
>> Patch 2
>> standby_truncate_skip_v1.patch
>>
>> In this case, a transaction that has been running for several hours won’t be
>> interrupted because of the truncate. Even if for some reason we haven’t
>> truncated table tbl1 right away, nothing terrible will happen. The next
>> truncate WAL record will reduce the file size to the actual length (if some
>> inserts or updates have been performed on the master).
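The skip-and-correct behaviour of Patch 2 can be modelled roughly as follows. This is a hypothetical sketch, not the patch's code: StandbyRel, active_readers and replay_truncate() are invented names, and "some transaction using the same table" is reduced to a simple reader count. What it illustrates is why a skipped record is harmless: each later record carries the master's actual length, and the standby only ever shrinks to it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical model of one relation as seen by the standby. */
typedef struct StandbyRel
{
    BlockNumber nblocks;        /* current physical length, in blocks */
    int         active_readers; /* standby transactions using the relation */
} StandbyRel;

/*
 * Hypothetical replay of a truncate record that carries the master's
 * current relation length. Returns true if the record was applied and
 * false if it was skipped because a standby reader would conflict.
 */
static bool
replay_truncate(StandbyRel *rel, BlockNumber master_nblocks)
{
    /* A running reader would conflict with the AccessExclusiveLock: skip. */
    if (rel->active_readers > 0)
        return false;

    /*
     * Only ever shrink. Assumption: any growth on the master reaches the
     * standby through replay of its inserts, not through this record.
     */
    if (master_nblocks < rel->nblocks)
        rel->nblocks = master_nblocks;
    return true;
}
```

Pages left over after a skipped record are empty, so they only cost space until the next record arrives while no reader holds the table.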
>
>
> Since you wrote this patch at my request, let me clarify its idea a
> little more.
>
> Currently, lazy_truncate_heap() is very careful about acquiring an
> AccessExclusiveLock to truncate the heap. It wants neither to block other
> transactions nor to wait too long for this lock. If the lock isn't acquired
> after some number of tries, lazy_truncate_heap() gives up. However, once
> lazy_truncate_heap() has acquired the AccessExclusiveLock on the master, the
> lock is replayed on the replicas, where it conflicts with read-only
> transactions, if any. That leads to unexpected behaviour when
> hot_standby_feedback is on.
>
> The idea of the fix is to apply the same logic for acquiring the
> AccessExclusiveLock on the standby as on the master: give up if somebody has
> been holding a conflicting lock for long enough. That allows the standby to
> have more free pages at the end of the heap than the master has. That
> shouldn't lead to errors, since those pages are empty, but it does let the
> standby waste some extra space. In order to mitigate this deficiency, we
> generate XLOG_SMGR_TRUNCATE records more frequently: at the end of every
> vacuum. Therefore, even if the standby ends up with some extra empty pages,
> they would be reclaimed during later vacuum cycles.
>
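Both the master-side behaviour and the proposed standby-side variant described in the quoted message follow a bounded "try, wait, give up" pattern. The C sketch below shows that pattern in isolation. It is not lazy_truncate_heap() itself: try_lock_fn and sleep_fn are hypothetical hooks (in PostgreSQL the non-blocking attempt would be something like ConditionalLockRelation() with AccessExclusiveLock), and the 50 ms wait interval and 5000 ms timeout are assumed values chosen for illustration.

```c
#include <stdbool.h>

/* Assumed values for illustration only. */
#define TRUNCATE_LOCK_WAIT_INTERVAL_MS 50
#define TRUNCATE_LOCK_TIMEOUT_MS       5000

/* Hypothetical non-blocking lock attempt: returns true if the lock was granted. */
typedef bool (*try_lock_fn) (void *arg);
/* Hypothetical sleep hook, so the loop can be exercised without real waiting. */
typedef void (*sleep_fn) (int ms);

/*
 * Keep retrying a non-blocking lock request, but never wait longer than the
 * timeout. Returning false means "skip truncation this time" rather than
 * blocking other transactions or waiting indefinitely.
 */
static bool
acquire_lock_or_give_up(try_lock_fn try_lock, void *arg, sleep_fn sleep_ms)
{
    int waited_ms = 0;

    for (;;)
    {
        if (try_lock(arg))
            return true;        /* got the lock without blocking anyone */
        if (waited_ms >= TRUNCATE_LOCK_TIMEOUT_MS)
            return false;       /* waited long enough: give up */
        sleep_ms(TRUNCATE_LOCK_WAIT_INTERVAL_MS);
        waited_ms += TRUNCATE_LOCK_WAIT_INTERVAL_MS;
    }
}
```

With these assumed values the loop makes at most 101 attempts (one immediately, then one after each of 100 waits) before giving up, at which point the truncation is simply left for a later vacuum cycle.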

I think one deficiency of this solution is that it will keep on
generating extra WAL even if the standby doesn't need it (because the
standby has already successfully truncated the relation). I don't know if
we can just get away with saying that an additional WAL record per vacuum
cycle is harmless, especially when it doesn't serve any purpose (as in the
cases where the standby is always able to truncate the heap on the first
WAL record). Am I missing some part of the solution that avoids it?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
