Re: WIP: long transactions on hot standby feedback replica / proof of concept

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ivan Kartyshov <i(dot)kartyshov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: long transactions on hot standby feedback replica / proof of concept
Date: 2017-11-06 06:31:06
Message-ID: CAD21AoCWY9LWOrGx57UNOFG7bc-yd4TXcVRSWpQcvJs_kRYZtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 4, 2017 at 7:04 AM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Wed, Nov 1, 2017 at 5:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>>
>> On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
>> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> > On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
>> >> wrote:
>> >> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>> >> > <i(dot)kartyshov(at)postgrespro(dot)ru> wrote:
>> >> >> Hello. I made some bugfixes and rewrite the patch.
>> >> >
>> >> > I don't think it's a good idea to deliberately leave the state of the
>> >> > standby different from the state of the master on the theory that it
>> >> > won't matter. I feel like that's something that's likely to come
>> >> > back
>> >> > to bite us.
>> >>
>> >> I agree with Robert. What happen if we intentionally don't apply the
>> >> truncation WAL and switched over? If we insert a tuple on the new
>> >> master server to a block that has been truncated on the old master,
>> >> the WAL apply on the new standby will fail? I guess there are such
>> >> corner cases causing failures of WAL replay after switch-over.
>> >
>> >
>> > Yes, that looks dangerous. One approach to cope that could be teaching
>> > heap
>> > redo function to handle such these situations. But I expect this
>> > approach
>> > to be criticized for bad design. And I understand fairness of this
>> > criticism.
>> >
>> > However, from user prospective of view, current behavior of
>> > hot_standby_feedback is just broken, because it both increases bloat and
>> > doesn't guarantee that read-only query on standby wouldn't be cancelled
>> > because of vacuum. Therefore, we should be looking for solution: if one
>> > approach isn't good enough, then we should look for another approach.
>> >
>> > I can propose following alternative approach: teach read-only queries on
>> > hot
>> > standby to tolerate concurrent relation truncation. Therefore, when
>> > non-existent heap page is accessed on hot standby, we can know that it
>> > was
>> > deleted by concurrent truncation and should be assumed to be empty. Any
>> > thoughts?
>> >
>>
>> You also meant that the applying WAL for AccessExclusiveLock is always
>> skipped on standby servers to allow scans to access the relation?
>
>
> Definitely not every AccessExclusiveLock WAL records should be skipped, but
> only whose were emitted during heap truncation. There are other cases when
> AccessExclusiveLock WAL records are emitted, for instance, during DDL
> operations. But, I'd like to focus on AccessExclusiveLock WAL records
> caused by VACUUM for now. It's kind of understandable for users that DDL
> might cancel read-only query on standby. So, if you're running long report
> query then you should wait with your DDL. But VACUUM is a different story.
> It runs automatically when you do normal DML queries.
>
> AccessExclusiveLock WAL records by VACUUM could be either not emitted, or
> somehow distinguished and skipped on standby. I haven't thought out that
> level of detail for now.
>

I understood. I'm concerned the fact that we cannot distinguish that
AccessExclusiveLock WAL came from the vacuum truncation or other
operation required AccessExclusiveLock so far. So I agree that we need
to invent a way for that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-11-06 06:51:13 Re: dropping partitioned tables without CASCADE
Previous Message David Rowley 2017-11-06 05:32:40 Re: [Sender Address Forgery]Re: path toward faster partition pruning