Re: Some problems of recovery conflict wait events

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Some problems of recovery conflict wait events
Date: 2020-03-27 07:35:57
Message-ID: 1dbc3ff8-2993-9282-819b-83a966391345@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/03/27 15:39, Masahiko Sawada wrote:
> On Fri, 27 Mar 2020 at 10:32, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>
>>
>>
>> On 2020/03/26 14:33, Masahiko Sawada wrote:
>>> On Tue, 24 Mar 2020 at 17:04, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/03/05 20:16, Fujii Masao wrote:
>>>>>
>>>>>
>>>>> On 2020/03/05 16:58, Masahiko Sawada wrote:
>>>>>> On Wed, 4 Mar 2020 at 15:21, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2020/03/04 14:31, Masahiko Sawada wrote:
>>>>>>>> On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/03/04 13:27, Michael Paquier wrote:
>>>>>>>>>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
>>>>>>>>>>> Yeah, so 0001 patch sets existing wait events to recovery conflict
>>>>>>>>>>> resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
>>>>>>>>>>> to the recovery conflict on a snapshot. 0003 patch improves these wait
>>>>>>>>>>> events by adding the new type of wait event such as
>>>>>>>>>>> WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
>>>>>>>>>>> is the fix for existing versions and 0003 patch is an improvement for
>>>>>>>>>>> only PG13. Did you mean even 0001 patch doesn't fit for back-patching?
>>>>>>>>>
>>>>>>>>> Yes, it looks like a improvement rather than bug fix.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Okay, understand.
>>>>>>>>
>>>>>>>>>> I got my eyes on this patch set. The full patch set is in my opinion
>>>>>>>>>> just a set of improvements, and not bug fixes, so I would refrain from
>>>>>>>>>> back-backpatching.
>>>>>>>>>
>>>>>>>>> I think that the issue (i.e., "waiting" is reported twice needlessly
>>>>>>>>> in PS display) that 0002 patch tries to fix is a bug. So it should be
>>>>>>>>> fixed even in the back branches.
>>>>>>>>
>>>>>>>> So we need only two patches: one fixes process title issue and another
>>>>>>>> improve wait event. I've attached updated patches.
>>>>>>>
>>>>>>> Thanks for updating the patches! I started reading 0001 patch.
>>>>>>
>>>>>> Thank you for reviewing this patch.
>>>>>>
>>>>>>>
>>>>>>> - /*
>>>>>>> - * Report via ps if we have been waiting for more than 500 msec
>>>>>>> - * (should that be configurable?)
>>>>>>> - */
>>>>>>> - if (update_process_title && new_status == NULL &&
>>>>>>> - TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
>>>>>>> - 500))
>>>>>>>
>>>>>>> The patch changes ResolveRecoveryConflictWithSnapshot() and
>>>>>>> ResolveRecoveryConflictWithTablespace() so that they always add
>>>>>>> "waiting" into the PS display, whether wait is really necessary or not.
>>>>>>> But isn't it better to display "waiting" in PS basically when wait is
>>>>>>> necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
>>>>>>> does as the above?
>>>>>>
>>>>>> You're right. Will fix it.
>>>>>>
>>>>>>>
>>>>>>> ResolveRecoveryConflictWithDatabase(Oid dbid)
>>>>>>> {
>>>>>>> + char *new_status = NULL;
>>>>>>> +
>>>>>>> + /* Report via ps we are waiting */
>>>>>>> + new_status = set_process_title_waiting();
>>>>>>>
>>>>>>> In ResolveRecoveryConflictWithDatabase(), there seems no need to
>>>>>>> display "waiting" in PS because no wait occurs when recovery conflict
>>>>>>> with database happens.
>>>>>>
>>>>>> Isn't the startup process waiting for other backend to terminate?
>>>>>
>>>>> Yeah, you're right. I agree that "waiting" should be reported in this case.
>>>>
>>>> On second thought, in recovery conflict case, "waiting" should be reported
>>>> while waiting for the specified delay (e.g., by max_standby_streaming_delay)
>>>> until the conflict is resolved. So IMO reporting "waiting" in the case of
>>>> recovery conflict with buffer pin, snapshot, lock and tablespace seems valid,
>>>> because they are user-visible "expected" wait time.
>>>>
>>>> However, in the case of recovery conflict with database, the recovery
>>>> basically doesn't wait at all and just terminates the conflicting sessions
>>>> immediately. Then the recovery waits for all those sessions to be terminated,
>>>> but that wait time is basically small and should not be the user-visible.
>>>> If that wait time becomes very long because of unresponsive backend, ISTM
>>>> that LOG or WARNING should be logged instead of reporting something in
>>>> PS display. I'm not sure if that logging is really necessary now, though.
>>>> Therefore, I'm thinking that "waiting" doesn't need to be reported in the case
>>>> of recovery conflict with database. Thought?
>>>
>>> Fair enough. The longer wait time of conflicts with database is not
>>> user-expected behaviour so logging would be better. I'd like to just
>>> drop the change around ResolveRecoveryConflictWithDatabase() from the
>>> patch.
>>
>> Make sense.
>>
>> + if (update_process_title)
>> + waitStart = GetCurrentTimestamp();
>>
>> Since LockBufferForCleanup() can be called very frequently,
>> I don't think that it's good thing to call GetCurrentTimestamp()
>> every time when LockBufferForCleanup() is called.
>>
>> + /* Report via ps if we have been waiting for more than 500 msec */
>> + if (update_process_title && new_status == NULL &&
>> + TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(),
>> + 500))
>>
>> Do we really want to see "waiting" in PS even in non hot standby mode?
>>
>> If max_standby_streaming_delay and deadlock_timeout are set to large value,
>> ResolveRecoveryConflictWithBufferPin() can wait for a long time, e.g.,
>> more than 500ms. In that case, I'm afraid that "report if we've been
>> waiting for more than 500ms" logic doesn't work.
>>
>> So I'm now thinking that it's better to report "waiting" immdiately before
>> ResolveRecoveryConflictWithBufferPin(). Of course, we can still use
>> "report if we've been waiting for more than 500ms" logic by teaching 500ms
>> to ResolveRecoveryConflictWithBufferPin() as the minimum wait time.
>> But this looks overkill. Thought?
>>
>> Based on the above comments, I updated the patch. Attached. Right now
>> the patch looks very simple. Could you review this patch?
>
> Thank you for the patch. I agree with you for all the points. Your
> patch looks good to me.

Thanks for the review! Barring any objections, I will commit the latest patch.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-03-27 07:37:15 Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Previous Message Kyotaro Horiguchi 2020-03-27 07:31:15 Re: shared-memory based stats collector