Re: Some problems of recovery conflict wait events

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Some problems of recovery conflict wait events
Date: 2020-03-04 02:04:00
Message-ID: d60fd913-7cfc-564e-62b6-3db3995a5e33@oss.nttdata.com
Lists: pgsql-hackers

On 2020/02/29 12:36, Masahiko Sawada wrote:
> On Wed, 26 Feb 2020 at 16:19, Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>>
>> On Tue, 18 Feb 2020 at 17:58, Masahiko Sawada
>> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>>>
>>> Hi all,
>>>
>>> When recovery conflicts happen on the streaming replication standby,
>>> the wait event of the startup process is null when
>>> max_standby_streaming_delay = 0 (to be exact, when the limit time
>>> calculated from max_standby_streaming_delay and the last WAL data
>>> receipt time has already passed). Moreover, the process title of the
>>> waiting startup process looks odd in the case of lock conflicts.
>>>
>>> 1. When max_standby_streaming_delay > 0 and the startup process
>>> conflicts with a lock,
>>>
>>> * wait event
>>> backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>> startup | Lock | relation
>>> (1 row)
>>>
>>> * ps
>>> 42513 ?? Ss 0:00.05 postgres: startup recovering
>>> 000000010000000000000003 waiting
>>>
>>> Looks good.
>>>
>>> 2. When max_standby_streaming_delay > 0 and the startup process
>>> conflicts with a snapshot,
>>>
>>> * wait event
>>> backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>> startup | |
>>> (1 row)
>>>
>>> * ps
>>> 44299 ?? Ss 0:00.05 postgres: startup recovering
>>> 000000010000000000000003 waiting
>>>
>>> wait_event_type and wait_event are null even though the startup
>>> process is waiting for conflict resolution.
>>>
>>> 3. When max_standby_streaming_delay = 0 and the startup process
>>> conflicts with a lock,
>>>
>>> * wait event
>>> backend_type | wait_event_type | wait_event
>>> --------------+-----------------+------------
>>> startup | |
>>> (1 row)
>>>
>>> * ps
>>> 46510 ?? Ss 0:00.05 postgres: startup recovering
>>> 000000010000000000000003 waiting waiting
>>>
>>> wait_event_type and wait_event are null and the process title is
>>> wrong; "waiting" appears twice.
>>>
>>> The cause of the first problem (wait_event_type and wait_event not
>>> being set) is that WaitExceedsMaxStandbyDelay, which is called by
>>> ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
>>> using pg_usleep rather than WaitLatch. I think we can change it so
>>> that it uses WaitLatch and its callers pass wait event information.
>>>
>>> For the second problem, the wrong process title, the cause is also
>>> related to ResolveRecoveryConflictWithVirtualXIDs: in the case of lock
>>> conflicts we add "waiting" to the process title in WaitOnLock, but we
>>> add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
>>> have WaitOnLock not set the process title in the recovery case.
>>>
>>> These problems exist in 12, 11 and 10. I'll submit the patch.
>>>
>>
>> I've attached patches that fix the above two issues.
>>
>> The 0001 patch fixes the first problem. Currently there are 5 types of
>> recovery conflict resolution: snapshot, tablespace, lock, database and
>> buffer pin, and we set wait events for only 2 of the 5: lock
>> (only when doing ProcWaitForSignal) and buffer pin.

+1 to adding those new wait events in master. But adding them sounds like
a new feature rather than a bug fix, so ISTM that it's not back-patchable...

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
