Re: Add LWLock blocker(s) information

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add LWLock blocker(s) information
Date: 2020-06-07 06:12:59
Message-ID: 1881eef5-df4f-bbfa-b03f-0a15d9fab55e@amazon.com
Lists: pgsql-hackers

Hi hackers,

On 6/2/20 2:24 PM, Drouvot, Bertrand wrote:
>
> Hi hackers,
>
>
> I've attached a patch to add blocker(s) information for LW Locks.
> The motivation is to be able to get some blocker(s) information (if
> any) in the context of LW Locks.
>
> _Motivation:_
>
> We have seen some cases with heavy contention on some LW Locks (large
> number of backends waiting on the same LW Lock).
>
> Adding some blocker information would make the investigations easier,
> it could help answering questions like:
>
> * how many PIDs are holding the LWLock (could be more than one in
> case of LW_SHARED)?
> * Is the blocking PID changing?
> * Is the number of blocking PIDs changing?
> * What is the blocking PID doing?
> * Is the blocking PID waiting?
> * In which mode is the blocked PID requesting the lock?
> * In which mode is the blocker PID holding the lock?
>
> _Technical context and proposal:_
>
> There are two parts to this patch:
>
> * Add the instrumentation:
>
> * the patch adds into the LWLock struct:
>
>                     last_holding_pid: last PID owner of the lock
>                     last_mode: holding mode of the last PID owner of the lock
>                     nholders: number of holders (could be > 1 in case of LW_SHARED)
>
> * the patch adds into the PGPROC struct:
>
>                     lwLastHoldingPid: last holder of the LW lock the PID is waiting for
>                     lwHolderMode: LW lock mode of the last holder of the LW lock the PID is waiting for
>                     lwNbHolders: number of holders of the LW lock the PID is waiting for
>
>             and what is necessary to update this new information.
>
> * Provide a way to display the information: the patch also adds a
> function pg_lwlock_blocking_pid to display this new information.
>
> _Outcome Example:_
>
> # select * from pg_lwlock_blocking_pid(10259);
> requested_mode | last_holder_pid | last_holder_mode | nb_holders
> ----------------+-----------------+------------------+------------
> LW_EXCLUSIVE   |           10232 | LW_EXCLUSIVE     |          1
> (1 row)
>
> # select query,pid,state,wait_event,wait_event_type,pg_lwlock_blocking_pid(pid),pg_blocking_pids(pid) from pg_stat_activity where state='active' and pid != pg_backend_pid();
>              query              |  pid  | state  |  wait_event   | wait_event_type |          pg_lwlock_blocking_pid           | pg_blocking_pids
> --------------------------------+-------+--------+---------------+-----------------+-------------------------------------------+------------------
> insert into bdtlwa values (1); | 10232 | active |               |                 | (,,,)                                     | {}
> insert into bdtlwb values (1); | 10254 | active | WALInsert     | LWLock          | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
> create table bdtwt (a int);    | 10256 | active | WALInsert     | LWLock          | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
> insert into bdtlwa values (2); | 10259 | active | BufferContent | LWLock          | (LW_EXCLUSIVE,10232,LW_EXCLUSIVE,1)       | {}
> drop table bdtlwd;             | 10261 | active | WALInsert     | LWLock          | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
> (5 rows)
>
>
> So, should a PID be blocked on a LWLock, we could see:
>
> * in which mode it is requesting the lock
> * the last pid holding the lock
> * the mode of the last PID holding the lock
> * the number of PID(s) holding the lock
>
> _Remarks:_
>
> I did a few benchmarks so far and did not observe notable performance
> degradation (can share more details if needed).
>
> I did some quick attempts to get an exhaustive list of blockers (in
> case of LW_SHARED holders), but I think that would be challenging as:
>
> * There are about 40,000 calls to LWLockInitialize and all my
> attempts to init a list here produced “FATAL: out of shared
> memory” or similar.
> * One way to get rid of using a list in LWLock could be to use
> proc_list (with proclist_head in LWLock and proclist_node in
> PGPROC). This is the current implementation for the “waiters”
> list. But this would not work for the blockers as one PGPROC can
> hold multiple LW locks, so it could mean having a list of about
> 40K proclist_node per PGPROC.
> * I also have concerns about possible performance impact by using
> such a huge list in this context.
>
> Those are the reasons why this patch does not provide an exhaustive
> list of blockers.
>
> While this patch does not provide an exhaustive list of blockers (in
> case of LW_SHARED holders), the information it delivers could already
> be useful to get insights during LWLock contention scenarios.
>
> I will add this patch to the next commitfest. I look forward to your
> feedback about the idea and/or implementation.
>
> Regards,
>
> Bertrand

Attaching a new version of the patch with a tiny change to make it pass
the regression tests (opr_sanity was failing due to the new function
that is part of the patch).

Regards,

Bertrand

Attachment Content-Type Size
v1-0002-Add-LWLock-blockers-info.patch text/plain 11.6 KB
