Re: adding wait_start column to pg_locks

From: Ian Lawrence Barwick <barwick(at)gmail(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: adding wait_start column to pg_locks
Date: 2021-01-14 03:39:45
Message-ID: CAB8KJ=hN=sz2+u3cQ+5jhZT9+TP-N7OCBLTp68L4fkVoVTqWjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2021年1月4日(月) 15:04 torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>:
>
> On 2021-01-02 06:49, Justin Pryzby wrote:
> > On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
> >> So I'm now thinking about adding a new column in pg_locks which
> >> keeps the time at which locks started waiting.
> >>
> >> Attached a patch.
> >
> > This is failing make check-world, would you send an updated patch ?
> >
> > I added you as an author so it shows up here.
> > http://cfbot.cputube.org/atsushi-torikoshi.html
>
> Thanks!
>
> Attached an updated patch.

I took a look at this patch as it seems useful (and I have an item on my
bucket
list labelled "look at the locking code", which I am not at all familiar
with).

I tested the patch by doing the following:

Session 1:

postgres=# CREATE TABLE foo (id int);
CREATE TABLE

postgres=# BEGIN ;
BEGIN

postgres=*# INSERT INTO foo VALUES (1);
INSERT 0 1

Session 2:

postgres=# BEGIN ;
BEGIN

postgres=*# LOCK TABLE foo;

Session 3:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass AND NOT granted\x\g\x

-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513935
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:03:06.683053+09

So far so good, but checking *all* the locks on this relation:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass\x\g\x

-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513824
mode | RowExclusiveLock
granted | t
fastpath | f
wait_start | 2021-01-14 12:03:06.683053+09
-[ RECORD 2 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513935
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:03:06.683053+09

shows the RowExclusiveLock granted in session 1 as apparently waiting from
exactly the same time as session 2's attempt to acquire the lock, which is
clearly
not right.

Also, if a further session attempts to acquire a lock, we get:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass\x\g\x

-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513824
mode | RowExclusiveLock
granted | t
fastpath | f
wait_start | 2021-01-14 12:05:53.747309+09
-[ RECORD 2 ]-----------------------------
locktype | relation
relation | 16452
pid | 3514240
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:05:53.747309+09
-[ RECORD 3 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513935
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:05:53.747309+09

i.e. all entries now have "wait_start" set to the start time of the latest
session's
lock acquisition attempt.

Looking at the code, this happens as the wait start time is being recorded
in
the lock record itself, so always contains the value reported by the latest
lock
acquisition attempt.

It looks like the logical place to store the value is in the PROCLOCK
structure; the attached patch reworks your patch to do that, and given the
above
scenario produces following output:

postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass\x\g\x

-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3516391
mode | RowExclusiveLock
granted | t
fastpath | f
wait_start |
-[ RECORD 2 ]-----------------------------
locktype | relation
relation | 16452
pid | 3516470
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:19:16.217163+09
-[ RECORD 3 ]-----------------------------
locktype | relation
relation | 16452
pid | 3516968
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:18:08.195429+09

As mentioned, I'm not at all familiar with the locking code so it's quite
possible that it's incomplete,there may be non-obvious side-effects, or it's
the wrong approach altogether etc., so further review is necessary.

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

Attachment Content-Type Size
v3-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch text/x-patch 8.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-01-14 04:05:28 Re: Moving other hex functions to /common
Previous Message Fujii Masao 2021-01-14 03:34:01 Re: A failure of standby to follow timeline switch