Re: Expose lock group leader pid in pg_stat_activity

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Expose lock group leader pid in pg_stat_activity
Date: 2020-02-04 14:27:25
Message-ID: 20200204142725.GA30965@nol
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 30, 2020 at 10:03:01PM +0900, Michael Paquier wrote:
> On Tue, Jan 28, 2020 at 02:52:08PM +0100, Tomas Vondra wrote:
> > On Tue, Jan 28, 2020 at 02:26:34PM +0100, Julien Rouhaud wrote:
> >> There were already some dependencies between the rows since parallel
> >> queries were added, as you could see eg. a parallel worker while no
> >> query is currently active. This patch will make those corner cases
> >> more obvious.
>
> I was reviewing the code and one thing that I was wondering is if it
> would be better to make the code more defensive and return NULL when
> the PID of the referenced leader is 0 or InvalidPid. However that
> would mean that we have a dummy 2PC entry from PGPROC or something not
> yet started, which makes no sense. So your simpler version is
> actually fine. What you have here is that in the worst case you could
> finish with an incorrect reference to shared memory if a PGPROC is
> recycled between the moment you look for the leader field and the
> moment the PID value is fetched. That's very unlikely to happen, and
> I agree that it does not really justify the cost of taking extra locks
> every time we scan pg_stat_activity.

Ok.

>
> > Yeah, sure. I mean explicit dependencies, e.g. a column referencing
> > values from another row, like leader_id does.
>
> + The leader_pid is NULL for processes not involved in parallel query.
> This is missing two markups, one for "NULL" and a second for
> "leader_pid".

The extra "leader_pid" disappeared when I reworked the doc. I'm not sure what
you meant here for NULL as I don't see any extra markup used in closeby
documentation, so I hope this version is ok.

> The documentation does not match the surroundings
> either, so I would suggest a reformulation for the beginning:
> PID of the leader process if this process is involved in parallel query.

> And actually this paragraph is not completely true, because leader_pid
> remains set even after one parallel query run has been finished for a
> session when leader_pid = pid as lockGroupLeader is set to NULL only
> once the process is stopped in ProcKill().

Oh good point, that's unfortunately not a super friendly behavior. I tried to
adapt the documentation to address of all that. It's maybe slightly too
verbose, but I guess that extra clarity is welcome here.

> >> Should I document the possible inconsistencies?
> >
> > I think it's worth mentioning that as a comment in the code, say before
> > the pg_stat_get_activity function. IMO we don't need to document all
> > possible inconsistencies, a generic explanation is enough.
>
> Agreed that adding some information in the area when we look at the
> PGPROC entries for wait events and such would be nice.

I added some code comments to remind that we don't guarantee any consistency
here.

Attachment Content-Type Size
pgsa-leader-pid-v5.diff text/plain 10.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 曾文旌 (义从) 2020-02-04 15:01:37 Re: [Proposal] Global temporary tables
Previous Message Rémi Lapeyre 2020-02-04 13:25:03 [PATCH v1] Allow COPY "text" format to output a header