Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Date: 2016-02-14 19:02:08
Message-ID: CA+Tgmoazp7fh9LYFOPQ=kYk320it7ioxMuRuAO8YjVnvQVCShA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> First, the overall concept here is that processes can either be a
>> member of a lock group or a member of no lock group.
>
> Check.
>
>> Second, I can review the data structure changes and the associated
>> invariants.
>
> The data structure additions seemed relatively straightforward, though
> I did wonder why you added groupLeader to PROCLOCK. Why is that not
> redundant with tag.myProc->lockGroupLeader? If it isn't redundant,
> it's inadequately documented. If it is, seems better to lose it.

That's a good question. There are locking considerations: the lock
manager lock for a given PROCLOCK isn't necessarily the same one that
protects tag.myProc->lockGroupLeader. I can go and look back through
the contexts where we use the PROCLOCK field and see if we can get by
without this, but I suspect it's too much of a pain.

> Also, isn't the comment on this PGPROC field incorrect:
>
> PGPROC *lockGroupLeader; /* lock group leader, if I'm a follower */
>
> ie shouldn't you s/follower/member/ ?

Yes, that would be better.

>> ... the lock manager lock that protects the fields in a
>> given PGPROC is chosen by taking pgprocno modulo the number of lock
>> manager partitions.
>
> pgprocno of the specific PGPROC, or of the group leader? If it's
> the former, I'm pretty suspicious that there are deadlock and/or
> linked-list-corruption hazards here. If it's the latter, at least
> the comments around this are misleading.

Leader. The other way would be nuts.

>> Each PROCLOCK now has a new groupLeader flag. While a PGPROC's
>> lockGroupLeader may be NULL if that process is not involved in group
>> locking, a PROCLOCK's groupLeader always has a non-NULL value; it
>> points back to the PGPROC that acquired the lock.
>
> This is not what the comment on it says; and your prose explanation
> here implies that it should actually be equal to tag.myProc, or else
> you are using some definition of "acquired the lock" that I don't
> follow at all. There could be lots of PGPROCs that have acquired
> a lock in one sense or another; which one do you mean?

The PROCLOCK's groupLeader will be equal to
tag.myProc->lockGroupLeader unless that is NULL. In that case it will
be equal to tag.myProc. Or at least that's the intention, modulo
bugs.

>> With respect to pg_locks - and for that matter also pg_stat_activity -
>> I think you are right that improvement is needed.
>
> This is really the core of my concern at the moment. I think that
> isolationtester is probably outright broken for any situation where the
> queries-under-test are being parallel executed, and so will be any other
> client that's trying to identify who blocks whom from pg_locks.

OK.

>> The simplest thing we could do to make that easier is, in
>> pg_stat_activity, have parallel workers advertise the PID that
>> launched them in a new field; and in pg_locks, have members of a lock
>> group advertise the leader's PID in a new field.
>
> That would be simple for us, but it would break every existing client-side
> query that tries to identify blockers/blockees; and not only would those
> queries need work but they would become very substantially more complex
> and slower (probably at least 4-way joins not 2-way). We already know
> that isolationtester's query has performance problems in the buildfarm.
> I think more thought is needed here, and that this area should be
> considered a release blocker. It's certainly a blocker for any thought
> of turning parallel query on by default.

I don't quite see why this would cause the joins to get more complex.
It's also not really clear what the alternative is. You could show
all the locks under the leader's PID, but that's horribly confusing
because there might be duplicates, and because if somebody's waiting
you really want to be able to tell which process to target with gdb.
You can show everybody under their own PIDs as now, but then you can't
tell which processes are in groups together. So I don't see what
there is except to show both values. Maybe there's some whole-new way
of displaying this information that would be more clear but I don't
really see what it is.

>> I hope this helps and please let me know what more you think I should do.
>
> At the least you need to make another pass over the comments and README
> addition. I highlighted above a few critical inaccuracies, and it's not
> hard to find a lot of less-critical typos in the comments in that commit.
> Transposing some of what you've written here into the README would likely
> be a good thing too.

I'm getting on a long plane flight shortly but go back over this when
I can find some time to do that in an appropriately careful way.
Please feel free to make such corrections to comments and so forth as
may seem urgent to you in the meanwhile.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2016-02-14 22:05:04 Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Previous Message Tom Lane 2016-02-14 18:33:03 Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-02-14 20:21:51 Re: extend pgbench expressions with functions
Previous Message Tom Lane 2016-02-14 18:33:03 Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl