Re: group locking: incomplete patch, just for discussion

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: group locking: incomplete patch, just for discussion
Date: 2014-11-20 20:15:51
Message-ID: 20141120201551.GA3461@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-11-20 13:57:25 -0500, Robert Haas wrote:
> On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Except that it opens us up for all kinds of concurrency bugs. I'm pretty
> > strictly set against granting any self exclusive locks en-masse. If we
> > do this by default for all granted locks when starting a worker backend
> > it'll get *so* much harder to reason about correctness. Suddenly locks
> > don't guarantee what they used to anymore. We'll e.g. not be able to
> > rely that a CheckTableNotInUse() + AEL makes it safe to
> > drop/rewrite/whatever a relation - even if that happens in the main
> > backend.
>
> Haven't I responded to this a few times already?

Not in a particularly convincing way at least.

> I see no way, even theoretically, that it can be sane for
> CheckTableNotInUse() to succeed in a parallel context. Ever.

It's not exactly inconceivable that you'd want a parallel
CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating
backends, true, but it's still a pretty sane case.

You haven't really made your case why you want to allow access to AEL
locked relations in a parallel context in the first place.

> If the
> deadlock detector would kill the processes anyway, it doesn't matter,
> because CheckTableNotInUse() should do it first, so that we get a
> better error and without waiting for deadlock_timeout. So any
> scenario that's predicated on the assumption that CheckTableNotInUse()
> will succeed in a parallel context is 100% unconvincing to me as an
> argument for anything.

Meh, there's plenty cases where CheckTableNotInUse() isn't used where we
still rely on locks actually working. And it's not just postgres code,
this *will* break user code.

Your argument essentially is that we don't actually need to lock objects
if the other side only reads. Yes, you add a condition or two ontop
(e.g. CheckTableNotInUse() fails), but that's not that much. If we want
to do this we really need to forbid *any* modification to catalog/user
tables while parallel operations are ongoing. In the primary and worker
backends. I think that's going to end up being much more
problematic/restrictive than not allowing non-cooperative paralellism
after >= ShareUpdateExclusive. If we ever want to parallelize writing
operations we'll regret this quite a bit.

Breaking the locking system - and that's what you're doing by removing
the usual guarantees - seems just fundamentally wrong. Yes, I can't
name all the dangers that I think are out there.

The 'lets grant all the current locks at start of parallel query'
approach seems quite a bit safer than always doing that during parallel
query, don't get me wrong.

Btw, what are your thoughts about SERIALIZABLE and parallel query?
Afaics that'd be pretty hard to support?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-11-20 20:19:47 Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Previous Message Robert Haas 2014-11-20 20:14:50 Re: Doing better at HINTing an appropriate column within errorMissingColumn()