Re: [HACKERS] SERIALIZABLE with parallel query

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] SERIALIZABLE with parallel query
Date: 2018-02-22 17:05:51
Message-ID: CA+TgmoZdgFv-DRKf7z=KVvfRzYu8J7gX6M52j--aMHnoFmShjQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> The best solution I have come up with so far is to add a reference
> count to SERIALIZABLEXACT. I toyed with putting the refcount into the
> DSM instead, but then I ran into problems making that work when you
> have a query with multiple Gather nodes. Since the refcount is in
> SERIALIZABLEXACT I also had to add a generation counter so that I
> could detect the case where you try to attach too late (the leader has
> already errored out, the refcount has reached 0 and the
> SERIALIZABLEXACT object has been recycled).

I don't know whether that's safe or not. It certainly sounds like
it's solving one category of problem, but is that the only issue? If
some backends haven't noticed that we're safe, they might keep
acquiring SIREAD locks or doing other manipulations of shared state,
which could conceivably cause confusion. I haven't looked into this deeply
enough to understand whether there's actually a possibility of trouble
there, but I can't rule it out off-hand.
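
For concreteness, the attach side of the quoted refcount-plus-generation
scheme might look roughly like this (a minimal sketch only; the field
names, the function name, and the choice of SerializableXactHashLock as
the protecting lock are my assumptions, not the actual patch):

typedef struct SERIALIZABLEXACT
{
    /* ... existing predicate-locking fields ... */
    int         refcount;       /* leader plus attached workers */
    uint64      generation;     /* bumped each time the object is recycled */
} SERIALIZABLEXACT;

/*
 * Called by a worker at startup.  The leader passed its sxact pointer
 * and the generation it observed through the DSM segment.  If the
 * generation no longer matches, the leader has already errored out,
 * the refcount has reached zero and the object has been recycled, so
 * the worker must not attach.
 */
static bool
AttachSerializableXact(SERIALIZABLEXACT *sxact, uint64 expected_generation)
{
    bool        ok = false;

    LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    if (sxact->generation == expected_generation)
    {
        sxact->refcount++;
        ok = true;
    }
    LWLockRelease(SerializableXactHashLock);

    return ok;
}

The detach side would decrement the refcount under the same lock and,
on reaching zero, bump the generation before releasing the object.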

One approach is to just disable this optimization for parallel query.
Being able to use SERIALIZABLE with parallel query is better than not
being able to do it, even if some optimizations are not applied in
that case. Of course making the optimizations work is better, but
we've got to be sure we're doing it right.

> PS I noticed that for BecomeLockGroupMember() we say "If we can't
> join the lock group, the leader has gone away, so just exit quietly"
> but for various other similar things we spew errors (the most commonly
> seen one being "ERROR: could not map dynamic shared memory segment").
> Intentional?

I suppose I thought that if we fail to map the dynamic shared memory
segment, it might be down to any one of several causes; whereas if we
fail to join the lock group, it must be because the leader has already
exited. There might be a flaw in that thinking, though.
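
For reference, the two styles being contrasted look roughly like this
in the worker startup path (a simplified sketch; the fixed-parallel-state
field names are from memory and may not match the tree exactly):

/*
 * Style 1: fail silently.  The only way this can fail is if the
 * leader has already exited, so the worker just goes away quietly.
 */
if (!BecomeLockGroupMember(fps->parallel_master_pgproc,
                           fps->parallel_master_pid))
    return;

/*
 * Style 2: treat failure as an error.  Failure to map the segment
 * could be down to any one of several causes, so it is reported.
 */
seg = dsm_attach(DatumGetUInt32(main_arg));
if (seg == NULL)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("could not map dynamic shared memory segment")));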

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
