Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-07-23 06:00:14
Message-ID: CAJcOf-eCKTiE7=D0o0pX4T6443uo_r4rgujBh2vALcCkT0CsSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 22, 2021 at 2:15 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Thanks to Thomas Munro for drawing my attention to this thread. I
> wasn't intentionally ignoring it, but there's a lot of email in the
> world and only so much time.
>
> When I wrote this code originally, the idea that I had in mind was
> simply this: whatever state we have in the leader ought to be
> reproduced in each worker. So if there's an active snapshot in the
> leader, we ought to make that active in all the workers, and if
> there's a transaction snapshot in the leader, we ought to make that
> the transaction snapshot in all of the workers.
>
> But I see now that my thinking was fuzzy, and I'm going to blame that
> on the name GetTransactionSnapshot() being slightly misleading. If
> IsolationUsesXactSnapshot() is true, then there's really such a thing
> as a transaction snapshot and reproducing that in the worker is a
> sensible thing to do. But when !IsolationUsesXactSnapshot(),
> GetTransactionSnapshot() doesn't just "get the transaction snapshot",
> because there isn't any such thing. It takes a whole new snapshot, on
> the theory that you wouldn't be calling this function unless you had
> finished up with the snapshot you got the last time you called this
> function. And in the case of initiating parallel query, that is the
> wrong thing.
>
> I think that, at least in the case where IsolationUsesXactSnapshot()
> is true, we need to make sure that calling GetTransactionSnapshot() in
> a worker produces the same result that it would have produced in the
> leader. Say one of the workers calls an sql or plpgsql function and
> that function runs a bunch of SQL statements. It seems to me that
> there's probably a way for this to result in calls inside the worker
> to GetTransactionSnapshot(), and if that doesn't return the same
> snapshot as in the leader, then we've broken MVCC.
>
> What about when IsolationUsesXactSnapshot() is false? Perhaps it's OK
> to just skip this altogether in that case. Certainly what we're doing
> can't be right, because copying a snapshot that wouldn't have been
> taken without parallel query can't ever be the right thing to do.
> Perhaps we need to copy something else instead. I'm not really sure.
>
> So I think v2 is probably on the right track, but wrong when the
> transaction isolation level is REPEATABLE READ or SERIALIZABLE, and v3
> and v4 just seem like unprincipled hacks that try to avoid the
> assertion failure by lying about whether there's a problem.
>

Many thanks for taking time to respond to this (and thanks to Thomas Munro too).
It's much appreciated, as this is a complex area.
For the time being, I'll reinstate the v2 patch (say as v5) as a
partial solution, and then work on addressing the REPEATABLE READ and
SERIALIZABLE transaction isolation levels, which you point out are not
handled correctly by the patch.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment Content-Type Size
v5-0001-PG15-Fix-parallel-worker-failed-assertion-and-coredump.patch application/octet-stream 6.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tanghy.fnst@fujitsu.com 2021-07-23 06:12:02 [Doc] Tiny fix for regression tests example
Previous Message Michael Banck 2021-07-23 05:57:31 Re: [PATCH] New default role allowing to change per-role/database settings