Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-07-15 02:34:47
Message-ID: CAJcOf-ePC2PH4gu8b5UgM7=x_HdRVgBs3mfagcXOTj7nko5R2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 10, 2021 at 3:36 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
>

Unfortunately there is currently no test, code-comment, README or
developer-discussion that definitively determines which approach (v2
vs v3/v4) is a valid fix for this issue.
We don't know if having both the transaction and active snapshots in a
parallel worker is intentional or not, and if so, why so?
(certainly in the non-parallel case of the same statement execution,
there is only one snapshot in question here - the obtained transaction
snapshot is pushed as the active snapshot, as it is done in 95% of
cases in the code)
It seems that only the original code authors know how the snapshot
handling in parallel-workers is MEANT to work, and they have yet to
speak up about it here.
At this point, we can only all agree that there is a problem to be fixed here.

My concern with the v3/v4 patch approach is that because the
parallel-workers use a later snapshot to what is actually used in the
execution context for the statement in the parallel leader, then it is
possible for the parallel leader and parallel workers to have
different transaction visibility, and surely this cannot be correct.
For example, suppose a transaction that deletes a row, completes in
the window between these two snapshots.
Couldn't the row be visible to the parallel workers but not to the
parallel leader?
My guess is that currently there are not enough
concurrent-transactions tests to expose such a problem, and the window
here is fairly small.

So we can fiddle xmin values to avoid the immediate Assert issue here,
but it's not addressing potential xmax-related issues.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-07-15 02:49:33 Re: Remove redundant Assert(PgArchPID == 0); in PostmasterStateMachine
Previous Message Julien Rouhaud 2021-07-15 02:28:41 Re: [HACKERS] Preserving param location