|From:||Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>|
|To:||Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>|
|Cc:||Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Parallel scan with SubTransGetTopmostTransaction assert coredump|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2021-07-09 20:36, Tomas Vondra wrote:
> I took a quick look on this - I'm no expert in the details of
> so take my comments with a grain of salt.
> AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
> Greg is right the v3 patch does not seem like the right (or correct)
> solution, for a couple reasons:
> 1) It fixes the symptoms, not the cause. If we set TransactionXmin to a
> bogus value, this only fixes it locally in
> but I'd be surprised if other places did not have the same issue.
> 2) The xid/TransactionXmin comparison in the v2 fix:
> xid = xid > TransactionXmin ? xid : TransactionXmin;
> seems broken, because it compares the XIDs directly, but that's not
> going to work after generating enough transactions.
> 3) But even if this uses TransactionIdFollowsOrEquals, it seems very
> strange because the "xid" comes from
> XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
> i.e. it's the raw xmin from the tuple, so why should we be setting it
> TransactionXmin? That seems pretty strange, TBH.
> So yeah, I think this is due to confusion with two snapshots and
> to consider both of them when calculating TransactionXmin.
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
> So I think we need to keep both snapshots, but fix TransactionXmin to
> consider both of them - I suppose ParallelWorkerMain could simply look
> at the two snapshots, and use the minimum. Which is what  (already
> linked by Pavel) talks about, although that's related to concerns about
> one of the processes dying, which is not what's happening here.
> I'm wondering what consequences this may have on production systems,
> though. We've only seen this failing because of the assert, so what
> happens when the build has asserts disabled?
> Looking at SubTransGetTopmostTransaction, it seems the while loop ends
> immediately (because it's pretty much the opposite of the assert), so
> just return the "xid" as topmost XID. But will that produce incorrect
> query results, or what?
PFA v4 patch based on the ideas above.
In principle I see little difference with v3. But I agree it is more
I did test this patch on Linux and MacOS using testing algo above and
got no error. On master branch before the patch I still see this error.
|Next Message||Masahiro Ikeda||2021-07-13 10:13:17||Re: Fix comments of heap_prune_chain()|
|Previous Message||Masahiro Ikeda||2021-07-13 10:05:10||Re: Fix comments of heap_prune_chain()|