Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-05-13 14:15:07
Message-ID: CAJcOf-f4C7s-dFagXCM1x9sTecO6qsZyyoNbWg7U60EA2KyM7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 13, 2021 at 11:25 AM Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn> wrote:
>

> Hi Andres,
> Thanks for you replay.

Er, it's Greg who has replied so far (not Andres).

>
> And If you still cannot reproduce it in 2 minitus. Could you run pgbench longer time, such as 30 or 60 minutes.
>

Actually, I did run it, multiple times, for more than 60 minutes, but
no assert/crash/coredump occurred in my environment.

>
> The parallel work process:
>
> 1, Get Snapshot and set TransactionXmin itself, in ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.
>
> 2, Acooding PARALLEL_KEY_TRANSACTION_SNAPSHOT(xmin 799425, xmax 82229) from main process, and set TransactionXmin 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.
>
> 3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel get the active snapshot(xmin 799162, xmax 82206) from main process, and set this snapshot to scan->rs_base.rs_snapshot.
>
> 4, parallel scan begin, with active snapshot(xmin 799162, xmax 82206) and TransactionXmin(799425),when scan tuple(xmin 799225) SubTransGetTopmostTransaction assert got.
>
> In HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.
>

I added some logging at a couple of points in the code:
1) In the Worker process code - ParallelWorkerMain() - where it
restores the serialized transaction and active snapshots (i.e. passed
to the Worker from the main process).
2) In the HeapTupleSatisfiesMVCC() function, immediately before it
calls XidInMVCCSnapshot()

After running it for an hour, examination of the log showed that in
ALL cases, the transaction snapshot xmin,xmax was always THE SAME as
the active snapshot xmin,xmax.
(Can you verify that this occurs on your system when things are
working, prior to the coredump?)

This is different to what you are getting in your environment (at
least, different to what you described when the problem occurs).
In your case, you say that the main process gets "the newer
transaction snapshot" - where exactly is this happening in your case?
(or this is what you don't yet know?)
Perhaps very occasionally this somehow happens on your system and
triggers the Assert (and coredump)? I have not been able to reproduce
that on my system.

Have you reproduced this issue on any other system, using the same
steps as you provided?
I'm wondering if there might be something else in your environment
that may be influencing this problem.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-05-13 14:39:13 Re: Enhanced error message to include hint messages for redundant options error
Previous Message vignesh C 2021-05-13 14:13:00 Re: alter subscription drop publication fixes