Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-05-20 06:08:36
Message-ID: CAJcOf-dVeKS2YJaS_BVdF7ghQ=xFj2d68TzGjYCvzo8cM0FtDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 20, 2021 at 11:18 AM Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn> wrote:
>
> Hi Greg,
> Thanks a lot for you explanation and your fix.
>
> I think your fix can resolve the core dump issue. As with your fix, parallel process reset Transaction Xmin from ActiveSnapshot.
> But it will change Transaction snapshot for all parallel scenarios. I don't know whether it bring in other issue.
> For test only, I think it is enough.
>
> So is there anybody can explain what's exactly difference between ActiveSnapshot and TransactionSnapshot in parallel work process.
> Then maybe we can find a better solution and try to fix it really.
>

I am proposing the attached patch to fix this issue (patches for both
PG13.2 and latest PG14 source are provided).

Perhaps this will trigger others who better know the intricacies of
snapshot handling, and know the reasons and history behind why the
transaction_snapshot and active_snapshot are currently passed
separately to parallel workers, to speak up here and discuss the issue
further, and check my fix (and note, I haven't attempted to modify
README.parallel in the patch until I get further clarification on this
issue).
Perhaps someone can explain the purpose of calling
GetTransactionSnapshot() AGAIN in InitializeParallelDSM() and how is
this consistent with the current ActiveSnapshot?
AFAICS, that doesn't seem correct, and that's why the patch removes it.

I've rebuilt Postgres with the patch applied and run the regression
tests, with and without "force_parallel_mode=regress", and all tests
pass.
So if the fix is wrong, no tests currently exist that detect issues with it.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment Content-Type Size
v1-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch application/octet-stream 6.4 KB
v1-0001-PG13_2-Fix-parallel-worker-failed-assertion-and-coredump.patch application/octet-stream 6.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nitin Jadhav 2021-05-20 06:51:12 Re: Query about time zone patterns in to_char
Previous Message Michael Paquier 2021-05-20 06:07:56 Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set