Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-08-04 03:41:11
Message-ID: CAJcOf-diA6h_jHj9pLJ5RAVk8XOuFddorREFi_fQ_8vOLaZD1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 4, 2021 at 3:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>The idea I sort of had floating around in my mind is a little
>different than what Greg has implemented. I was thinking that we could
>just skip SerializeSnapshot and the corresponding shm_toc_allocate()
>if !IsolationUsesXactSnapshot(). Then on the restore side we could
>just call shm_toc_lookup() with noError = true and skip
>RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.

I've tried to follow your description and have attached a patch to
hopefully match it, but it doesn't pass "make check-world".
Perhaps I messed something up (apologies if so), or additional changes
are needed to match what you had in mind or correct additional issues
you didn't foresee?

t/001_pgbench_with_server.pl .. 10/?
# Failed test 'pgbench scale 1 initialization status (got 1 vs expected 0)'
# at t/001_pgbench_with_server.pl line 108.
...
# creating primary keys...
# pgbench: fatal: query failed: ERROR: cannot take query snapshot
during a parallel operation
# CONTEXT: parallel worker
# pgbench: query was: alter table pgbench_accounts add primary key (aid)

>I don't know why Greg's patch is changing anything related to the
>active snapshot (as opposed to the transaction snapshot). Maybe
>there's a reason why we need that change, but I don't know what it is.

I don't think my v2/v5 patch is changing anything related to the
active snapshot (is it?).
It's restoring the serialized active snapshot, installing it as the
transaction snapshot and setting it as the active snapshot.
The patch removes the additionally-acquired transaction snapshot in
InitializeParallelDSM (which seems like a later snapshot to that which
is actually being used in the execution state for the statement, back
in the leader, if I recall correctly). Basically, in the parallel
workers, I've tried to match the snapshot setup to that used in the
leader.
If the v2/v5 patch doesn't work correctly for some isolation level,
I've yet to find it (but can't absolutely rule out there's some case
not accounted for).

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment Content-Type Size
v6-0001-RH-Fix-parallel-worker-failed-assertion-and-coredump.patch application/octet-stream 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-08-04 04:02:52 RE: Skipping logical replication transactions on subscriber side
Previous Message David Rowley 2021-08-04 03:36:00 Re: Use POPCNT on MSVC