Re: Assertion failure in SnapBuildInitialSnapshot()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Assertion failure in SnapBuildInitialSnapshot()
Date: 2024-01-05 15:57:25
Message-ID: CA+TgmoYLzJxCEa0aCan3KR7o_25G52cbqw-90Q0VGRmV3a8XGQ@mail.gmail.com
Lists: pgsql-hackers

This thread has gone for about a year here without making any
progress, which isn't great.

On Tue, Feb 7, 2023 at 2:49 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hm. It's worrisome to now hold ProcArrayLock exclusively while iterating over
> the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> non-negligible frequency. Callers like CreateInitDecodingContext() that pass
> already_locked=true worry me a lot less, because obviously that's not a very
> frequent operation.

Maybe, but it would be good to have some data indicating whether this
is really an issue.

> I wonder if we could instead invert the locks, and hold
> ReplicationSlotControlLock until after ProcArraySetReplicationSlotXmin(), and
> acquire ProcArrayLock just for ProcArraySetReplicationSlotXmin(). That'd mean
> that already_locked = true callers have to do a bit more work (we have to be
> sure the locks are always acquired in the same order, or we end up in
> unresolved deadlock land), but I think we can live with that.

This seems like it could be made to work, but there's apparently a
shortage of people willing to write the patch.
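To make the inverted lock order concrete, here is a minimal single-file sketch. It assumes pthread mutexes as stand-ins for the ReplicationSlotControlLock and ProcArrayLock LWLocks, and a hypothetical slot_xmin[] array and replication_slot_xmin variable in place of the real shared-memory structures. The point it illustrates is the ordering: the slot lock is taken first and held until after the shared value is stored, while the proc-array lock covers only the store. Two concurrent recomputations are therefore serialized, and a stale minimum cannot be published after a newer one.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins: PostgreSQL uses LWLocks and its own TransactionId. */
typedef uint32_t TransactionId;
#define NUM_SLOTS 4

/* Lock order is always slot_control_lock -> proc_array_lock. */
static pthread_mutex_t slot_control_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t proc_array_lock = PTHREAD_MUTEX_INITIALIZER;

static TransactionId slot_xmin[NUM_SLOTS];  /* 0 means "slot unused" */
static TransactionId replication_slot_xmin; /* value published in the proc array */

/* Caller must already hold slot_control_lock; proc_array_lock covers only the store. */
static void set_replication_slot_xmin(TransactionId xmin)
{
    pthread_mutex_lock(&proc_array_lock);
    replication_slot_xmin = xmin;
    pthread_mutex_unlock(&proc_array_lock);
}

/* Compute the minimum slot xmin and publish it while still holding the slot
 * lock, so a concurrent recomputation cannot publish its result in between. */
static void compute_required_xmin(void)
{
    TransactionId min = 0;

    pthread_mutex_lock(&slot_control_lock);
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        /* Plain '<' ignores XID wraparound; kept simple for the sketch. */
        if (slot_xmin[i] != 0 && (min == 0 || slot_xmin[i] < min))
            min = slot_xmin[i];
    }
    set_replication_slot_xmin(min);
    pthread_mutex_unlock(&slot_control_lock);
}
```

Under this scheme, a caller that today passes already_locked=true (i.e. already holds ProcArrayLock) would have to acquire the slot lock before ProcArrayLock as well, to keep the single lock order and avoid deadlock.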

As another thought, Masahiko-san writes in his proposed commit message:

"As a result, the replication_slot_xmin could be overwritten with an
old value and retreated."

But what about just surgically preventing that?
ProcArraySetReplicationSlotXmin() could refuse to retreat the values,
perhaps? If it computes an older value than what's there, it just does
nothing?
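The "refuse to retreat" idea can be sketched as a guard in the setter. This is a minimal, hypothetical version that handles a single xmin value (the real ProcArraySetReplicationSlotXmin() also sets the catalog xmin). xid_precedes() is a local helper modeled on PostgreSQL's TransactionIdPrecedes(), using modulo-2^32 comparison for normal XIDs. InvalidTransactionId is still accepted, since it means no slot is holding the horizon back.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Modeled on TransactionIdPrecedes(): special XIDs compare numerically,
 * normal XIDs compare modulo 2^32 so wraparound is handled. */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Hypothetical no-retreat setter: stores new_xmin unless it is older than
 * the value already published. Returns true if the value was stored. */
static bool set_slot_xmin_no_retreat(TransactionId *shared_xmin,
                                     TransactionId new_xmin)
{
    if (*shared_xmin != InvalidTransactionId &&
        new_xmin != InvalidTransactionId &&
        xid_precedes(new_xmin, *shared_xmin))
        return false;           /* older value: do nothing */
    *shared_xmin = new_xmin;
    return true;
}
```

For example, with the published value at 742, an attempt to store 735 is ignored, while 750 is stored. One question the sketch leaves open is whether any caller ever needs to move the value backwards legitimately; if one does, a guard like this would silently drop that update.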

--
Robert Haas
EDB: http://www.enterprisedb.com
