Re: Physical replication slot advance is not persistent

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, michael(at)paquier(dot)xyz, simon(at)2ndquadrant(dot)com
Subject: Re: Physical replication slot advance is not persistent
Date: 2019-12-26 13:35:31
Message-ID: 682b1960-cb89-8738-dbdf-8033662fb80f@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26.12.2019 11:33, Kyotaro Horiguchi wrote:
> At Wed, 25 Dec 2019 20:28:04 +0300, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> wrote in
>>> Yep, it helps with physical replication slot persistence after
>>> advance, but the whole validation (moveto <= endlsn) does not make
>>> sense for me. The value of moveto should be >= than minlsn ==
>>> confirmed_flush / restart_lsn, while endlsn == retlsn is also always
>>> initialized with confirmed_flush / restart_lsn. Thus, your condition
>>> seems to be true in any case, even if it was no-op one, which we were
>>> intended to catch.
> ...
>> If I get it correctly, then we already keep previous slot position in
>> the minlsn, so we just have to compare endlsn with minlsn and treat
>> endlsn <= minlsn as a no-op without slot state flushing.
> I think you're right about the condition. (endlsn cannot be less than
> minlsn, though) But I came to think that we shouldn't use locations in
> that decision.
>
>> Attached is a patch that does this, so it fixes the bug without
>> affecting any user-facing behavior. Detailed comment section and DEBUG
>> output are also added. What do you think now?
>>
>> I have also forgotten to mention that all versions down to 11.0 should
>> be affected with this bug.
> pg_replication_slot_advance is the only caller of
> pg_logical/physical_replication_slot_advacne so there's no apparent
> determinant on who-does-what about dirtying and other housekeeping
> calculation like *ComputeRequired*() functions, but the current shape
> seems a kind of inconsistent between logical and physical.
>
> I think pg_logaical/physical_replication_slot_advance should dirty the
> slot if they actually changed anything. And
> pg_replication_slot_advance should do the housekeeping if the slots
> are dirtied. (Otherwise both the caller function should dirty the
> slot in lieu of the two.)
>
> The attached does that.

Both approaches looks fine for me: my last patch with as minimal
intervention as possible and yours refactoring. I think that it is a
right direction to let everyone who modifies slot->data also mark slot
as dirty.

I found some comment section in your code as rather misleading:

+        /*
+         * We don't need to dirty the slot only for the above change,
but dirty
+         * this slot for the same reason with
+         * pg_logical_replication_slot_advance.
+         */

We just modified MyReplicationSlot->data, which is "On-Disk data of a
replication slot, preserved across restarts.", so it definitely should
be marked as dirty, not because pg_logical_replication_slot_advance does
the same.

Also I think that using this transient variable in
ReplicationSlotIsDirty is not necessary. MyReplicationSlot is already a
pointer to the slot in shared memory.

+    ReplicationSlot *slot = MyReplicationSlot;
+
+    Assert(MyReplicationSlot != NULL);
+
+    SpinLockAcquire(&slot->mutex);

Otherwise it looks fine for me, so attached is the same diff, but with
these proposed corrections.

Another concern is that ReplicationSlotIsDirty is added with the only
one user. It also cannot be used by SaveSlotToPath due to the
simultaneous usage of both flags dirty and just_dirtied there.

In that way, I hope that we should call ReplicationSlotSave
unconditionally in the pg_replication_slot_advance, so slot will be
saved or not automatically based on the slot->dirty flag. In the same
time, ReplicationSlotsComputeRequiredXmin and
ReplicationSlotsComputeRequiredLSN should be called by anyone, who
modifies xmin and LSN fields in the slot. Otherwise, currently we are
getting some leaky abstractions.

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
0001-Make-sure-to-save-updated-physical-slot-v2.diff text/x-patch 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-12-26 13:53:29 Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema
Previous Message Michael Paquier 2019-12-26 13:27:29 Re: Fix comment typos.