Re: Adding REPACK [concurrently]

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-04-06 09:03:44
Message-ID: 82004.1775466224@localhost
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> So I've been trying to understand the "Introduce an option to make
> logical replication database specific." patch and I have to confess I
> just cannot.
>
> As far as I can read, the point is that if we reach
> SnapBuildProcessRunningXacts() when db_specific is true (which means
> standby_decode is called in an output plugin that has set
> need_shared_catalogs to false), _and_ we've not reached consistent state
> yet, then we'll call LogStandbySnapshot with our DB oid to emit a new
> xl_running_xacts message.

Right.

> So the WAL-decoding process emits WAL. I don't know if in normal
> conditions logical decoding processes emit WAL. If this is exceptional,
> I think we should add a comment.

Emitting WAL in logical decoding is not exceptional: SnapBuildWaitSnapshot()
already calls LogStandbySnapshot(), in order to get to the next phase.

> Now, this additional WAL message will be processed by all other
> processes decoding WAL. Perhaps it will ignored by most of them.

Right. That's one thing that I realized late yesterday, after having posted
the latest version of the patch. In SnapBuildProcessRunningXacts(), we need

if (OidIsValid(running->dbid))
return;

rather than

if (db_specific)
return;

because other backends can also generate their database-specific records.

> But
> most importantly, it will also reach back to ourselves, at which point
> we can hopefully use it to see that we might have reached consistent
> state within our database. Then we know our snapshot is ready to be
> used.
>
> Is this correct?

Yes.

> I think the reason it's safe to skip a lot of the processing caused by
> this additional process, is that xl_running_xacts messages are also
> emitted in other places in a non-database specific manner. So all the
> other placecs that are emitting that message continue to exist and
> cause logical-decoders operate in the same way as before.

Yes. I'm still thinking though if, after having used the database-specific
record to reach CONSITENT, we sould enforce one cluster-wide record, so that
the cleanup (in "our backend") takes place sooner rather than later. Not sure
about that.

> I think we should sprinkle lots of comments in several places about
> this. For example, I propose that standby_redo() should have something
> like
>
> * If 'dbid' is valid, only gather transactions running in that database.
> + * Such records should not be the only ones emitted, because this has
> + * potentially dangerous side-effects which makes some places ignore them:
> + *
> + * 1. SnapBuildProcessRunningXacts will skip computing the xmin and restart
> + * point from its input record if the record's xmin is older that the
> + * snapbuilder's current xmin; this should normally be fine because that
> + * information will be updated from other xl_running_xacts records.
> + * 2. standby_redo will likewise skip processing such a record
> *
> (are there other things that should be mentioned?)

I added something like that, but - due to the reference to
SnapBuildProcessRunningXacts() - less verbose about snapbuild.c.

> Also, LogStandbySnapshot() should have a comment explaining that passing
> a valid dboid is a weird corner case which is to be used with care, and
> that functions X Y and Z are going to ignore snapshots carrying a valid
> dbid.

One more check added in standby_decode() (and mentioned in that function in
the comment).

> Why do we call SnapBuildFindSnapshot() to do this, instead of doing it
> directly in SnapBuildProcessRunningXacts? Seems like it would be more
> straightforward.

Yes, fixed.

One more problem I found when testing w/o background worker
(contrib/test_decoding) was that accessSharedCatalogsInDecoding was not set
back to true. Both AllocateSnapshotBuilder() FreeSnapshotBuilder() do that
now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
nocfbot.v53-0001.3-Introduce-an-option-to-make-logical-replication-.patch text/x-diff 22.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2026-04-06 09:14:58 Re: EXPLAIN: showing ReadStream / prefetch stats
Previous Message Masahiko Sawada 2026-04-06 08:44:52 Re: Introduce XID age based replication slot invalidation