| From: | Antonin Houska <ah(at)cybertec(dot)at> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2026-04-06 09:03:44 |
| Message-ID: | 82004.1775466224@localhost |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> So I've been trying to understand the "Introduce an option to make
> logical replication database specific." patch and I have to confess I
> just cannot.
>
> As far as I can read, the point is that if we reach
> SnapBuildProcessRunningXacts() when db_specific is true (which means
> standby_decode is called in an output plugin that has set
> need_shared_catalogs to false), _and_ we've not reached consistent state
> yet, then we'll call LogStandbySnapshot with our DB oid to emit a new
> xl_running_xacts message.
Right.
> So the WAL-decoding process emits WAL. I don't know if in normal
> conditions logical decoding processes emit WAL. If this is exceptional,
> I think we should add a comment.
Emitting WAL in logical decoding is not exceptional: SnapBuildWaitSnapshot()
already calls LogStandbySnapshot(), in order to get to the next phase.
> Now, this additional WAL message will be processed by all other
> processes decoding WAL. Perhaps it will ignored by most of them.
Right. That's one thing that I realized late yesterday, after having posted
the latest version of the patch. In SnapBuildProcessRunningXacts(), we need
if (OidIsValid(running->dbid))
return;
rather than
if (db_specific)
return;
because other backends can also generate their database-specific records.
> But
> most importantly, it will also reach back to ourselves, at which point
> we can hopefully use it to see that we might have reached consistent
> state within our database. Then we know our snapshot is ready to be
> used.
>
> Is this correct?
Yes.
> I think the reason it's safe to skip a lot of the processing caused by
> this additional process, is that xl_running_xacts messages are also
> emitted in other places in a non-database specific manner. So all the
> other placecs that are emitting that message continue to exist and
> cause logical-decoders operate in the same way as before.
Yes. I'm still thinking though if, after having used the database-specific
record to reach CONSITENT, we sould enforce one cluster-wide record, so that
the cleanup (in "our backend") takes place sooner rather than later. Not sure
about that.
> I think we should sprinkle lots of comments in several places about
> this. For example, I propose that standby_redo() should have something
> like
>
> * If 'dbid' is valid, only gather transactions running in that database.
> + * Such records should not be the only ones emitted, because this has
> + * potentially dangerous side-effects which makes some places ignore them:
> + *
> + * 1. SnapBuildProcessRunningXacts will skip computing the xmin and restart
> + * point from its input record if the record's xmin is older that the
> + * snapbuilder's current xmin; this should normally be fine because that
> + * information will be updated from other xl_running_xacts records.
> + * 2. standby_redo will likewise skip processing such a record
> *
> (are there other things that should be mentioned?)
I added something like that, but - due to the reference to
SnapBuildProcessRunningXacts() - less verbose about snapbuild.c.
> Also, LogStandbySnapshot() should have a comment explaining that passing
> a valid dboid is a weird corner case which is to be used with care, and
> that functions X Y and Z are going to ignore snapshots carrying a valid
> dbid.
One more check added in standby_decode() (and mentioned in that function in
the comment).
> Why do we call SnapBuildFindSnapshot() to do this, instead of doing it
> directly in SnapBuildProcessRunningXacts? Seems like it would be more
> straightforward.
Yes, fixed.
One more problem I found when testing w/o background worker
(contrib/test_decoding) was that accessSharedCatalogsInDecoding was not set
back to true. Both AllocateSnapshotBuilder() FreeSnapshotBuilder() do that
now.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
| Attachment | Content-Type | Size |
|---|---|---|
| nocfbot.v53-0001.3-Introduce-an-option-to-make-logical-replication-.patch | text/x-diff | 22.7 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tomas Vondra | 2026-04-06 09:14:58 | Re: EXPLAIN: showing ReadStream / prefetch stats |
| Previous Message | Masahiko Sawada | 2026-04-06 08:44:52 | Re: Introduce XID age based replication slot invalidation |