Re: Rework LogicalOutputPluginWriterUpdateProgress

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Rework LogicalOutputPluginWriterUpdateProgress
Date: 2023-02-10 21:04:23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


This is a reply to:
split off, so patches to address some of my concerns don't confuse cfbot.

On 2023-02-09 11:21:41 +0530, Amit Kapila wrote:
> On Thu, Feb 9, 2023 at 1:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> > Attached is a current, quite rough, prototype. It addresses some of the points
> > raised, but far from all. There's also several XXXs/FIXMEs in it. I changed
> > the file-ending to .txt to avoid hijacking the CF entry.
> >
> I have started a separate thread to avoid such confusion. I hope that
> is fine with you.

In abstract, yes - unfortunately just changing the subject isn't going to
suffice, I'm afraid. The In-Reply-To header was still referencing the old
thread. The mail archive did see the threads as one, and I think that's what
cfbot uses as the source.

On 2023-02-09 11:21:41 +0530, Amit Kapila wrote:
> On Thu, Feb 9, 2023 at 1:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hacking on a rough prototype how I think this should rather look, I had a few
> > questions / remarks:
> >
> > - We probably need to call UpdateProgress from a bunch of places in decode.c
> > as well? Indicating that we're lagging by a lot, just because all
> > transactions were in another database seems decidedly suboptimal.
> >
> We can do that but I think in all those cases we will reach quickly
> enough back to walsender logic (WalSndLoop - that will send keepalive
> if required) that we don't need to worry. After processing each
> record, the logic will return back to the main loop that will send
> keepalive if required.

For keepalive processing yes, for syncrep and accurate lag tracking, I don't
think that suffices? We could do that in WalSndLoop() instead I guess, but
we'd have more information about when that's useful in decode.c.

> Also, while reading WAL if we need to block, it will call WalSndWaitForWal()
> which will send keepalive if required.

The fast-path prevents WalSndWaitForWal() from doing that in a lot of cases.

* Fast path to avoid acquiring the spinlock in case we already know we
* have enough WAL available. This is particularly interesting if we're
* far behind.
if (RecentFlushPtr != InvalidXLogRecPtr &&
loc <= RecentFlushPtr)
return RecentFlushPtr;

> The patch calls update_progress in change_cb_wrapper and other
> wrappers which will miss the case of DDLs that generates a lot of data
> that is not processed by the plugin. I think for that we either need
> to call update_progress from reorderbuffer.c similar to what the patch
> has removed or we need some other way to address it. Do you have any
> better idea?

I don't mind calling something like update_progress() in the specific cases
that's needed, but I think those are just the
if (!RelationIsLogicallyLogged(relation))
if (relation->rd_rel->relrewrite && !rb->output_rewrites))

To me it makes a lot more sense to call update_progress() for those, rather
than generally.

I think, independent of the update_progress calls, it'd be worth investing a
bit of time into optimizing those cases, so that we don't put the changes into
the reorderbuffer in the first place. I think we find space for two flag bits
to identify the cases in the WAL, rather than needing to access the catalog to
figure it out. If we don't find space, we could add an annotation the WAL
record (making it bigger) for the two cases, because they're not the path most
important to optimize.

> > - Why should lag tracking only be updated at commit like points? That seems
> > like it adds odd discontinuinities?
> >
> We have previously experimented to call it from non-commit locations
> but that turned out to give inaccurate information about Lag. See
> email [1].

That seems like an issue with WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS, not with
reporting something more frequently. ISTM that
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS just isn't a good proxy for when to
update lag reporting for records that don't strictly need it. I think that
decision should be made based on the LSN, and be deterministic.

> > - Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(),
> > WalSndWriteData() missing wal_sender_timeout <= 0 checks?
> >
> It seems we are checking that via
> ProcessPendingWrites()->WalSndKeepaliveIfNecessary(). Do you think we
> need to check it before as well?

Either we don't need the precheck at all, or we should do it reliably. Right
now we'll have a higher overhead / some behavioural changes, if
wal_sender_timeout is disabled. That doesn't make sense.

> > - I don't really understand why f95d53edged55 added !end_xact to the if
> > condition for ProcessPendingWrites(). Is the theory that we'll end up in an
> > outer loop soon?
> >
> Yes. For non-empty xacts, we will anyway send a commit message. For
> empty (skipped) xacts, we will send for synchronous replication case
> to avoid any delay.

That seems way too dependent on the behaviour of a specific output plugin,
there's plenty use cases where you'd not need a separate message emitted at
commit time. With what I proposed we would know whether we just wrote
something, or not.

> > > > I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
> > > > consider what happens if e.g. the origin filter filters out entire
> > > > transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases
> > > > we'll be lucky because we'll return quickly to XLogSendLogical(), but not
> > > > reliably.
> > >
> Which case are you worried about? As mentioned in one of the previous
> points I thought the timeout/keepalive handling in the callers should
> be enough.

Well, you added syncrep specific logic to WalSndUpdateProgress(). The same
logic isn't present in the higher level loops. If we do need that logic, we
also need to trigger it if the origin filter filters out the entire
transaction. If we don't need it, then we shouldn't have it in
WalSndUpdateProgress() either.

> How about renaming ProcessPendingWrites to WaitToSendPendingWrites or
> WalSndWaitToSendPendingWrites?

I don't like those much:

We're not really waiting for the data to be sent or such, we just want to give
it to the kernel to be sent out. Contrast that to WalSndWaitForWal, where we
actually are waiting for something to complete.

I don't think 'write' is a great description either, although our existing
terminology is somewhat muddled. We're waiting calling pq_flush() until

WalSndSendPending() or WalSndFlushPending()?


Andres Freund


Browse pgsql-hackers by date

  From Date Subject
Next Message David Zhang 2023-02-10 21:08:35 Re: psql: Add role's membership options to the \du+ command
Previous Message Tom Lane 2023-02-10 20:50:20 Killing off removed rels properly