Re: Coding in WalSndWaitForWal

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Coding in WalSndWaitForWal
Date: 2019-11-12 19:27:16
Message-ID: 20191112192716.emrqs2afuefunw6v@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-11-11 13:53:40 -0300, Alvaro Herrera wrote:
> On 2019-Nov-11, Amit Kapila wrote:
>
> > On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> > > So your suggestion would be to call GetFlushRecPtr() before the first
> > > check on RecentFlushPtr and before entering the loop?
> >
> > No. What I meant was to keep the current code as-is and have an
> > additional check on RecentFlushPtr before entering the loop.
>
> I noticed that the "return" at the bottom of the function does a
> SetLatch(), but the other returns do not. Isn't that a bug?

I don't think it is - We never reset the latch in that case. I don't see
what we'd gain from setting it explicitly, other than unnecessarily
performing more work?

> /*
> * Fast path to avoid acquiring the spinlock in case we already know we
> * have enough WAL available. This is particularly interesting if we're
> * far behind.
> */
> if (RecentFlushPtr != InvalidXLogRecPtr &&
> loc <= RecentFlushPtr)
> + {
> + SetLatch(MyLatch);
> return RecentFlushPtr;
> + }

I.e. let's not do this.

> /* Get a more recent flush pointer. */
> if (!RecoveryInProgress())
> RecentFlushPtr = GetFlushRecPtr();
> else
> RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>
> + if (loc <= RecentFlushPtr)
> + {
> + SetLatch(MyLatch);
> + return RecentFlushPtr;
> + }

Hm. I'm doubtful this is a good idea - it essentially means we'd not
check for interrupts, protocol replies, etc. for an unbounded amount of
time. Whereas the existing fast-path does so for a bounded - although
not necessarily short - amount of time.

It seems to me it'd be better to just remove the "get a more recent
flush pointer" block - it doesn't seem to currently surve a meaningful
purpose.

> for (;;)
> {
> long sleeptime;
>
> /* Clear any already-pending wakeups */
> ResetLatch(MyLatch);
>
> @@ -2267,15 +2276,14 @@ WalSndLoop(WalSndSendDataCallback send_data)
>
> /* Sleep until something happens or we time out */
> (void) WaitLatchOrSocket(MyLatch, wakeEvents,
> MyProcPort->sock, sleeptime,
> WAIT_EVENT_WAL_SENDER_MAIN);
> }
> }
> - return;
> }

Having dug into history, the reason this exists is that there used to be
the following below the return:

-
-send_failure:
-
- /*
- * Get here on send failure. Clean up and exit.
- *
- * Reset whereToSendOutput to prevent ereport from attempting to send any
- * more messages to the standby.
- */
- if (whereToSendOutput == DestRemote)
- whereToSendOutput = DestNone;
-
- proc_exit(0);
- abort(); /* keep the compiler quiet */

but when 5a991ef8692ed (Allow logical decoding via the walsender
interface) moved the shutdown code into its own function,
WalSndShutdown(), we left the returns in place.

We still have the curious
proc_exit(0);
abort(); /* keep the compiler quiet */

pattern in WalSndShutdown() - wouldn't the right approach to instead
proc_exit() with pg_attribute_noreturn()?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-11-12 19:31:32 Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays
Previous Message Tomas Vondra 2019-11-12 19:19:33 Re: [Patch] Optimize dropping of relation buffers using dlist