Re: pg_receivewal starting position

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: pg_receivewal starting position
Date: 2021-08-30 09:56:07
Message-ID: 3561829.oDkbz2t6ih@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le samedi 28 août 2021, 14:10:25 CEST Bharath Rupireddy a écrit :
> On Thu, Aug 26, 2021 at 5:45 PM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
wrote:
> > order to fail early if the replication slot doesn't exist, so please find
> > attached v2 for that.
>
> Thanks for the patches. Here are some comments:
>

Thank you for this review ! Please see the other side of the thread where I
answered Michael Paquier with a new patchset, which includes some of your
proposed modifications.

> 1) While the intent of these patches looks good, I have following
> concern with new replication command READ_REPLICATION_SLOT: what if
> the pg_receivewal exits (because user issued a SIGINT or for some
> reason) after flushing the received WAL to disk, before it sends
> sendFeedback to postgres server's walsender so that it doesn't get a
> chance to update the restart_lsn in the replication slot via
> PhysicalConfirmReceivedLocation. If the pg_receivewal is started
> again, isn't it going to get the previous restart_lsn and receive the
> last chunk of flushed WAL again?

I've kept the existing directory as the source of truth if we have any WAL
there already. If we don't, we fallback to the restart_lsn on the replication
slot.
So in the event that we start it again from somewhere else where we don't have
access to those WALs anymore, we could be receiving it again, which in my
opinion is better than losing everything in between in that case.

>
> 2) What is the significance of READ_REPLICATION_SLOT for logical
> replication slots? I read above that somebody suggested to restrict
> the walsender to handle READ_REPLICATION_SLOT for physical replication
> slots so that the callers will see a command failure. But I tend to
> think that it is clean to have this command common for both physical
> and logical replication slots and the callers can have an Assert(type
> == 'physical').

I've updated the patch to make it easy for the caller to check the slot's type
and added a verification for those cases.

In general, I tried to implement the meaning of the different fields exactly as
it's done in the pg_replication_slots view.

>
> 3) Isn't it useful to send active, active_pid info of the replication
> slot via READ_REPLICATION_SLOT? pg_receivewal can use Assert(active ==
> true && active_pid == getpid()) as an assertion to ensure that it is
> the sole owner of the replication slot? Also, is it good send
> wal_status info

Typically we read the slot before attaching to it, since what we want to do is
check if it exists. It may be worthwile to check that it's not already used by
another backend though.

>
> 4) I think below messages should start with lower case letter and also
> there are some typos:
> + pg_log_warning("Could not fetch the replication_slot \"%s\" information "
> + pg_log_warning("Server does not suport fetching the slot's position, "
> something like:
> + pg_log_warning("could not fetch replication slot \"%s\" information, "
> + "resuming from current server position instead", replication_slot);
> + pg_log_warning("server does not support fetching replication slot
> information, "
> + "resuming from current server position instead");
>
I've rephrased it a bit in v3, let me know if that's what you had in mind.

> 5) How about emitting the above messages in case of "verbose"?

I think it is useful to warn the user even if not in the verbose case, but if
the consensus is to move it to verbose only output I can change it.

>
> 6) How about an assertion like below?
> + if (stream.startpos == InvalidXLogRecPtr)
> + {
> + stream.startpos = serverpos;
> + stream.timeline = servertli;
> + }
> +
> +Assert(stream.startpos != InvalidXLogRecPtr)>>

Good idea.

>
> 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an option?

From my point of view, I already expected it to use something like that when
using a replication slot. Maybe an option to turn it off could be useful ?

>
> 8) Just an idea, how about we store pg_receivewal's lastFlushPosition
> in a file before pg_receivewal exits and compare it with the
> restart_lsn that it received from the replication slot, if
> lastFlushPosition == received_restart_lsn well and good, if not, then
> something would have happened and we always start at the
> lastFlushPosition ?

The patch idea originally came from the fact that some utility use
pg_receivewal to fetch WALs, and move them elsewhere. In that sense I don't
really see what value this brings compared to the existing (and unmodified) way
of computing the restart position from the already stored files ?

Best regards,

--
Ronan Dunklau

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kasahara Tatsuhito 2021-08-30 10:16:09 Re: Error code for checksum failure in origin.c
Previous Message Ronan Dunklau 2021-08-30 09:55:42 Re: pg_receivewal starting position