Re: BUG #15591: pg_receivewal does not honor replication slots

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15591: pg_receivewal does not honor replication slots
Date: 2019-01-14 14:09:46
Message-ID: CABUevEzAhJ-i6WopqBSxd6iFKqUcxiQzBub9nij=7zDzQA3h=Q@mail.gmail.com
Lists: pgsql-bugs

On Sun, Jan 13, 2019 at 4:36 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Fri, Jan 11, 2019 at 1:50 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Hi,
>>
>> On 2019-01-11 16:52:42 +0000, PG Bug reporting form wrote:
>> > The following bug has been logged on the website:
>> >
>> > Bug reference: 15591
>> > Logged by: Jeff Janes
>> > Email address: jeff(dot)janes(at)gmail(dot)com
>> > PostgreSQL version: 11.1
>> > Operating system: all
>> > Description:
>> >
>> > When you invoke pg_receivewal using --slot to give it the name of an
>> > existing slot which has WAL reserved, and -D pointing to an empty
>> > directory, it fast-forwards the slot's LSN reservation to the beginning
>> > of the most recent WAL file on the server and starts streaming from
>> > there, rather than streaming from the LSN reservation point.
>>
>> ...
>
>
>
>> > Does this not utterly destroy the main point of using slots? If I didn't
>> > want to ensure a gapless WAL stream, why use slots in the first place?
>>
>> So the upstream server doesn't drop WAL that a standby (or something
>> like that) still needs? It's pretty rare to randomly start to stream to
>> a different place.
>>
>
> I don't want to start it randomly. I want to start it where the
> pg_basebackup (or some other backup method) using the same slot name left
> off, which is not-by-coincidence the same place or later than where the
> slot itself left off. I thought that that was the point of slots--or at
> least the user-facing documentation implies it is and I don't see that it
> disclaims it for this particular case. It seems like pg_receivewal is a
> second-class citizen: it doesn't count as either a standby or as
> "something like that". At least not when you are first transitioning from
> the base backup to it. If you are resuming an interrupted or lagging
> pg_receivewal, then the slot does do its job. So the slots appear to be
> global on the surface, but functionally they are local to pg_receivewal.
>

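To make "the beginning of the most recent WAL file" concrete: every WAL location (LSN) falls inside one segment file, and the report is that streaming restarts at the first byte of the server's current segment instead of at the slot's reserved LSN. The Python sketch below maps an LSN to its segment start and segment file name. It assumes the default 16 MB wal_segment_size and timeline 1 (both are parameters), and the helper names are hypothetical, not part of pg_receivewal or any PostgreSQL API.

```python
# Sketch: map a PostgreSQL LSN to its WAL segment start and segment file name.
# Assumptions: default 16 MB wal_segment_size and timeline 1; both are parameters.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text: str) -> int:
    """Parse an LSN written as 'X/Y' (two hex halves) into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(lsn: int) -> str:
    """Format a 64-bit LSN back into the 'X/Y' text form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_start(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Round an LSN down to the first byte of the segment containing it."""
    return lsn - (lsn % seg_size)


def wal_file_name(lsn: int, timeline: int = 1, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the segment file holding `lsn`: 8 hex digits each for the
    timeline, the log id and the segment number within that log id."""
    segno = lsn // seg_size
    segs_per_log_id = 0x100000000 // seg_size
    return f"{timeline:08X}{segno // segs_per_log_id:08X}{segno % segs_per_log_id:08X}"


if __name__ == "__main__":
    restart = parse_lsn("0/1A000028")  # hypothetical slot restart_lsn
    # prints: 00000001000000000000001A 0/1A000000
    print(wal_file_name(restart), format_lsn(segment_start(restart)))
```

With these made-up numbers, the slot's reserved WAL starts in segment ...1A. If the server were currently writing in segment ...5D, the behaviour reported above would start streaming at 0/5D000000, and segments ...1A through ...5C would be skipped.
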
I think the main part of the issue you're having is that slots really
aren't designed to have one slot used by more than one tool at a time. It's
basically one-slot-one-tool.

Specifically, the use case of having pg_receivewal pick up where
pg_basebackup left off definitely does make sense. I don't think that was a
use case that was considered (normally the order is to set up the log
archiving first, whether by archive_command or by pg_receivewal), but it
certainly does make sense.

I agree with Andres that I wouldn't consider this a bug. It's definitely
something that could be a useful extension of the feature in the future
though.

> The barrier to fixing it is that the replication protocol offers neither a
> way to interrogate where a slot left off, nor a way to tell it to pick up
> where a slot left off (regressed to the start of the WAL file). Other
> users of slots have a way to figure that out for themselves, but
> pg_receivewal (are there others?) does not.
>

The other big user of physical slots is of course a standby server. But the
standby server has a separate notion of where it left off (and will always
have one) from its own local state. It simply cannot get to the point of
starting to stream if that information isn't there (unlike pg_receivewal,
which can, in the empty-directory case).

> A workaround is to "seed" the directory about to be used by pg_receivewal
> by copying the last WAL file from the backup's pg_wal into it (and adding
> .partial to the end? That probably isn't needed, as the end of backup does
> a log switch).
>
> If this isn't a bug, then is there a way to document it so the end user
> knows what is going on? Or is there existing documentation I am
> overlooking? I guess the doc change would need to be in pg_receivewal, if
> the problem is unique to it.
>

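The seeding workaround Jeff describes (copy the last WAL file from the backup's pg_wal into the empty pg_receivewal directory) could look roughly like the Python sketch below. It assumes the restored backup's pg_wal is readable locally, and the function names are hypothetical. "Last" is taken to mean the highest 24-hex-digit segment name; fixed-width uppercase hex names sort in WAL order (timeline first, then position). Whether to append `.partial` is a flag, since the thread leaves that question open, and how pg_receivewal then chooses its start point is up to pg_receivewal itself, not this script.

```python
import re
import shutil
from pathlib import Path

# WAL segment names are exactly 24 uppercase hex digits (timeline, log id,
# segment); this skips .history/.backup files and the archive_status directory.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def latest_wal_segment(pg_wal_dir: Path) -> Path:
    """Return the highest-named WAL segment file in pg_wal_dir."""
    segments = sorted(
        p for p in pg_wal_dir.iterdir()
        if p.is_file() and WAL_SEGMENT_NAME.match(p.name)
    )
    if not segments:
        raise FileNotFoundError(f"no WAL segment files in {pg_wal_dir}")
    return segments[-1]


def seed_receivewal_dir(backup_pg_wal: Path, receivewal_dir: Path,
                        partial: bool = False) -> Path:
    """Copy the latest backup segment into the pg_receivewal target directory,
    optionally with a .partial suffix, and return the new file's path."""
    src = latest_wal_segment(backup_pg_wal)
    receivewal_dir.mkdir(parents=True, exist_ok=True)
    dest = receivewal_dir / (src.name + (".partial" if partial else ""))
    shutil.copy2(src, dest)  # copies contents and timestamps
    return dest
```

This would be run once, before the first pg_receivewal invocation against that directory.
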
The backup documentation does list "set up wal archive" before it gets to
taking the base backup, and I think the intention is "do that first". OTOH,
it does not mention pg_receivewal at all, so it's definitely not complete.
There's been discussion of overhauling that one for some time, but nobody's
gotten around to actually doing it.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
