Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrus <kobruleht2(at)hot(dot)ee>
Cc: Paul Förster <paul(dot)foerster(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to start slave after pg_basebackup. Why min_wal_size and wal_keep_segments are duplicated
Date: 2020-06-01 09:13:58
Message-ID: CABUevEwKLAtAHAPb-kZcx+ZPxMMLboYiYL4a105M2kBB9Oy9SA@mail.gmail.com
Lists: pgsql-general

On Mon, Jun 1, 2020 at 10:17 AM Andrus <kobruleht2(at)hot(dot)ee> wrote:

> Hi!
>
> > I have tried to re-initiate replica several times in low-use time but
> this error occurs again.
> >remove the whole replica's PGDATA/* and do a pg_basebackup again. But
> before that, make sure wal_keep_segments is big enough on the
> >master and,
>
> I renamed whole cluster before pg_basebackup
>
> >just as important, do a vacuumdb -a (takes much space during the
> process) and use archiving!
>
> I run vacuumdb --full --all before pg_basebackup
>
> > If named replication slot is used commands like
> > vacuumdb --all --full
> > will cause main server crash due to disk space limit. pg_wal directory
> will occupy free disk space. After that main server stops.
> >>if you have disk constraints you will run into trouble sooner or later
> anyway. Make sure, you have enough disk space. There's no
> >>way around that anyway.
>
> This space is sufficient for base backup and replication.
>
> >> I tried using wal_keep_segments =180
> >> Will setting wal_keep_segments to a higher value allow replication to start
> after pg_basebackup ?
> >it depends. If you start the replica immediately and don't wait for hours
> or days, you should be good to go. But that depends on
> >different factors, for example, how many WAL files are written during
> the pg_basebackup and pg_ctl start of the replica. If more
> >than 180 WALs have gone by on the master because it is really busy, then
> you're probably lost again. Point being, you'll have to
> >launch the replica before WALs are expired!
> >Again: Make sure you have enough disk space, use archiving and use a
> replication slot.
>
> I tried with wal_keep_segments=360 but the problem persists.
> The server generates far fewer than 300 WAL files.
>

Have you verified that wal_keep_segments actually ends up at 360, by
connecting to the database and issuing SHOW wal_keep_segments? I've seen
far too many examples of people who accidentally had a second line that
overrode the one they thought they changed, and thus still ran with a lower
number.
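
A minimal sketch of how one might look for such a duplicate line. The helper
name find_setting_lines and the configuration path in the comments are
assumptions for illustration, not something from this thread. Included files
and ALTER SYSTEM entries in postgresql.auto.conf can also override the value,
so asking the running server (the psql commands in the comments) remains the
real check.

```shell
# Hypothetical helper: list every uncommented line in a config file that sets
# the given parameter, with line numbers. When a parameter appears more than
# once in postgresql.conf, the last occurrence wins.
find_setting_lines() {
    conf_file=$1
    param=$2
    grep -nE "^[[:space:]]*${param}[[:space:]=]" "$conf_file"
}

# Example (the path is an assumption, adjust it to your installation):
#   find_setting_lines /etc/postgresql/12/main/postgresql.conf wal_keep_segments
#
# Then ask the running server which value it uses and where it came from
# (sourcefile/sourceline may be empty for unprivileged roles):
#   psql -d postgres -c "SHOW wal_keep_segments"
#   psql -d postgres -c "SELECT setting, sourcefile, sourceline FROM pg_settings WHERE name = 'wal_keep_segments'"
```

If the helper prints more than one line, the one with the highest line number
is the one that takes effect within that file.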

> Shell script starts server after pg_basebackup completes automatically:
>
> PGHOST=example.com
> PGPASSWORD=mypass
> PGUSER=replikaator
> export PGHOST PGPASSWORD PGUSER
> /etc/init.d/postgresql stop
> mv /var/lib/postgresql/12/main /var/lib/postgresql/12/mainennebaasbakuppi
> pg_basebackup --verbose --progress --write-recovery-conf -D /var/lib/postgresql/12/main
> chmod --recursive --verbose 0700 /var/lib/postgresql/12/main
> chown -Rv postgres:postgres /var/lib/postgresql/12/main
> /etc/init.d/postgresql start
>

Do you get any useful output from the -v part of pg_basebackup? It should,
for example, tell you the exact start and stop points in the WAL during the
base backup, which can be correlated to the missing file.
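
As a rough aid for that correlation, here is a sketch that converts a WAL
position into the name of the segment file that contains it. It assumes the
default 16MB segment size and that the position is given as an LSN of the
form 0/2000028 together with a timeline number. The helper name wal_file_name
and the sample values are made up for illustration. On the master, the
built-in SQL function pg_walfile_name() gives the same answer.

```shell
# Hypothetical helper: map a timeline and an LSN ("hi/lo" in hex) to the WAL
# segment file name, assuming the default 16MB (0x1000000 bytes) segment size.
wal_file_name() {
    tli=$1
    lsn=$2
    hi=${lsn%%/*}    # part before the slash: upper 32 bits of the position
    lo=${lsn##*/}    # part after the slash: lower 32 bits of the position
    # File name = timeline, upper 32 bits, segment number within them,
    # each printed as 8 uppercase hex digits.
    printf '%08X%08X%08X\n' "$tli" "$((0x$hi))" "$((0x$lo / 0x1000000))"
}

# Example with a made-up position on timeline 1:
#   wal_file_name 1 0/2000028
#   prints 000000010000000000000002
```

If the file named in the replica's error message sorts before the segment for
the backup's start point, the backup itself is not what lost it.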

Normally the window between end of pg_basebackup and start of the actual
service is not big enough to cause a problem (since v12 will do a streaming
receive of the logs *during* the backup -- it could be a big problem before
that was possible, or if one forgot to enable it before it was the
default), and it certainly sounds weird that it should be in your case,
unless the chmod and chown commands take a *long* time. But if it is, there
is nothing preventing you from creating a slot just during setup and then
getting rid of it. That is:

1. create slot
2. pg_basebackup with slot
3. start replication with slot
4. restart replication without slot once it's caught up
5. drop slot
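
The steps above could be sketched roughly as follows. This is only an
outline under assumptions: the slot name setup_slot is hypothetical, the data
directory is the one from the quoted script, psql and pg_basebackup are
expected to reach the master through PGHOST/PGUSER as in that script, and the
connecting role needs the REPLICATION attribute (or superuser) to manage
slots. The run wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
SLOT_NAME=${SLOT_NAME:-setup_slot}                 # hypothetical slot name
DATA_DIR=${DATA_DIR:-/var/lib/postgresql/12/main}  # path from the quoted script

# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: create a physical replication slot on the master.
create_slot() {
    run psql -d postgres -c "SELECT pg_create_physical_replication_slot('${SLOT_NAME}')"
}

# Step 2: take the base backup using that slot. With --write-recovery-conf,
# the slot name is also written into the replica's configuration.
backup_with_slot() {
    run pg_basebackup --verbose --progress --write-recovery-conf \
        --slot="${SLOT_NAME}" -D "${DATA_DIR}"
}

# Step 3: start the replica as usual (for example /etc/init.d/postgresql start).
# Step 4: once it has caught up, remove primary_slot_name from the replica's
#         configuration and restart it, so the slot is no longer in use.

# Step 5: drop the slot on the master. This fails while the slot is active.
drop_slot() {
    run psql -d postgres -c "SELECT pg_drop_replication_slot('${SLOT_NAME}')"
}

# Typical order: create_slot; backup_with_slot; (steps 3 and 4); drop_slot
```

Keep in mind that an unused slot left behind makes the master retain WAL
indefinitely, so step 5 should not be skipped if the slot is meant to be
temporary.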

However, if you want reliable replication, you really should have a slot.
Or at least, you should have either a slot *or* log archiving that's
read-accessible from the replica.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
