Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Venkata B Nagothi <nag1010(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Jonathon Nelson <jdnelson(at)dyn(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2017-04-02 22:19:20
Message-ID: CAEyp7J9A1-hzdLfoTKaKp=OO6ichifXaYzvOzAzP-NNfnTX27A@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Fri, Mar 31, 2017 at 4:05 PM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Thank you for having a look at this.
>
> # I removed -bugs in CC:.
>
> At Fri, 31 Mar 2017 13:40:00 +1100, Venkata B Nagothi <nag1010(at)gmail(dot)com>
> wrote in <CAEyp7J-BmsMqCVO7zX5vpEoEuRiM+KVPZPJzodUqOdOTZsLpqQ(at)mail(dot)
> gmail.com>
> > On Fri, Mar 17, 2017 at 6:48 PM, Kyotaro HORIGUCHI <
> > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > > Hello,
> > >
> > > At Mon, 13 Mar 2017 11:06:00 +1100, Venkata B Nagothi <
> nag1010(at)gmail(dot)com>
> > > wrote in <CAEyp7J-4MmVwGoZSwvaSULZC80JDD_tL-9KsNiqF17+bNqiSBg(at)mail(dot)
> > > gmail.com>
> > > > On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI <
> > > > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > > > I managed to reproduce this. A little tweak as the first patch
> > > > > lets the standby suicide as soon as walreceiver sees a
> > > > > contrecord at the beginning of a segment.
> > > > >
> > > > > - M(aster): createdb as a master with wal_keep_segments = 0
> > > > > (default), log_min_messages = debug2
> > > > > - M: Create a physical repslot.
> > > > > - S(tandby): Setup a standby database.
> > > > > - S: Edit recovery.conf to use the replication slot above then
> > > > > start it.
> > > > > - S: touch /tmp/hoge
> > > > > - M: Run pgbench ...
> > > > > - S: After a while, the standby stops.
> > > > > > LOG: #################### STOP THE SERVER
> > > > >
> > > > > - M: Stop pgbench.
> > > > > - M: Do 'checkpoint;' twice.
> > > > > - S: rm /tmp/hoge
> > > > > - S: Fails to catch up with the following error.
> > > > >
> > > > > > FATAL: could not receive data from WAL stream: ERROR: requested
> > > > > > WAL segment 00000001000000000000002B has already been removed
> > > > >
> > > > >
> > > > I have been testing / reviewing the latest patch
> > > > "0001-Fix-a-bug-of-physical-replication-slot.patch" and i think, i
> might
> > > > need some more clarification on this.
> > > >
> > > > Before applying the patch, I tried reproducing the above error -
> > > >
> > > > - I had master->standby in streaming replication
> > > > - Took the backup of master
> > > > - with a low max_wal_size and wal_keep_segments = 0
> > > > - Configured standby with recovery.conf
> > > > - Created replication slot on master
> > > > - Configured the replication slot on standby and started the standby
> > >
> > > I suppose the "configure" means primary_slot_name in recovery.conf.
> > >
> > > > - I got the below error
> > > >
> > > > >> 2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length
> > > > at 0/F2000140: wanted 24, got 0
> > > > >> 2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL
> > > > from primary at 0/F2000000 on timeline 1
> > > > >> 2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive
> > > > data from WAL stream: ERROR: requested WAL segment
> > > > 0000000100000000000000F2 has already been removed
> > >
> > > Maybe you created the master slot in non-reserve (default) mode
> > > and paused for some minutes after making the backup and before
> > > starting the standby. In that case the master slot doesn't keep
> > > WAL segments until the standby connects, so a couple of
> > > checkpoints can blow away the first segment required by the
> > > standby. This is quite reasonable behavior. The following steps
> > > make this more certain.
> > >
> > > > - Took the backup of master
> > > > - with a low max_wal_size = 2 and wal_keep_segments = 0
> > > > - Configured standby with recovery.conf
> > > > - Created replication slot on master
> > > + - SELECT pg_switch_wal(); on master twice.
> > > + - checkpoint; on master twice.
> > > > - Configured the replication slot on standby and started the standby
> > >
> > > Creating the slot with the following command will save it.
> > >
> > > =# select pg_create_physical_replication_slot('s1', true);
> > >
> >
> > I did a test again by applying the patch, and I am not sure whether the
> > patch is doing the right thing.
>
> This is a bit difficult to make happen.
>

Yes, it seems to be quite difficult to reproduce.
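
Just to make sure I understand the reserve option: in my next test I will
create the slot before taking the backup with something like the below (a
sketch only; 'repslot' is just my test slot name):

postgres=# select pg_create_physical_replication_slot('repslot', true);

so that the slot reserves WAL immediately, rather than only after the
standby connects for the first time. Conversely, creating it without the
second argument and then running pg_switch_wal() and checkpoint twice on
the master should force the "already been removed" error.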

>
> > Here is the test case -
> >
> > - I ran pgbench
> > - I took the backup of the master first
> >
> > - Below are the WALs on master after the stop backup -
> >
> > postgres=# select pg_stop_backup();
> >
> > NOTICE: WAL archiving is not enabled; you must ensure that all required
> > WAL segments are copied through other means to complete the backup
> > pg_stop_backup
> > ----------------
> > 0/8C000130
> > (1 row)
> >
> > postgres=# \q
> > [dba(at)buildhost data]$ ls -ltrh pgdata-10dev-prsb-1/pg_wal/
> > total 65M
> > drwx------. 2 dba dba 4.0K Mar 31 09:36 archive_status
> > -rw-------. 1 dba dba 16M Mar 31 11:09 00000001000000000000008E
> > -rw-------. 1 dba dba 16M Mar 31 11:17 00000001000000000000008F
> > -rw-------. 1 dba dba 16M Mar 31 11:18 00000001000000000000008C
> > -rw-------. 1 dba dba 16M Mar 31 11:18 00000001000000000000008D
> >
> > - After the backup, I created the physical replication slot
> >
> >
> > postgres=# select pg_create_physical_replication_slot('repslot',true);
> >
> > pg_create_physical_replication_slot
> > -------------------------------------
> > (repslot,0/8D000028)
> > (1 row)
>
> At this point, segment 8C is not protected from removal. It's too
> late if the first record in 8D is continued from 8C.
>
> > postgres=# select pg_walfile_name('0/8D000028');
> >
> > pg_walfile_name
> > ---------------------------------------
> > 00000001000000000000008D
> > (1 row)
> >
> > Here, when you start the standby, it asks for the file
> > 00000001000000000000008C, which is the first file needed by the standby.
> > Since I applied your patch, I am assuming that the file
> > 00000001000000000000008C should also be retained rather than removed -
> > correct?
>
> Yes, if the repslot were still staying in 8C. So this is not the case.
>
> > - I started the standby and the below error occurs
> >
> > >> 2017-03-31 11:26:01.288 AEDT [17475] LOG: invalid record length at
> > 0/8C000108: wanted 24, got 0
> > >> 2017-03-31 11:26:01.291 AEDT [17486] LOG: started streaming WAL from
> > primary at 0/8C000000 on timeline 1
> > >> 2017-03-31 11:26:01.291 AEDT [17486] FATAL: could not receive data
> > from WAL stream: ERROR: requested WAL segment 00000001000000000000008C has
> > already been removed
>
> LoL. You hit this in a single shot! However, the case cannot be
> saved with this patch since the repslot is already in 8D. In
> other words, a newly created slot doesn't protect the previous
> segments at the time of creation. The existence of the required WAL
> segments is not guaranteed without a repslot anyway.
>

Yeah, this can be reproduced easily.
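
For checking such cases, I am using a query like the one below to see which
segment the slot is actually holding versus the segment the standby asks
for (just my own check, not part of the patch):

postgres=# select slot_name, restart_lsn, pg_walfile_name(restart_lsn)
           from pg_replication_slots;

If the file name reported here is newer than the segment named in the
standby's "requested WAL segment ... has already been removed" error, the
slot was never protecting that segment in the first place.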

>
> > > > and I could notice that the file "0000000100000000000000F2" was removed
> > > > from the master. This can be easily reproduced and occurs
> > > > irrespective of configuring replication slots.
> > > >
> > > > As long as the file "0000000100000000000000F2" is available on the
> > > > master, the standby continues to stream WAL without any issues.
> > > ...
> > > > If the scenario I created to reproduce the error is correct, then
> > > > applying the patch does not make a difference.
> > >
> > > Yes, the patch is not meant to save this case. The patch saves the
> > > case where the segment preceding the first segment required by the
> > > standby was removed while it contained the first part of a record
> > > that continues into that first required segment. On the other hand,
> > > this case is one where the segment at the standby's start point is
> > > simply removed.
> > >
> >
> > Which means the file 00000001000000000000008C must be retained, as it is
> > the first file the standby is looking for - correct?
>
> Yes, but if and only if the repslot has been successfully connected to at
> least once. So I attached an additional patch to effectively
> cause the problematic situation. The attached patch lets the standby
> suicide when the condition for this bug arises.
>
> I wrote in <20170117(dot)193645(dot)160386781(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>:
> > I managed to reproduce this. A little tweak as the first patch
> > lets the standby suicide as soon as walreceiver sees a
> > contrecord at the beginning of a segment.
> >
> > - M(aster): createdb as a master with wal_keep_segments = 0
> > (default), log_min_messages = debug2
> > - M: Create a physical repslot.
> > - S(tandby): Setup a standby database.
> > - S: Edit recovery.conf to use the replication slot above then
> > start it.
> > - S: touch /tmp/hoge
> > - M: Run pgbench ...
> > - S: After a while, the standby stops.
> > > LOG: #################### STOP THE SERVER
>
> At this point, pg_replication_slots.restart_lsn should be in the
> segment previous to the one containing the LSN shown in the last
> "LOG: ####.. CHECK AT" log. For example, in the case below:
>
> > LOG: ############# CHECK AT 5000000 : 1
> > LOG: #################### STOP THE SERVER
>
> The master responds to the query "select restart_lsn from
> pg_replication_slots" with '0/4F......' with this patch but
> '0/50......' without this patch.
>
> > - M: Stop pgbench.
> > - M: Do 'checkpoint;' twice.
> > - S: rm /tmp/hoge
> > - S: Fails to catch up with the following error.
> >
> > > FATAL: could not receive data from WAL stream: ERROR: requested
> > > WAL segment 00000001000000000000002B has already been removed
>
> One other point of this patch is that the repslot *should not* retain
> needless segments. But this doesn't seem externally
> confirmable...
>

I will need to use this patch and see if I can reproduce the case.
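
If I understand the verification correctly, after the standby stops with the
"STOP THE SERVER" log I should run something like the below on the master
(my reading of your example; 'repslot' is just my slot name):

postgres=# select restart_lsn from pg_replication_slots
           where slot_name = 'repslot';

and expect a value like '0/4F......' (the previous segment) with the patch,
versus '0/50......' without it.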

As we are already past the commitfest, I am not sure what I should change
the patch status to.

Regards,

Venkata B N
Database Consultant
