Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: nag1010(at)gmail(dot)com
Cc: jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2017-03-31 05:05:13
Message-ID: 20170331.140513.226934614.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Thank you having a look on this.

# I removed -bugs in CC:.

At Fri, 31 Mar 2017 13:40:00 +1100, Venkata B Nagothi <nag1010(at)gmail(dot)com> wrote in <CAEyp7J-BmsMqCVO7zX5vpEoEuRiM+KVPZPJzodUqOdOTZsLpqQ(at)mail(dot)gmail(dot)com>
> On Fri, Mar 17, 2017 at 6:48 PM, Kyotaro HORIGUCHI <
> horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> > Hello,
> >
> > At Mon, 13 Mar 2017 11:06:00 +1100, Venkata B Nagothi <nag1010(at)gmail(dot)com>
> > wrote in <CAEyp7J-4MmVwGoZSwvaSULZC80JDD_tL-9KsNiqF17+bNqiSBg(at)mail(dot)
> > gmail.com>
> > > On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI <
> > > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > > I managed to reproduce this. A little tweak as the first patch
> > > > lets the standby to suicide as soon as walreceiver sees a
> > > > contrecord at the beginning of a segment.
> > > >
> > > > - M(aster): createdb as a master with wal_keep_segments = 0
> > > > (default), min_log_messages = debug2
> > > > - M: Create a physical repslot.
> > > > - S(tandby): Setup a standby database.
> > > > - S: Edit recovery.conf to use the replication slot above then
> > > > start it.
> > > > - S: touch /tmp/hoge
> > > > - M: Run pgbench ...
> > > > - S: After a while, the standby stops.
> > > > > LOG: #################### STOP THE SERVER
> > > >
> > > > - M: Stop pgbench.
> > > > - M: Do 'checkpoint;' twice.
> > > > - S: rm /tmp/hoge
> > > > - S: Fails to catch up with the following error.
> > > >
> > > > > FATAL: could not receive data from WAL stream: ERROR: requested
> > WAL
> > > > segment 00000001000000000000002B has already been removed
> > > >
> > > >
> > > I have been testing / reviewing the latest patch
> > > "0001-Fix-a-bug-of-physical-replication-slot.patch" and i think, i might
> > > need some more clarification on this.
> > >
> > > Before applying the patch, I tried re-producing the above error -
> > >
> > > - I had master->standby in streaming replication
> > > - Took the backup of master
> > > - with a low max_wal_size and wal_keep_segments = 0
> > > - Configured standby with recovery.conf
> > > - Created replication slot on master
> > > - Configured the replication slot on standby and started the standby
> >
> > I suppose the "configure" means primary_slot_name in recovery.conf.
> >
> > > - I got the below error
> > >
> > > >> 2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length at
> > > 0/F2000140: wanted 24, got 0
> > > >> 2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL from
> > > primary at 0/F2000000 on timeline 1
> > > >> 2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive data
> > > from WAL stream: ERROR: requested WAL segment 0000000100000000000000F2
> > has
> > > already been removed
> >
> > Maybe you created the master slot with non-reserve (default) mode
> > and put a some-minites pause after making the backup and before
> > starting the standby. For the case the master slot doesn't keep
> > WAL segments unless the standby connects so a couple of
> > checkpoints can blow away the first segment required by the
> > standby. This is quite reasonable behavior. The following steps
> > makes this more sure.
> >
> > > - Took the backup of master
> > > - with a low max_wal_size = 2 and wal_keep_segments = 0
> > > - Configured standby with recovery.conf
> > > - Created replication slot on master
> > + - SELECT pg_switch_wal(); on master twice.
> > + - checkpoint; on master twice.
> > > - Configured the replication slot on standby and started the standby
> >
> > Creating the slot with the following command will save it.
> >
> > =# select pg_create_physical_replication_slot('s1', true);
> >
>
> I did a test again, by applying the patch and I am not sure if the patch is
> doing the right thing ?

This is a bit difficult to make it happen.

> Here is test case -
>
> - I ran pgbench
> - I took the backup of the master first
>
> - Below are the WALs on master after the stop backup -
>
> postgres=# select pg_stop_backup();
>
> NOTICE: WAL archiving is not enabled; you must ensure that all required
> WAL segments are copied through other means to complete the backup
> pg_stop_backup
> ----------------
> 0/8C000130
> (1 row)
>
> postgres=# \q
> [dba(at)buildhost data]$ ls -ltrh pgdata-10dev-prsb-1/pg_wal/
> total 65M
> drwx------. 2 dba dba 4.0K Mar 31 09:36 archive_status
> -rw-------. 1 dba dba 16M Mar 31 11:09 00000001000000000000008E
> -rw-------. 1 dba dba 16M Mar 31 11:17 00000001000000000000008F
> -rw-------. 1 dba dba 16M Mar 31 11:18 00000001000000000000008C
> -rw-------. 1 dba dba 16M Mar 31 11:18 00000001000000000000008D
>
> - After the backup, i created the physical replication slot
>
>
> postgres=# select pg_create_physical_replication_slot('repslot',true);
>
> pg_create_physical_replication_slot
> -------------------------------------
> (repslot,0/8D000028)
> (1 row)

At this point, segment 8C is not protected from removal. It's too
late if the first record in 8D is continued from 8C.

> postgres=# select pg_walfile_name('0/8D000028');
>
> pg_walfile_name
> ---------------------------------------
> 00000001000000000000008D
> (1 row)
>
> Here, When you start the standby, it would ask for the file
> 00000001000000000000008C, which is the first file needed for the standby
> and since i applied your patch, i am assuming that, the file
> 00000001000000000000008C should also be retained without being removed -
> correct ?

Yes if repslot were staying in 8C. So this is not the case.

> - I started the standby and the below error occurs
>
> >> 2017-03-31 11:26:01.288 AEDT [17475] LOG: invalid record length at
> 0/8C000108: wanted 24, got 0
> >> 2017-03-31 11:26:01.291 AEDT [17486] LOG: started streaming WAL from
> primary at 0/8C000000 on timeline 1
> >> 2017-03-31 11:26:01.291 AEDT [17486] FATAL: could not receive data from
> WAL stream: ERROR: requested WAL segment 00000001000000000000008C has
> already been removed

LoL. You hit this by a single shot! However, the case cannot be
saved with this patch since the repslot is already in 8D. In
other words, newly created slot doesn't protect the previous
segments at the time of creation. The existence of required WAL
segments is not guranteed without repslot anyway.

> > and i could notice that the file "0000000100000000000000F2" was removed
> > > from the master. This can be easily re-produced and this occurs
> > > irrespective of configuring replication slots.
> > >
> > > As long as the file "0000000100000000000000F2" is available on the
> > master,
> > > standby continues to stream WALs without any issues.
> > ...
> > > If the scenario i created to reproduce the error is correct, then,
> > applying
> > > the patch is not making a difference.
> >
> > Yes, the patch is not for saving this case. The patch saves the
> > case where the previous segment to the first required segment by
> > standby was removed and it contains the first part of a record
> > continues to the first required segment. On the other hand this
> > case is that the segment at the start point of standby is just
> > removed.
> >
>
> Which means, the file 00000001000000000000008C must be retained as it is
> the first file standby is looking for - correct ?

Yes, but if and only if the repslot is successfully connected at
least once. So I attached an additional patch to effectively
cause the problematic situation. The attached patch let standby
suicide when the condition for this bug is made.

I wrote in <20170117(dot)193645(dot)160386781(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>:
> I managed to reproduce this. A little tweak as the first patch
> lets the standby to suicide as soon as walreceiver sees a
> contrecord at the beginning of a segment.
>
> - M(aster): createdb as a master with wal_keep_segments = 0
> (default), min_log_messages = debug2
> - M: Create a physical repslot.
> - S(tandby): Setup a standby database.
> - S: Edit recovery.conf to use the replication slot above then
> start it.
> - S: touch /tmp/hoge
> - M: Run pgbench ...
> - S: After a while, the standby stops.
> > LOG: #################### STOP THE SERVER

At this point, pg_replication_slots.restart_lsn should be in the
previous segment of the segment for the LSN shown in the last
"LOG: ####.. CHECK AT" log. For example in the case below.

> LOG: ############# CHECK AT 5000000 : 1
> LOG: #################### STOP THE SERVER

The master responds to the query "select restart_lsn from
pg_replication_slots" with '0/4F......' with this patch but
'0/50......' without this patch.

> - M: Stop pgbench.
> - M: Do 'checkpoint;' twice.
> - S: rm /tmp/hoge
> - S: Fails to catch up with the following error.
>
> > FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000000000000002B has already been removed

One other point of thie patch is that repslot *shoutd not* retain
needless semgents. But this doesn't seem externally
confirmable...

regards,

Attachment Content-Type Size
repl_border_bug_test.patch text/x-patch 1.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Sam Dai 2017-03-31 07:39:59 Re: BUG #14606: Memory usage continue to increase
Previous Message Kyotaro HORIGUCHI 2017-03-31 03:58:50 Re: BUG #14606: Memory usage continue to increase

Browse pgsql-hackers by date

  From Date Subject
Next Message Beena Emerson 2017-03-31 05:10:36 Re: increasing the default WAL segment size
Previous Message Craig Ringer 2017-03-31 04:55:35 Re: [PATCH] Reduce src/test/recovery verbosity