Re: Bug in Physical Replication Slots (at least 9.5)?

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, jdnelson(at)dyn(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Bug in Physical Replication Slots (at least 9.5)?
Date: 2017-02-01 16:26:03
Message-ID: CAHGQGwEET=QBA_jND=xhrXn+9ZreP4_qMBAqsBZg56beqxbveg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Thu, Jan 19, 2017 at 6:37 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Wed, 18 Jan 2017 12:34:51 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQytF2giE7FD-4oJJpPVwiKJrDQPc24hLNGThX01SbSmA(at)mail(dot)gmail(dot)com>
>> On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > I managed to reproduce this. A little tweak as the first patch
>> > lets the standby to suicide as soon as walreceiver sees a
>> > contrecord at the beginning of a segment.
>>
>> Good idea.
>
> Thanks. Fortunately(?), the problematic situation seems to happen
> at almost all segment boundary.
>
>> > I believe that a continuation record cannot be span over three or
>> > more *segments* (is it right?), so keeping one spare segment
>> > would be enough. The attached second patch does this.
>>
>> I have to admit that I did not think about this problem much yet (I
>> bookmarked this report weeks ago to be honest as something to look
>> at), but that does not look right to me. Couldn't a record be spawned
>> across even more segments? Take a random string longer than 64MB or
>> event longer for example.
>
> Though I haven't look closer to how a modification is splitted
> into WAL records. A tuple cannot be so long. As a simple test, I
> observed rechder->xl_tot_len at the end of XLogRecordAssemble
> inserting an about 400KB not-so-compressable string into a text
> column, but I saw a series of many records with shorter than
> several thousand bytes.
>
>> > Other possible measures might be,
>> >
>> > - Allowing switching wal source while reading a continuation
>> > record. Currently ReadRecord assumes that a continuation record
>> > can be read from single source. But this needs refactoring
>> > involving xlog.c, xlogreader.c and relatives.
>>
>> This is scary thinking about back-branches.
>
> Yes. It would be no longer a bug fix. (Or becomes a quite ugly hack..)
>
>> > - Delaying recycling a segment until the last partial record on it
>> > completes. This seems doable in page-wise (coarse resolution)
>> > but would cost additional reading of past xlog files (page
>> > header of past pages is required).
>>
>> Hm, yes. That looks like the least invasive way to go. At least that
>> looks more correct than the others.
>
> The attached patch does that. Usually it reads page headers only
> on segment boundaries, but once continuation record found (or
> failed to read the next page header, that is, the first record on
> the first page in the next segment has not been replicated), it
> becomes to happen on every page boundary until non-continuation
> page comes.

I'm afraid that many WAL segments would start with a continuation record
when there are the workload of short transactions (e.g., by pgbench), and
which would make restart_lsn go behind very much. No?

The discussion on this thread just makes me think that restart_lsn should
indicate the replay location instead of flush location. This seems safer.

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message pasquini.matteo 2017-02-01 16:58:46 BUG #14522: plpythonu, missed filenode
Previous Message Tom Lane 2017-02-01 15:13:18 Re: BUG #14521: pg_attribute.attndims = 0 for array column

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2017-02-01 16:26:33 Re: Cast jsonb to numeric, int, float, bool
Previous Message Tom Lane 2017-02-01 16:25:25 pgsql: Make psql's \set display variables in alphabetical order.