|From:||Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>|
|Cc:||jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org|
|Subject:||Re: Bug in Physical Replication Slots (at least 9.5)?|
At Wed, 18 Jan 2017 12:34:51 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQytF2giE7FD-4oJJpPVwiKJrDQPc24hLNGThX01SbSmA(at)mail(dot)gmail(dot)com>
> On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > I managed to reproduce this. A little tweak as the first patch
> > lets the standby to suicide as soon as walreceiver sees a
> > contrecord at the beginning of a segment.
> Good idea.
Thanks. Fortunately(?), the problematic situation seems to occur
at almost every segment boundary.
> > I believe that a continuation record cannot be span over three or
> > more *segments* (is it right?), so keeping one spare segment
> > would be enough. The attached second patch does this.
> I have to admit that I did not think about this problem much yet (I
> bookmarked this report weeks ago to be honest as something to look
> at), but that does not look right to me. Couldn't a record be spawned
> across even more segments? Take a random string longer than 64MB or
> event longer for example.
Though I haven't looked closely at how a modification is split
into WAL records, a tuple cannot be that long. As a simple test, I
observed rechdr->xl_tot_len at the end of XLogRecordAssemble
while inserting an approximately 400KB, not-so-compressible string
into a text column, but I saw only a series of many records, each
shorter than several thousand bytes.
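
For reference, the segment arithmetic behind the "three or more segments" question can be sketched as below. This is only an illustration, not code from the patch: the helper name segments_touched is made up here, the 16MB segment size is the assumed default, and page headers are ignored for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default WAL segment size (16 MB); a given build may differ. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical helper: number of WAL segments touched by a record that
 * starts at byte position start_lsn and occupies total_len bytes.
 * Simplification: WAL page headers are not accounted for.
 */
static uint64_t
segments_touched(uint64_t start_lsn, uint64_t total_len)
{
    assert(total_len > 0);

    uint64_t first_seg = start_lsn / WAL_SEG_SIZE;
    uint64_t last_seg = (start_lsn + total_len - 1) / WAL_SEG_SIZE;

    return last_seg - first_seg + 1;
}
```

Under these assumptions, a record of a few thousand bytes (or even 400KB) that starts near the end of a segment touches at most two segments. Touching three requires covering a whole middle segment, that is, a record longer than one segment.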
> > Other possible measures might be,
> > - Allowing switching wal source while reading a continuation
> > record. Currently ReadRecord assumes that a continuation record
> > can be read from single source. But this needs refactoring
> > involving xlog.c, xlogreader.c and relatives.
> This is scary thinking about back-branches.
Yes. It would no longer be a bug fix. (Or it would become quite an ugly hack..)
> > - Delaying recycling a segment until the last partial record on it
> > completes. This seems doable in page-wise (coarse resolution)
> > but would cost additional reading of past xlog files (page
> > header of past pages is required).
> Hm, yes. That looks like the least invasive way to go. At least that
> looks more correct than the others.
The attached patch does that. Usually it reads page headers only
on segment boundaries, but once a continuation record is found (or
reading the next page header fails, that is, the first record on
the first page in the next segment has not been replicated yet),
the check happens on every page boundary until a non-continuation
page is found.

I left debug output (at LOG level) in the attached file, shown on
every state change of the keep pointer. At least for pgbench, the
cost seems negligible.
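
The keep-pointer behaviour described above can be sketched roughly as follows. This is a simplified model and not the attached patch: KeepState, should_check and on_page_boundary are hypothetical names, the 8KB page and 16MB segment sizes are assumed defaults, and XLP_FIRST_IS_CONTRECORD is defined locally to mirror the page header flag of the same name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults; real builds may use other sizes. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)
#define WAL_PAGE_SIZE ((uint64_t) 8192)

/* Local copy mirroring the page header flag of the same name. */
#define XLP_FIRST_IS_CONTRECORD 0x0001

/* Hypothetical state, for illustration only. */
typedef struct KeepState
{
    bool     tracking;  /* a record is known or suspected to be incomplete */
    uint64_t keep_seg;  /* oldest segment that must not be recycled */
} KeepState;

/* Check on segment boundaries only, or on every page while tracking. */
static bool
should_check(const KeepState *st, uint64_t page_lsn)
{
    return st->tracking || (page_lsn % WAL_SEG_SIZE == 0);
}

/*
 * Called when the received WAL reaches the page starting at page_lsn.
 * header_readable is false when that page header has not arrived yet.
 */
static void
on_page_boundary(KeepState *st, uint64_t page_lsn,
                 bool header_readable, uint16_t xlp_info)
{
    if (!should_check(st, page_lsn))
        return;

    bool incomplete = !header_readable ||
        (xlp_info & XLP_FIRST_IS_CONTRECORD) != 0;

    if (incomplete)
    {
        if (!st->tracking)
        {
            /* The partial record began in the segment before this page. */
            st->tracking = true;
            st->keep_seg = page_lsn > 0 ? (page_lsn - 1) / WAL_SEG_SIZE : 0;
        }
    }
    else
    {
        /* A page starting with a fresh record ends the tracking. */
        st->tracking = false;
        st->keep_seg = page_lsn / WAL_SEG_SIZE;
    }
}
```

In this model the per-page check only runs while a record straddles a boundary, which fits the observation that the extra cost is small for a workload of short records.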
> > - Delaying write/flush feedback until the current record is
> > completed. walreceiver is not conscious of a WAL record and
> > this might break synchronous replication.
> Not sure about this one yet.
I'm not sure either :p
NTT Open Source Software Center