Re: [HACKERS] WAL logging problem in 9.4.3?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: noah(at)leadboat(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-05-20 06:54:30
Lists: pgsql-hackers


At Thu, 16 May 2019 23:50:50 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190517065050(dot)GA1298884(at)rfd(dot)leadboat(dot)com>
> On Tue, May 14, 2019 at 01:59:10PM +0900, Kyotaro HORIGUCHI wrote:
> > At Sun, 12 May 2019 17:37:05 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190513003705(dot)GA1202614(at)rfd(dot)leadboat(dot)com>
> > > On Sun, Mar 31, 2019 at 03:31:58PM -0700, Noah Misch wrote:
> > > > On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> > > > > I also liked the design in the
> > > > > last paragraph, and I suspect it would have been no harder to back-patch. I
> > > > > wonder if it would have been simpler and better, but I'm not asking anyone to
> > > > > investigate that.
> > > >
> > > > Now I am asking for that. Would anyone like to try implementing that other
> > > > design, to see how much simpler it would be?
> >
> > Yeah, I think it is a bit too-complex for the value. But I think
> > it is the best way as far as we keep reusing a file on
> > truncation of the whole file.
>
> The design of v11-0006-Fix-WAL-skipping-feature.patch doesn't, in general,
> work for WAL records touching more than one buffer. For heapam, that patch
> works around this problem by emitting XLOG_HEAP_INSERT or XLOG_HEAP_DELETE
> when we'd normally emit XLOG_HEAP_UPDATE. As a result, post-crash-recovery
> heap page bits differ from the bits present when we don't crash. Though I'm
> 85% confident this does not introduce a bug today, this is fragile. That is
> the main complexity I wish to avoid.

Ok, I see your point. The same issue happens on index pages more
aggressively. I didn't allow wal-skipping on indexes for the

> I suspect the design in the last
> paragraph will be simpler, not more complex. In the implementation I'm
> envisioning, smgrDoPendingDeletes() would change name, perhaps to
> AtEOXact_Storage(). For every relfilenode it does not delete, it would ensure
> durability by syncing (for large nodes) or by WAL-logging each page (for small
> nodes). RelationNeedsWAL() would return false whenever the applicable
> relfilenode appears in pendingDeletes. Access methods would remove their
> smgrimmedsync() calls, but they would otherwise not change. Would anyone like
> to try implementing that?
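
As a rough illustration of the end-of-transaction decision described in the quoted paragraph, here is a toy model in C. It is only a sketch under stated assumptions and is not PostgreSQL code: the PendingRel struct, the delete_at_commit flag, the SYNC_THRESHOLD_PAGES cutoff, and both function names are hypothetical stand-ins for pendingDeletes, the small/large distinction, RelationNeedsWAL(), and the renamed smgrDoPendingDeletes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cutoff between "small" and "large" relfilenodes. */
#define SYNC_THRESHOLD_PAGES 64

/* Toy stand-in for one pendingDeletes entry; not the real struct. */
typedef struct PendingRel
{
    unsigned relfilenode;      /* storage file tracked by this transaction */
    size_t   npages;           /* current size of the file in pages */
    bool     delete_at_commit; /* true if the file is dropped at commit */
} PendingRel;

typedef enum
{
    ACTION_DELETE,        /* file goes away, nothing to make durable */
    ACTION_SYNC,          /* large file: sync it to disk */
    ACTION_WAL_LOG_PAGES  /* small file: WAL-log each page */
} EndAction;

/*
 * Model of the proposed RelationNeedsWAL() rule: WAL can be skipped
 * whenever the relfilenode appears in the pending list.
 */
static bool
rel_needs_wal(const PendingRel *pending, size_t n, unsigned relfilenode)
{
    assert(pending != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (pending[i].relfilenode == relfilenode)
            return false;
    }
    return true;
}

/*
 * Model of the commit-time step: every file that is not deleted must be
 * made durable, by sync for large files or by WAL-logging for small ones.
 */
static EndAction
end_of_xact_action(const PendingRel *rel)
{
    assert(rel != NULL);
    if (rel->delete_at_commit)
        return ACTION_DELETE;
    if (rel->npages >= SYNC_THRESHOLD_PAGES)
        return ACTION_SYNC;
    return ACTION_WAL_LOG_PAGES;
}
```

In this model the access methods never decide about durability themselves; they only ask rel_needs_wal(), and the single commit-time pass picks sync or WAL-logging per file.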

Following this direction, the attached PoC works *at least for*
the wal_optimization TAP tests, but it does the pending flush in
relcache rather than in smgr. It extends the WAL-skipping feature
to indexes, which makes the old 0002 patch on nbtree unnecessary.


Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v12-0001-TAP-test-for-copy-truncation-optimization.patch text/x-patch 10.7 KB
v12-0002-Fix-WAL-skipping-feature.patch text/x-patch 7.7 KB
