Re: [HACKERS] WAL logging problem in 9.4.3?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: noah(at)leadboat(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-05-23 07:10:35
Lists: pgsql-hackers

Attached is a new version.

At Tue, 21 May 2019 21:29:48 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190521(dot)212948(dot)34357392(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>

> At Mon, 20 May 2019 15:54:30 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190520(dot)155430(dot)215084510(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > I suspect the design in the last
> > > paragraph will be simpler, not more complex. In the implementation I'm
> > > envisioning, smgrDoPendingDeletes() would change name, perhaps to
> > > AtEOXact_Storage(). For every relfilenode it does not delete, it would ensure
> > > durability by syncing (for large nodes) or by WAL-logging each page (for small
> > > nodes). RelationNeedsWAL() would return false whenever the applicable
> > > relfilenode appears in pendingDeletes. Access methods would remove their
> > > smgrimmedsync() calls, but they would otherwise not change. Would anyone like
> > > to try implementing that?
> >
> > Following this direction, the attached PoC works *at least for*
> > the wal_optimization TAP tests, but doing pending flush not in
> > smgr but in relcache. This is extending skip-wal feature to
> > indexes. And makes the old 0002 patch on nbtree useless.
> This is a tidier version of the patch.
> - Passes regression tests including
> - Move the substantial work to table/index AMs.
> Each AM can decide whether to support WAL skip or not.
> Currently heap and nbtree support it.
> - The timing of sync is moved from AtEOXact to PreCommit. This is
> because heap_sync() needs xact state = INPROGRESS.
> - matview and cluster is broken, since swapping to new
> relfilenode doesn't change rd_newRelfilenodeSubid. I'll address
> that.

cluster/matview are fixed.

An obstacle to fixing them was the unreliability of
newRelfilenodeSubid. As mentioned in the comment on
RelationData, newRelfilenodeSubid may disappear after a certain
sequence of commands.

In the attached v14, I added "rd_firstRelfilenodeSubid", which
stores the id of the subtransaction in which the first relfilenode
replacement in the current transaction took place. It survives any
sequence of commands, including the one mentioned in CopyFrom's
comment (which I removed in this patch).

With the attached patch, for relations based on table/index AMs
that support WAL-skipping, WAL-logging is skipped if the
relation was created in the current transaction, or if its
relfilenode was replaced in the current transaction. The file sync
at commit is always performed for such relations. (Only heap and
btree support it.)
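
To make the rule above concrete, here is a small standalone model of it.
This is only a sketch, not code from the patch: ToyRelation, the toy_*
functions and the am_supports_wal_skip flag are hypothetical names, and
other conditions that apply in the real code (wal_level, for example) are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for SubTransactionId; 0 plays the role of "invalid". */
typedef uint32_t ToySubXactId;
#define TOY_INVALID_SUBXACT ((ToySubXactId) 0)

/* Toy descriptor; field names echo the mail but this is NOT RelationData. */
typedef struct ToyRelation
{
    bool         am_supports_wal_skip;    /* hypothetical per-AM capability */
    ToySubXactId created_subid;           /* set if created in this xact */
    ToySubXactId new_relfilenode_subid;   /* latest replacement; may be reset */
    ToySubXactId first_relfilenode_subid; /* first replacement; never reset here */
} ToyRelation;

/* Record a relfilenode replacement done in subtransaction "subid". */
static void
toy_replace_relfilenode(ToyRelation *rel, ToySubXactId subid)
{
    assert(subid != TOY_INVALID_SUBXACT);
    rel->new_relfilenode_subid = subid;
    /* Only the first replacement in the transaction is remembered here. */
    if (rel->first_relfilenode_subid == TOY_INVALID_SUBXACT)
        rel->first_relfilenode_subid = subid;
}

/* Model a command sequence that makes the "new" field forget its value. */
static void
toy_forget_new_relfilenode(ToyRelation *rel)
{
    rel->new_relfilenode_subid = TOY_INVALID_SUBXACT;
}

/* WAL can be skipped if the AM supports it and the storage is new in this xact. */
static bool
toy_can_skip_wal(const ToyRelation *rel)
{
    if (!rel->am_supports_wal_skip)
        return false;
    return rel->created_subid != TOY_INVALID_SUBXACT ||
           rel->first_relfilenode_subid != TOY_INVALID_SUBXACT;
}

/* Whatever skipped WAL must be synced to disk before commit completes. */
static bool
toy_needs_sync_before_commit(const ToyRelation *rel)
{
    return toy_can_skip_wal(rel);
}
```

In this model, calling toy_replace_relfilenode() and then
toy_forget_new_relfilenode() still leaves toy_can_skip_wal() true, because
the decision relies on first_relfilenode_subid rather than on the field
that can be reset.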


Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v14-0001-TAP-test-for-copy-truncation-optimization.patch text/x-patch 10.7 KB
v14-0002-Fix-WAL-skipping-feature.patch text/x-patch 37.3 KB
