|From:||Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>|
|Cc:||pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz|
|Subject:||Re: [HACKERS] WAL logging problem in 9.4.3?|
Attached is a new version.
At Tue, 21 May 2019 21:29:48 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190521(dot)212948(dot)34357392(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Mon, 20 May 2019 15:54:30 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190520(dot)155430(dot)215084510(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > I suspect the design in the https://postgr.es/m/559FA0BA.email@example.com last
> > > paragraph will be simpler, not more complex. In the implementation I'm
> > > envisioning, smgrDoPendingDeletes() would change name, perhaps to
> > > AtEOXact_Storage(). For every relfilenode it does not delete, it would ensure
> > > durability by syncing (for large nodes) or by WAL-logging each page (for small
> > > nodes). RelationNeedsWAL() would return false whenever the applicable
> > > relfilenode appears in pendingDeletes. Access methods would remove their
> > > smgrimmedsync() calls, but they would otherwise not change. Would anyone like
> > > to try implementing that?
> > Following this direction, the attached PoC works *at least for*
> > the wal_optimization TAP tests, but doing pending flush not in
> > smgr but in relcache. This is extending skip-wal feature to
> > indexes. And makes the old 0002 patch on nbtree useless.
> This is a tidier version of the patch.
> - Passes regression tests including 018_wal_optimize.pl
> - Move the substantial work to table/index AMs.
> Each AM can decide whether to support WAL skip or not.
> Currently heap and nbtree support it.
> - The timing of sync is moved from AtEOXact to PreCommit. This is
> because heap_sync() needs xact state = INPROGRESS.
> - matview and cluster is broken, since swapping to new
> relfilenode doesn't change rd_newRelfilenodeSubid. I'll address
cluster/matview are fixed.
An obstacle to fixing them was the unreliability of
newRelfilenodeSubid. As mentioned in the comment on
RelationData, newRelfilenodeSubid may disappear after a certain
sequence of commands.
In the attached v14, I added "rd_firstRelfilenodeSubid", which
stores the ID of the subtransaction in which the first relfilenode
replacement in the current transaction took place. It survives any
sequence of commands, including the one mentioned in CopyFrom's
comment (which I removed in this patch).
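As a rough illustration of why a second field helps, here is a toy
model in C. It is not code from the patch: ToyRelation and the toy_*
functions are hypothetical names, only the two field names mirror
RelationData, and the abort rule is a deliberately simplified
assumption about how the "newest" subid can be forgotten.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SubTransactionId;
#define InvalidSubTransactionId ((SubTransactionId) 0)

/* Toy stand-in for the relevant part of RelationData (assumption). */
typedef struct ToyRelation
{
    SubTransactionId rd_newRelfilenodeSubid;   /* latest replacement; may be forgotten */
    SubTransactionId rd_firstRelfilenodeSubid; /* first replacement in this xact */
} ToyRelation;

/* Record a relfilenode replacement made in subtransaction "subid". */
static void
toy_set_new_relfilenode(ToyRelation *rel, SubTransactionId subid)
{
    rel->rd_newRelfilenodeSubid = subid;
    /* Only the first replacement in the transaction sets this field. */
    if (rel->rd_firstRelfilenodeSubid == InvalidSubTransactionId)
        rel->rd_firstRelfilenodeSubid = subid;
}

/* Simplified subtransaction abort: forget whatever "subid" recorded. */
static void
toy_subxact_abort(ToyRelation *rel, SubTransactionId subid)
{
    if (rel->rd_newRelfilenodeSubid == subid)
        rel->rd_newRelfilenodeSubid = InvalidSubTransactionId;
    if (rel->rd_firstRelfilenodeSubid == subid)
        rel->rd_firstRelfilenodeSubid = InvalidSubTransactionId;
}

/* Was the relfilenode replaced earlier in the current transaction? */
static bool
toy_relfilenode_replaced_in_xact(const ToyRelation *rel)
{
    return rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId;
}
```

In this model, a replacement in subtransaction 1 followed by a second
one in subtransaction 2 that is then rolled back leaves
rd_newRelfilenodeSubid invalid, while rd_firstRelfilenodeSubid still
records subtransaction 1.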
With the attached patch, on relations based on table/index AMs
that support WAL-skipping, WAL-logging is skipped if the
relation was created in the current transaction, or if its
relfilenode was replaced in the current transaction. In that case
the file is always synced at commit. (Only heap and btree
support it.)
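The rule above can be sketched as a small C model. This is only an
illustration under stated assumptions, not the patch's code:
ToyRelation, am_supports_wal_skip, needs_sync_at_commit and the toy_*
functions are hypothetical, and the model ignores wal_level and every
other condition the real code has to check.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SubTransactionId;
#define InvalidSubTransactionId ((SubTransactionId) 0)

/* Toy relation descriptor; all fields here are illustrative assumptions. */
typedef struct ToyRelation
{
    bool             am_supports_wal_skip;     /* e.g. heap or btree */
    SubTransactionId rd_createSubid;           /* set if created in this xact */
    SubTransactionId rd_firstRelfilenodeSubid; /* set if relfilenode replaced */
    bool             needs_sync_at_commit;     /* WAL was skipped for a change */
} ToyRelation;

/* WAL can be skipped only for new or replaced storage on a supporting AM. */
static bool
toy_can_skip_wal(const ToyRelation *rel)
{
    if (!rel->am_supports_wal_skip)
        return false;
    return rel->rd_createSubid != InvalidSubTransactionId ||
           rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId;
}

/* Apply a change; returns true if a WAL record would be emitted. */
static bool
toy_modify(ToyRelation *rel)
{
    if (toy_can_skip_wal(rel))
    {
        rel->needs_sync_at_commit = true;   /* durability comes from the sync */
        return false;
    }
    return true;
}

/* Pre-commit step; returns true if the file would be synced here. */
static bool
toy_precommit(ToyRelation *rel)
{
    bool synced = rel->needs_sync_at_commit;

    rel->needs_sync_at_commit = false;
    return synced;
}
```

The point of the model is the pairing: every change that skips WAL
leaves a pending sync behind, and the pre-commit step is where that
sync is paid for.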
NTT Open Source Software Center