Re: [HACKERS] WAL logging problem in 9.4.3?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: noah(at)leadboat(dot)com
Cc: 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-04-02 10:54:06
Message-ID: 20190402.195406.20162559.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for reviewing.

At Sun, 31 Mar 2019 15:31:58 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190331223158(dot)GB891537(at)rfd(dot)leadboat(dot)com>
> On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> > On Mon, Mar 04, 2019 at 12:24:48PM +0900, Kyotaro HORIGUCHI wrote:
> > > +/*
> > > + * RelationRemovePendingSync() -- remove pendingSync entry for a relation
> > > + */
> > > +void
> > > +RelationRemovePendingSync(Relation rel)
> >
> > What is the coding rule for deciding when to call this? Currently, only
> > ATExecSetTableSpace() calls this. CLUSTER doesn't call it, despite behaving
> > much like ALTER TABLE SET TABLESPACE behaves.
>
> This question still applies. (The function name did change from
> RelationRemovePendingSync() to RelationInvalidateWALRequirements().)

It is called for heap_register_sync()'ed relations to avoid
syncing useless or trying to sync nonexistent files. I modifed
all CLUSTER, COPY FROM, CREATE AS, REFRESH MATVIEW and SET
TABLESPACE uses the function. (The function is renamed to
table_relation_invalidate_walskip()).

I noticed that heap_register_sync and friends are now a kind of
Table-AM function. So I added .relation_register_walskip and
.relation_invalidate_walskip in TableAMRoutine and moved the
heap_register_sync stuff as heapam_relation_register_walskip and
friends. .finish_bulk_insert() is modified to be used only
WAL-skip is active on the relation. (0004, 0005) But I'm not sure
that is the right direction.

(RelWALRequirements is renamed to RelWALSkip)

The change made smgrFinishBulkInsert (known as smgrDoPendingSync)
need to call a tableam interface. Relation is required to call it
in the designed way but relcache cannot live until there. In the
attached patch 0005, a new member TableAmRoutine *tableam is
added to RelWalSkip and calls finish_bulk_insert() via the
tableAm. But I'm quite uneasy with that...

> On Mon, Mar 25, 2019 at 09:32:04PM +0900, Kyotaro HORIGUCHI wrote:
> > At Wed, 20 Mar 2019 22:48:35 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190321054835(dot)GB3842129(at)rfd(dot)leadboat(dot)com>
> Again, I do want them in the code. Please restore them, but use a mechanism
> like CACHE_elog() so they're built only if one defines a preprocessor symbol.

Ah, sorry. I restored the messages using STORAGE_elog(). I also
needed this. (SMGR_ might be better but I'm not sure.)

> On Tue, Mar 26, 2019 at 04:35:07PM +0900, Kyotaro HORIGUCHI wrote:
> > + smgrProcessWALRequirementInval(s->subTransactionId, false);
>
> The smgrProcessWALRequirementInval() calls almost certainly belong in
> CommitSubTransaction() and AbortSubTransaction(), not in these functions. By
> doing it here, you'd get the wrong behavior in a subtransaction created via a
> plpgsql "BEGIN ... EXCEPTION WHEN OTHERS THEN" block.

Thanks. Moved it to AtSubAbort_smgr() and AtSubCommit_smgr(). (0005)

> > +/*
> > + * Process pending invalidation of WAL requirements happened in the
> > + * subtransaction
> > + */
> > +void
> > +smgrProcessWALRequirementInval(SubTransactionId sxid, bool isCommit)
> > +{
> > + HASH_SEQ_STATUS status;
> > + RelWalRequirement *walreq;
> > +
> > + if (!walRequirements)
> > + return;
> > +
> > + /* We expect that we don't have walRequirements in almost all cases */
> > + hash_seq_init(&status, walRequirements);
> > +
> > + while ((walreq = hash_seq_search(&status)) != NULL)
> > + {
> > + /* remove useless entry */
> > + if (isCommit ?
> > + walreq->invalidate_sxid == sxid :
> > + walreq->create_sxid == sxid)
> > + hash_search(walRequirements, &walreq->relnode, HASH_REMOVE, NULL);
>
> Do not remove entries during subtransaction commit, because a parent
> subtransaction might still abort. See other CommitSubTransaction() callees
> for examples of correct subtransaction handling. AtEOSubXact_Files() is one
> simple example.

Thanks. smgrProcessWALSkipInval() (0005) is changed so that:

- If a RelWalSkip entry is created in aborted subtransaction,
remove it.

- If a RelWalSkip entry is created then invalidated in committed
subtransaction, remove it.

- If a RelWalSkip entry is created and committed, change the
creator subtransaction to the parent subtransaction.

- If a RelWalSkip entry is create elsewhere and invalidated in
committed subtransaction, move the invalidation to the parent
subtransaction.

- If a RelWalSkip entry is created elsewhere and invalidated in
aborted subtransaction, cancel the invalidation.

Test is added as test3a2 and test3a3. (0001)

> > @@ -3567,15 +3602,26 @@ heap_update
> > */
> > if (RelationIsAccessibleInLogicalDecoding(relation))
> > {
> > - log_heap_new_cid(relation, &oldtup);
> > - log_heap_new_cid(relation, heaptup);
> > + if (oldbuf_needs_wal)
> > + log_heap_new_cid(relation, &oldtup);
> > + if (newbuf_needs_wal)
> > + log_heap_new_cid(relation, heaptup);
>
> These if(...) conditions are always true, since they're redundant with
> RelationIsAccessibleInLogicalDecoding(relation). Remove the conditions or
> replace them with asserts.

Ah.. I see. It is not the minimal case. Added a comment and an
assertion. (0006)

+ * catalog. Both oldbuf_needs_wal and newbuf_needs_wal must be true
+ * when logical decoding is active.

> By using DELETE and INSERT records to implement an UPDATE, you lose the ctid
> chain and infomask bits that were present before crash recovery. If that's
> okay in these circumstances, please write a comment explaining why.

Sounds reasonable. Added a comment. (Honestly I completely forgot
about that.. Thanks!) (0006)

+ * Insert log record. Using delete or insert log loses HOT chain
+ * information but that happens only when newbuf is different from
+ * buffer, where HOT cannot happen.

> > @@ -1096,7 +1097,9 @@ _bt_insertonpg(Relation rel,
> > | | | cachedBlock = BufferGetBlockNumber(buf);
> >
> > | | /* XLOG stuff */
> > - | | if (RelationNeedsWAL(rel))
> > + | | if (BufferNeedsWAL(rel, buf) ||
> > + | | | (!P_ISLEAF(lpageop) && BufferNeedsWAL(rel, cbuf)) ||
> > + | | | (BufferIsValid(metabuf) && BufferNeedsWAL(rel, metabuf)))
>
> This appears to have the same problem that heap_update() had in v7; if
> BufferNeedsWAL(rel, buf) is false and BufferNeedsWAL(rel, metabuf) is true, we
> emit WAL for both buffers. If that can't actually happen today, use asserts.
>
> I don't want the btree code to get significantly more complicated in order to
> participate in the RelWalRequirement system. If btree code would get more
> complicated, it's better to have btree continue using the old system. If
> btree's complexity would be essentially unchanged, it's still good to use the
> new system.

It was broken. I tried to fix it but page split baffled me. I
reverted it and added a comment there explaining the reason for
not applying BufferNeedsWAL stuff to nbtree. WAL-logging skip
feature is now restricted to work only on non-index
heaps. (getWalSkipEntry and RecordPendingSync in 0005)

> > @@ -334,6 +334,10 @@ btbuild(Relation heap, Relation index, IndexInfo *indexInfo)
> >
> > | reltuples = _bt_spools_heapscan(heap, index, &buildstate, indexInfo);
> >
> > + | /* Skip WAL-logging if wal_level = minimal */
> > + | if (!XLogIsNeeded())
> > + | | RecordWALSkipping(index);
>
> _bt_load() still has an smgrimmedsync(wstate->index->rd_smgr, MAIN_FORKNUM),
> which should be unnecessary after you add this end-of-transaction sync. Also,
> this code can reach an assertion failure at wal_level=minimal:
>
> 910024 2019-03-31 19:12:13.728 GMT LOG: statement: create temp table x (c int primary key)
> 910024 2019-03-31 19:12:13.729 GMT DEBUG: CREATE TABLE / PRIMARY KEY will create implicit index "x_pkey" for table "x"
> 910024 2019-03-31 19:12:13.730 GMT DEBUG: building index "x_pkey" on table "x" serially
> TRAP: FailedAssertion("!(((rel)->rd_rel->relpersistence == 'p'))", File: "storage.c", Line: 460)

This is what I mentioned as "broken" above. Sorry for the
silly mistake.

> Also, please fix whitespace problems that "git diff --check master" reports.

Thanks. Good to know the command.

After all, this patch set contains the following files.

v10-0001-TAP-test-for-copy-truncation-optimization.patch

Tap test script. Multi-level subtransaction case is added.

v10-0002-Write-WAL-for-empty-nbtree-index-build.patch

As mentioned above, nbtree patch has been shrinked to the
initial state of a workaround. Comment is rewrited. (v9-0002 +
v9-0008)

v10-0003-Move-XLOG-stuff-from-heap_insert-and-heap_delete.patch

Not substantially changed.

v10-0004-Add-new-interface-to-TableAmRoutine.patch

New file. Adds two new interfaces to TableAmRoutine and modified
one interface.

v10-0005-Add-infrastructure-to-WAL-logging-skip-feature.patch

Heavily revised version of v9-0004.
Some functions are renamed.
Fixed subtransaction handling.
Added STORAGE_elog() stuff.
Uses table-am functions.
Changes heapam stuff.

v10-0006-Fix-WAL-skipping-feature.patch

Revised version of v9-0005 + v9-0006 + v9-0007.

Added comment and assertion in heap_insert().

v10-0007-Remove-TABLE-HEAP_INSERT_SKIP_WAL.patch

Separated from v9-0005 so that subsequent patches are sane.

Removes TABLE/HEAP_ISNERT_SKIP_WAL.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v10-0001-TAP-test-for-copy-truncation-optimization.patch text/x-patch 10.7 KB
v10-0002-Write-WAL-for-empty-nbtree-index-build.patch text/x-patch 1.6 KB
v10-0003-Move-XLOG-stuff-from-heap_insert-and-heap_delete.patch text/x-patch 11.5 KB
v10-0004-Add-new-interface-to-TableAmRoutine.patch text/x-patch 5.6 KB
v10-0005-Add-infrastructure-to-WAL-logging-skip-feature.patch text/x-patch 32.2 KB
v10-0006-Fix-WAL-skipping-feature.patch text/x-patch 28.5 KB
v10-0007-Remove-TABLE-HEAP_INSERT_SKIP_WAL.patch text/x-patch 2.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2019-04-02 11:32:42 Re: GCoS2019--pgBackRest port to Windows (2019)
Previous Message Thomas Munro 2019-04-02 10:09:50 Re: Refactoring the checkpointer's fsync request queue