Re: [HACKERS] WAL logging problem in 9.4.3?

From: Noah Misch <noah(at)leadboat(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-03-31 22:31:58
Message-ID: 20190331223158.GB891537@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> I also liked the design in the https://postgr.es/m/559FA0BA.3080808@iki.fi
> last paragraph, and I suspect it would have been no harder to back-patch. I
> wonder if it would have been simpler and better, but I'm not asking anyone to
> investigate that.

Now I am asking for that. Would anyone like to try implementing that other
design, to see how much simpler it would be? I now expect the already-drafted
design to need several more iterations before it reaches a finished patch.

Separately, I reviewed v9 of the already-drafted design:

> On Mon, Mar 04, 2019 at 12:24:48PM +0900, Kyotaro HORIGUCHI wrote:
> > +/*
> > + * RelationRemovePendingSync() -- remove pendingSync entry for a relation
> > + */
> > +void
> > +RelationRemovePendingSync(Relation rel)
>
> What is the coding rule for deciding when to call this? Currently, only
> ATExecSetTableSpace() calls this. CLUSTER doesn't call it, despite behaving
> much like ALTER TABLE SET TABLESPACE behaves.

This question still applies. (The function name did change from
RelationRemovePendingSync() to RelationInvalidateWALRequirements().)

On Mon, Mar 25, 2019 at 09:32:04PM +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 20 Mar 2019 22:48:35 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190321054835(dot)GB3842129(at)rfd(dot)leadboat(dot)com>
> > On Wed, Mar 20, 2019 at 05:17:54PM +0900, Kyotaro HORIGUCHI wrote:
> > > At Sun, 10 Mar 2019 19:27:08 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190311022708(dot)GA2189728(at)rfd(dot)leadboat(dot)com>
> > > > On Mon, Mar 04, 2019 at 12:24:48PM +0900, Kyotaro HORIGUCHI wrote:
> > > > + elog(DEBUG2, "not skipping WAL-logging for rel %u/%u/%u block %u, because sync_above is %u",
> > >
> > > As you mention upthread, you have many debugging elog()s. These are too
> > > detailed to include in every binary, but I do want them in the code. See
> > > CACHE_elog() for a good example of achieving that.
> >
> > Agreed will do. They were need to check the behavior precisely
> > but usually not needed.
>
> I removed all such elog()s.

Again, I do want them in the code. Please restore them, but use a mechanism
like CACHE_elog() so they're built only if one defines a preprocessor symbol.

On Tue, Mar 26, 2019 at 04:35:07PM +0900, Kyotaro HORIGUCHI wrote:
> @@ -4097,6 +4104,8 @@ ReleaseSavepoint(const char *name)
> (errcode(ERRCODE_S_E_INVALID_SPECIFICATION),
> errmsg("savepoint \"%s\" does not exist within current savepoint level", name)));
>
> + smgrProcessWALRequirementInval(s->subTransactionId, true);
> +
> /*
> * Mark "commit pending" all subtransactions up to the target
> * subtransaction. The actual commits will happen when control gets to
> @@ -4206,6 +4215,8 @@ RollbackToSavepoint(const char *name)
> (errcode(ERRCODE_S_E_INVALID_SPECIFICATION),
> errmsg("savepoint \"%s\" does not exist within current savepoint level", name)));
>
> + smgrProcessWALRequirementInval(s->subTransactionId, false);

The smgrProcessWALRequirementInval() calls almost certainly belong in
CommitSubTransaction() and AbortSubTransaction(), not in these functions. By
doing it here, you'd get the wrong behavior in a subtransaction created via a
plpgsql "BEGIN ... EXCEPTION WHEN OTHERS THEN" block.

> +/*
> + * Process pending invalidation of WAL requirements happened in the
> + * subtransaction
> + */
> +void
> +smgrProcessWALRequirementInval(SubTransactionId sxid, bool isCommit)
> +{
> + HASH_SEQ_STATUS status;
> + RelWalRequirement *walreq;
> +
> + if (!walRequirements)
> + return;
> +
> + /* We expect that we don't have walRequirements in almost all cases */
> + hash_seq_init(&status, walRequirements);
> +
> + while ((walreq = hash_seq_search(&status)) != NULL)
> + {
> + /* remove useless entry */
> + if (isCommit ?
> + walreq->invalidate_sxid == sxid :
> + walreq->create_sxid == sxid)
> + hash_search(walRequirements, &walreq->relnode, HASH_REMOVE, NULL);

Do not remove entries during subtransaction commit, because a parent
subtransaction might still abort. See other CommitSubTransaction() callees
for examples of correct subtransaction handling. AtEOSubXact_Files() is one
simple example.

> @@ -3567,15 +3602,26 @@ heap_update
> */
> if (RelationIsAccessibleInLogicalDecoding(relation))
> {
> - log_heap_new_cid(relation, &oldtup);
> - log_heap_new_cid(relation, heaptup);
> + if (oldbuf_needs_wal)
> + log_heap_new_cid(relation, &oldtup);
> + if (newbuf_needs_wal)
> + log_heap_new_cid(relation, heaptup);

These if(...) conditions are always true, since they're redundant with
RelationIsAccessibleInLogicalDecoding(relation). Remove the conditions or
replace them with asserts.

> }
>
> - recptr = log_heap_update(relation, buffer,
> - newbuf, &oldtup, heaptup,
> - old_key_tuple,
> - all_visible_cleared,
> - all_visible_cleared_new);
> + if (oldbuf_needs_wal && newbuf_needs_wal)
> + recptr = log_heap_update(relation, buffer, newbuf,
> + &oldtup, heaptup,
> + old_key_tuple,
> + all_visible_cleared,
> + all_visible_cleared_new);
> + else if (oldbuf_needs_wal)
> + recptr = log_heap_delete(relation, buffer, &oldtup, old_key_tuple,
> + xmax_old_tuple, false,
> + all_visible_cleared);
> + else
> + recptr = log_heap_insert(relation, buffer, newtup,
> + 0, all_visible_cleared_new);

By using DELETE and INSERT records to implement an UPDATE, you lose the ctid
chain and infomask bits that were present before crash recovery. If that's
okay in these circumstances, please write a comment explaining why.

> @@ -1096,7 +1097,9 @@ _bt_insertonpg(Relation rel,
> cachedBlock = BufferGetBlockNumber(buf);
>
> /* XLOG stuff */
> - if (RelationNeedsWAL(rel))
> + if (BufferNeedsWAL(rel, buf) ||
> + (!P_ISLEAF(lpageop) && BufferNeedsWAL(rel, cbuf)) ||
> + (BufferIsValid(metabuf) && BufferNeedsWAL(rel, metabuf)))

This appears to have the same problem that heap_update() had in v7; if
BufferNeedsWAL(rel, buf) is false and BufferNeedsWAL(rel, metabuf) is true, we
emit WAL for both buffers. If that can't actually happen today, use asserts.

I don't want the btree code to get significantly more complicated in order to
participate in the RelWalRequirement system. If btree code would get more
complicated, it's better to have btree continue using the old system. If
btree's complexity would be essentially unchanged, it's still good to use the
new system.

> @@ -334,6 +334,10 @@ btbuild(Relation heap, Relation index, IndexInfo *indexInfo)
>
> reltuples = _bt_spools_heapscan(heap, index, &buildstate, indexInfo);
>
> + /* Skip WAL-logging if wal_level = minimal */
> + if (!XLogIsNeeded())
> + RecordWALSkipping(index);

_bt_load() still has an smgrimmedsync(wstate->index->rd_smgr, MAIN_FORKNUM),
which should be unnecessary after you add this end-of-transaction sync. Also,
this code can reach an assertion failure at wal_level=minimal:

910024 2019-03-31 19:12:13.728 GMT LOG: statement: create temp table x (c int primary key)
910024 2019-03-31 19:12:13.729 GMT DEBUG: CREATE TABLE / PRIMARY KEY will create implicit index "x_pkey" for table "x"
910024 2019-03-31 19:12:13.730 GMT DEBUG: building index "x_pkey" on table "x" serially
TRAP: FailedAssertion("!(((rel)->rd_rel->relpersistence == 'p'))", File: "storage.c", Line: 460)

Also, please fix whitespace problems that "git diff --check master" reports.

nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2019-03-31 22:42:33 Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
Previous Message Tom Lane 2019-03-31 21:57:37 Re: FETCH FIRST clause PERCENT option