Re: [HACKERS] WAL logging problem in 9.4.3?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: noah(at)leadboat(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-08-21 07:32:38
Message-ID: 20190821.163238.176512239.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello. New version is attached.

At Mon, 19 Aug 2019 18:59:59 +0900 (Tokyo Standard Time), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in <20190819(dot)185959(dot)118543656(dot)horikyota(dot)ntt(at)gmail(dot)com>
> Thank you for taking time.
>
> At Sat, 17 Aug 2019 20:52:30 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190818035230(dot)GB3021338(at)rfd(dot)leadboat(dot)com>
> > For two-phase commit, PrepareTransaction() needs to execute pending syncs.

Now TwoPhaseFileHeader has two new members for pending syncs. It
is useless on wal-replay, but that is needed for commit-prepared.

> > On Thu, Jul 25, 2019 at 10:39:36AM +0900, Kyotaro Horiguchi wrote:
> > > --- a/src/backend/access/heap/heapam_handler.c
> > > +++ b/src/backend/access/heap/heapam_handler.c
> > > @@ -715,12 +702,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
> ...
> > > - use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
> > > -
> > > /* use_wal off requires smgr_targblock be initially invalid */
> > > Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
> >
> > Since you're deleting the use_wal variable, update that last comment.

Oops! Rewrote it.

> > > --- a/src/backend/catalog/storage.c
> > > +++ b/src/backend/catalog/storage.c
> > > @@ -428,21 +450,34 @@ smgrDoPendingDeletes(bool isCommit)
> ...
> > > + smgrimmedsync(srel, MAIN_FORKNUM);
> >
> > This should sync all forks, not just MAIN_FORKNUM. Code that writes WAL for
> > FSM_FORKNUM and VISIBILITYMAP_FORKNUM checks RelationNeedsWAL(). There may be
> > no bug today, but it's conceptually wrong to make RelationNeedsWAL() return
> > false due to this code, use RelationNeedsWAL() for multiple forks, and then
> > not actually sync all forks.
>
> I agree that all forks needs syncing, but FSM and VM are checking
> RelationNeedsWAL(modified). To make sure, are you suggesting to
> sync all forks instead of emitting WAL for them, or suggesting
> that VM and FSM to emit WALs even when the modified
> RelationNeedsWAL returns false (+ sync all forks)?

All forks are synced and have no WALs emitted (as before) in the
attached version 19. FSM and VM are not changed.

> > The https://postgr.es/m/559FA0BA.3080808@iki.fi design had another component
> > not appearing here. It said, "Instead, at COMMIT, we'd fsync() the relation,
> > or if it's smaller than some threshold, WAL-log the contents of the whole file
> > at that point." Please write the part to WAL-log the contents of small files
> > instead of syncing them.
>
> I'm not sure the point of the behavior. I suppose that the "log"
> is a sequence of new_page records. It also needs to be synced and
> it is always larger than the file to be synced. I can't think of
> an appropriate threshold without the point.

This is not included in this version. I'll continue to consider
this.

> > > --- a/src/backend/commands/copy.c
> > > +++ b/src/backend/commands/copy.c
> > > @@ -2725,28 +2722,9 @@ CopyFrom(CopyState cstate)
> > > * If it does commit, we'll have done the table_finish_bulk_insert() at
> > > * the bottom of this routine first.
> > > *
> > > - * As mentioned in comments in utils/rel.h, the in-same-transaction test
> > > - * is not always set correctly, since in rare cases rd_newRelfilenodeSubid
> > > - * can be cleared before the end of the transaction. The exact case is
> > > - * when a relation sets a new relfilenode twice in same transaction, yet
> > > - * the second one fails in an aborted subtransaction, e.g.
> > > - *
> > > - * BEGIN;
> > > - * TRUNCATE t;
> > > - * SAVEPOINT save;
> > > - * TRUNCATE t;
> > > - * ROLLBACK TO save;
> > > - * COPY ...
> >
> > The comment material being deleted is still correct, so don't delete it.

The code is changed to use rd_firstRelfilenodeSubid instead of
rd_firstRelfilenodeSubid which has the issue mentioned in the
deleted section. So this is right but irrelevant to the code
here. The same thing is written in the comment in RelationData.

(In short, not reverted)

> > Moreover, the code managing rd_firstRelfilenodeSubid has a similar bug. The
> > attached patch adds an assertion that RelationNeedsWAL() and the
> > pendingDeletes array have the same opinion about the relfilenode, and it
> > expands a test case to fail that assertion.
..
> > In general, to add a field like this, run "git grep -n 'rd_.*Subid'" and audit
> > all the lines printed. Many bits of code need to look at all three,
> > e.g. RelationClose().

I forgot to maintain rd_firstRelfilenode in many places and the
assertion failure no longer happens after I fixed it. Opposite to
my previous mail, of course useless pending entries are removed
at subtransction abort and no needless syncs happen in that
meaning. But another type of useless sync was seen with the
previous version 18.

(In short fixed.)

> > This field needs to be 100% reliable. In other words,
> > it must equal InvalidSubTransactionId if and only if the relfilenode matches
> > the relfilenode that would be in place if the top transaction rolled back.

Sorry, I confused this with another similar behavior of the
previous version 18, where files are synced even if it is to be
removed immediately at commit. In this version
smgrDoPendingOperations doesn't sync to-be-deleted files.

While checking this, I found that smgrDoPendingDeletes is making
unnecessary call to smgrclose() which lead server to crash while
deleting files. I removed it.

Please find the new version attached.

Changes:

- Rebased to f8cf524da1.

- Fixed prepare transaction. test2a catches this.
(twophase.c)

- Fixed a comment in heapam_relation_copy_for_cluster.

- All forks are synced. (smgrDoPendingDeletes/Operations, SyncRelationFiles)

- Fixed handling of rd_firstRelfilenodeSubid.
(RelationBuildLocalRelation, RelationSetNewRelfilenode,
load_relcache_init_file)

- Prevent to-be-deleted files from syncing. (smgrDoPendingDeletes/Operations)

- Fixed a crash bug caused by smgrclose() in smgrDoPendingOperations.

Minor changes:

- Renamed: PendingRelOps => PendingRelOp
- Type changed: bool PendingRelOp.dosync => PendingOpType PendingRelOp.op

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v19-0001-TAP-test-for-copy-truncation-optimization.patch text/x-patch 11.3 KB
v19-0002-Fix-WAL-skipping-feature.patch text/x-patch 43.7 KB
v19-0003-Rename-smgrDoPendingDeletes-to-smgrDoPendingOperatio.patch text/x-patch 10.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-08-21 07:37:14 Re: Fix typos and inconsistencies for HEAD (take 11)
Previous Message Peter Eisentraut 2019-08-21 07:23:50 Re: ICU for global collation