Re: [HACKERS] WAL logging problem in 9.4.3?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: noah(at)leadboat(dot)com, pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, michael(at)paquier(dot)xyz
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-11-06 08:29:27
Message-ID: 20191106.172927.978481031093477019.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for looking this.

At Tue, 5 Nov 2019 16:16:14 -0500, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> On Fri, Oct 25, 2019 at 9:21 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > This is the fixed verison v22.
> First, I'd like to restate my understanding of the problem just to see
..
> Second, for anyone who is not following this thread closely but is

Thanks for restating the issue and summarizing this patch. All of the
description match my understanding.

> perform all of the relevant fsync() calls at commit time. This is
> further augmented with a mechanism that instead logs all the relation
> pages in lieu of fsync()ing if the relation is very small, on the
> theory that logging a few FPIs will be cheaper than an fsync(). I view
> this additional mechanism as perhaps a bit much for a bug fix patch,
> but I understand that the goal is to prevent a performance regression,
> and it's not really over the top, so I think it's probably OK.

Thanks. It would need some benchmarking as mentioned upthread. My new
machine became to work steadily so I will do that.

> sound. I think there are a number of places where the comments could
> be better; I'll include a few points on that further down. I also
> think that the code in swap_relation_files() which takes ExclusiveLock
> on the relations looks quite strange. It's hard to understand why it's
> needed at all, or why that lock level is used. On the flip side, I

Right. It *was* a mistake of AccessExclusiveLock. On second thought,
callers must have taken locks on them with required level for
relfilenode swapping. However, one problematic case is toast indexes
of the target relation, which are not locked at all. Finally I used
AccessShareLock as it doesn't raise lock level other than
NoLock. Anyway the toast relation is not accessible outside the
session. (Done in the attached)

> think that the test suite looks really impressive and should be of
> considerable help not only in making sure that this is fixed but
> detecting if it gets broken again in the future. Perhaps it doesn't
> cover every scenario we care about, but if that turns out to be the
> case, it seems like it would be easily to further generalize. I really
> like the idea of this *kind* of test framework.

The paths running swap_relation_files are not covered. CLUSTER,
REFRESH MATVIEW and ALTER TABLE. CLUSTER and ALTER TABLE can interact
with INSERTs but MATVIEW cannot. Copying some of the existing test
cases using them will work. (Not yet done).

> Comments on comments, and other nitpicking:
>
> - in-trasaction is mis-spelled in the doc patch. accidentially is
> mis-spelled in the 0002 patch.

Thanks. I found another couple of typos "issuing"->"issueing",
"skipped"->"skpped" by ispell'ing git diff output and all fixed.

> - I think the header comment for the new TAP test could do a far
> better job explaining the overall goal of this testing than it
> actually does.

I rewrote it...

> - I think somewhere in relcache.c or rel.h there ought to be comments
> explaining the precise degree to which rd_createSubid,
> rd_newRelfilenodeSubid, and rd_firstRelfilenodeSubid are reliable,
> including problem scenarios. This patch removes some language of this
> sort from CopyFrom(), which was a funny place to have that information
> in the first place, but I don't see that it adds anything to replace
> it. I also think that we ought to explain - for the fields that are
> reliable - that they need to be reliable precisely for the purpose of
> not breaking this stuff. There's a bit of this right now:
>
> + * rd_firstRelfilenodeSubid is the ID of the first subtransaction the
> + * relfilenode change has took place in the current transaction. Unlike
> + * newRelfilenodeSubid, this won't be accidentially forgotten. A valid OID
> + * means that the currently active relfilenode is transaction-local and we
> + * sync the relation at commit instead of WAL-logging.
>
> ...but I think that needs to be somewhat expanded and clarified.

Agreed. It would be crude but I put augmenting descriptions of how the
variables work and descriptions in contrast to rd_first*.

# rd_first* is not a hint in the sense that it is reliable but it is
# mentioned as hint in some places, which will need fix.

If the fix of MarkBufferDirtyHint is ok, I'll merge it into 0002.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v23-0001-TAP-test-for-copy-truncation-optimization.patch text/x-patch 11.6 KB
v23-0002-Fix-WAL-skipping-feature.patch text/x-patch 52.0 KB
v23-0003-Fix-MarkBufferDirtyHint.patch text/x-patch 2.3 KB
v23-0004-Documentation-for-wal_skip_threshold.patch text/x-patch 1.8 KB
v23-0005-Additional-test-for-new-GUC-setting.patch text/x-patch 2.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mahendra Singh 2019-11-06 08:31:02 Re: [HACKERS] Block level parallel vacuum
Previous Message Amit Kapila 2019-11-06 08:29:19 Re: dropdb --force