Re: WAL logging problem in 9.4.3?

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, david(at)pgmasters(dot)net, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, kleptog(at)svana(dot)org, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2017-04-13 04:52:40
Message-ID: CAB7nPqTRyica1d-zU+YckveFC876=Sc847etmk7TRgAS2pA9CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 11, 2017 at 5:38 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Sorry, what I have just sent was broken.

You can use PROVE_TESTS when running make check to select a subset of
tests you want to run. I use that all the time when working on patches
dedicated to certain code paths.

>> - Relation has new members no_pending_sync and pending_sync that
>> works as instant cache of an entry in pendingSync hash.
>> - Commit-time synchronizing is restored as Michael's patch.
>> - If relfilenode is replaced, pending_sync for the old node is
>> removed. Anyway this is ignored on abort and meaningless on
>> commit.
>> - TAP test is renamed to 012 since some new files have been added.
>>
>> Accessing pending sync hash occurred on every calling of
>> HeapNeedsWAL() (per insertion/update/freeze of a tuple) if any of
>> accessing relations has pending sync. Almost of them are
>> eliminated as the result.

Did you actually test this patch? One of the logs added makes the
tests a long time to run:
2017-04-13 12:11:27.065 JST [85441] t/102_vacuumdb_stages.pl
STATEMENT: ANALYZE;
2017-04-13 12:12:25.766 JST [85492] LOG: BufferNeedsWAL: pendingSyncs
= 0x0, no_pending_sync = 0

- lsn = XLogInsert(RM_SMGR_ID,
- XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
+ rel->no_pending_sync= false;
+ rel->pending_sync = pending;
+ }
It seems to me that those flags and the pending_sync data should be
kept in the context of backend process and not be part of the Relation
data...

+void
+RecordPendingSync(Relation rel)
I don't think that I agree that this should be part of relcache.c. The
syncs are tracked should be tracked out of the relation context.

Seeing how invasive this change is, I would also advocate for this
patch as only being a HEAD-only change, not many people are
complaining about this optimization of TRUNCATE missing when wal_level
= minimal, and this needs a very careful review.

Should I code something? Or Horiguchi-san, would you take care of it?
The previous crash I saw has been taken care of, but it's been really
some time since I looked at this patch...
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2017-04-13 04:56:05 Re: tablesync patch broke the assumption that logical rep depends on?
Previous Message Amit Kapila 2017-04-13 04:17:11 Re: [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...