Re: Error with index on unlogged table

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, andres(at)2ndquadrant(dot)com, fabriziomello(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Error with index on unlogged table
Date: 2015-12-01 02:11:01
Message-ID: 20151201.111101.189005118.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, I studied your lastest patch.

At Fri, 27 Nov 2015 16:59:20 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqRoaCMhr4hjEgq4rCZ4GaCB-6=cH8b2U7K7T5-kBGC5bA(at)mail(dot)gmail(dot)com>
> On Fri, Nov 27, 2015 at 3:42 PM, Michael Paquier wrote:
> > I am still investigating for a correct fix, looking at reinit.c the
> > code in charge of copying the init fork as the main fork for a
> > relation at the end of recovery looks to be doing its job correctly...
>
> Attached is a patch that fixes the issue for me in master and 9.5.
> Actually in the last patch I forgot a call to smgrwrite to ensure that
> the INIT_FORKNUM is correctly synced to disk when those pages are
> replayed at recovery, letting the reset routines for unlogged
> relations do their job correctly. I have noticed as well that we need
> to do the same for gin and brin relations. In this case I think that
> we could limit the flush to unlogged relations, my patch does it
> unconditionally though to generalize the logic. Thoughts?

I feel quite uncomfortable that it solves the problem from a kind
of nature of unlogged object by arbitrary flagging which is not
fully corresponds to the nature. If we can deduce the necessity
of fsync from some nature, it would be preferable.

In the current patch, is_sync for log_newpage is generally true
for and only for INIT_FORKNUM pages. Exceptions as far as I can
see are,

copy_relation_data: called with arbitrary forknum but it doesn't
set is_fsync even for coying INIT_FORKNUM. (Is this not a
problem?)

spgbuildempty, ginbuildempty: these emits two or three newpage
logs at once so only the last one is set is_fsync for
performance reason.

And other anormallies are,

ginbuildempty, gistbuildempty: These funciton doesn't seem to
immediately fsync but is_fsync is set to INIT_FORKNUM. Of
course it wouldn't be a problem.

In short, it seems to me that the reason to choose using
XLOG_FPI_FOR_SYNC here is only performance of processing
successive FPIs for INIT_FORKNUM.

INIT_FORKNUM is generated only for unlogged tables and their
belongings. I suppose such successive fsyncs doesn't cause
observable performance drop assuming that the number of unlogged
tables and belongings is not so high, especially with smarter
storages. All we should do is that just fsync only for
INIT_FORKNUM's FPIs for the case. If the performance does matter
even so, we still can fsync the last md-file when any wal record
other than FPI for INIT_FORK comes. (But this would be a bit
complex..)

By the way, I suppose that fsyncing only the last page in
successive new pages still theoretically can cause this problem
when the last pages is not in the same file with other
pages. That cannot occur for INIT_FORKNUM files though in
reality:)

Thoughts?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-12-01 02:46:30 Re: proposal: multiple psql option -c
Previous Message Vinayak 2015-12-01 02:10:44 Re: [PROPOSAL] VACUUM Progress Checker.