Re: Error with index on unlogged table

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, andres(at)2ndquadrant(dot)com, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Error with index on unlogged table
Date: 2015-12-01 13:27:09
Message-ID: CAB7nPqSF4anbrRn+jkaZBmgJkqAvrySq2nmjFLWeYbt-RZbE8Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 1, 2015 at 3:06 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Tue, 1 Dec 2015 11:53:35 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqSMENEK7nqmwGsiyLTSbrZNJKx80tBX3qF6cQsS49sjag(at)mail(dot)gmail(dot)com>
>> On Tue, Dec 1, 2015 at 11:11 AM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Hello, I studied your latest patch.
>>
>> Thanks!
>>
>> > I feel quite uncomfortable that it solves a problem arising from the
>> > nature of unlogged objects by arbitrary flagging that does not
>> > fully correspond to that nature. If we could deduce the necessity
>> > of fsync from that nature itself, it would be preferable.
>>
>> INIT_FORKNUM is not something only related to unlogged relations,
>> indexes use them as well. And that's actually
>> If you look at, for example, BRIN indexes, which do not immediately sync
>> their INIT_FORKNUM when the index is created, I think that we are still
>> going to need a new flag to control the sync at WAL replay, because the
>> startup process cannot know a relation's persistence, something that we
>> do know when the XLOG_FPI record is created. For BRIN indexes, we
>> particularly want to not sync the INIT_FORKNUM when the relation is
>> not an unlogged one.
>
> (The comment added in brinbuildempty looks wrong since it
> actually doesn't fsync it immediately..)
>
> Hmm, I've already seen that, and having your explanation I wonder
> why brinbuildempty issues WAL for what does not need to be
> persistent at the moment. Isn't it breaking agreements about
> Write Ahead Log? INIT_FORKNUM and unconditionally fsync'ing would
> be equally tied excluding the anomaly about WAL. (Except for
> succeeding newpages.)

Alvaro, your thoughts regarding those lines? When building an empty
INIT_FORKNUM for a BRIN index, its data is saved into a shared buffer
and not immediately synced to disk. Shouldn't an immediate sync be
necessary, at least for unlogged relations?

>> > In short, it seems to me that the reason to choose using
>> > XLOG_FPI_FOR_SYNC here is only performance of processing
>> > successive FPIs for INIT_FORKNUM.
>>
>> Yeah, there is a one-way link between this WAL record and INIT_FORKNUM.
>> However, please note that having an INIT_FORKNUM does not always imply
>> that a sync is wanted. copy_relation_data is an example of that.
>
> As I wrote above, I suppose we should fix(?) the irregular
> relationship between WAL and the init fork of brin and the like.

Yep.

>> > INIT_FORKNUM is generated only for unlogged tables and their
>> > belongings. I suppose such successive fsyncs don't cause an
>> > observable performance drop, assuming that the number of unlogged
>> > tables and belongings is not so high, especially with smarter
>> > storage. All we should do is fsync only for
>> > INIT_FORKNUM's FPIs in that case. If the performance does matter
>> > even so, we can still fsync the last md-file when any WAL record
>> > other than an FPI for INIT_FORK comes. (But this would be a bit
>> > complex..)
>>
>> Hm. If a system uses a bunch of temporary relations with BRIN or
>> other indexes, I would not say so. For back branches we may have to do
>> it unconditionally using INIT_FORKNUM, but having a control flag to
>> have it done only for unlogged relations would improve on that.
>
> It could, and should do so. And if we take such systems with a
> bunch of temp relations as significant (I agree with this),
> XLogRegisterBlock() looks to be able to register multiple blocks
> into a single WAL record, and we could eliminate arbitrary flagging
> on individual FPI records using it. Is it possible?

I thought about using a BKPBLOCK flag, but all of them are already
taken, if that's what you meant. It seems cheaper to do that at the
record level...
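
To make the flag-space point concrete, here is a rough sketch, not code
from the patch. It assumes the per-block flag values as recalled from
src/include/access/xlogrecord.h in 9.5; please check them against the
tree. The XLOG_FPI_FOR_SYNC info value (0xC0) and both helper functions
are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-block header flags, as recalled from xlogrecord.h (9.5); assumption. */
#define BKPBLOCK_FORK_MASK  0x0F    /* low four bits carry the fork number */
#define BKPBLOCK_FLAG_MASK  0xF0    /* high four bits carry the flags */
#define BKPBLOCK_HAS_IMAGE  0x10
#define BKPBLOCK_HAS_DATA   0x20
#define BKPBLOCK_WILL_INIT  0x40
#define BKPBLOCK_SAME_REL   0x80

/* Record-level info byte: the high four bits belong to the rmgr. */
#define XLR_RMGR_INFO_MASK  0xF0

/* HYPOTHETICAL value, invented here; the real patch may use another one. */
#define HYPOTHETICAL_XLOG_FPI_FOR_SYNC  0xC0

/* Hypothetical helper: block-level flag bits not claimed by any flag. */
static uint8_t
free_bkpblock_flag_bits(void)
{
    uint8_t     used = BKPBLOCK_HAS_IMAGE | BKPBLOCK_HAS_DATA |
                       BKPBLOCK_WILL_INIT | BKPBLOCK_SAME_REL;

    return (uint8_t) (BKPBLOCK_FLAG_MASK & ~used);
}

/* Hypothetical helper: replay decides on sync from the record type alone. */
static bool
record_requests_sync(uint8_t xl_info)
{
    return (xl_info & XLR_RMGR_INFO_MASK) == HYPOTHETICAL_XLOG_FPI_FOR_SYNC;
}
```

With those values free_bkpblock_flag_bits() returns 0, so a per-block
marker would mean widening the block header, whereas a record-level
marker only needs a distinct info value that the startup process can
test without knowing the relation's persistence.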
--
Michael
