Re: Error with index on unlogged table

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, andres(at)2ndquadrant(dot)com, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Error with index on unlogged table
Date: 2015-12-01 02:53:35
Message-ID: CAB7nPqSMENEK7nqmwGsiyLTSbrZNJKx80tBX3qF6cQsS49sjag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 1, 2015 at 11:11 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, I studied your latest patch.

Thanks!

> I feel quite uncomfortable that it solves the problem from a kind
> of nature of unlogged object by arbitrary flagging which is not
> fully corresponds to the nature. If we can deduce the necessity
> of fsync from some nature, it would be preferable.

INIT_FORKNUM is not something only related to unlogged relations,
indexes use them as well. And that's actually
If you look at for example BRIN indexes that do not sync immediately
their INIT_FORKNUM when index is created, I think that we still are
going to need a new flag to control the sync at WAL replay because
startup process cannot know a relation's persistence, thing that we
can know when the XLOG_FPI record is created. For BRIN indexes, we
want particularly to not sync the INIT_FORKNUM when the relation is
not an unlogged one.

> In the current patch, is_sync for log_newpage is generally true
> for and only for INIT_FORKNUM pages. Exceptions as far as I can
> see are,

Yep, that's not always the case.

> copy_relation_data: called with arbitrary forknum but it doesn't
> set is_fsync even for coying INIT_FORKNUM. (Is this not a
> problem?)

Oh. And actually, it seems that use_wal is broken there as well. I
think that we would still want to issue a XLOG_FPI record for an
unlogged relation should it be moved to a new tablespace to copy its
INIT_FORKNUM correctly to its new home.
After moving an unlogged relation to a new tablespace, and after
promoting a standby the standby fails to start because of this error:
FATAL: could not create file
"pg_tblspc/16388/PG_9.6_201511071/16384/16389": File exists
This could be fixed separately though.

> spgbuildempty, ginbuildempty: these emits two or three newpage
> logs at once so only the last one is set is_fsync for
> performance reason.

It doesn't matter to fsync just at the last one. Each one of them
would be replayed either way, the last one triggering the sync, no?

> In short, it seems to me that the reason to choose using
> XLOG_FPI_FOR_SYNC here is only performance of processing
> successive FPIs for INIT_FORKNUM.

Yeah, there is a one-way link between this WAL record a INIT_FORKNUM.
However please note that having a INIT_FORKNUM does not always imply
that a sync is wanted. copy_relation_data is an example of that.

> INIT_FORKNUM is generated only for unlogged tables and their
> belongings. I suppose such successive fsyncs doesn't cause
> observable performance drop assuming that the number of unlogged
> tables and belongings is not so high, especially with smarter
> storages. All we should do is that just fsync only for
> INIT_FORKNUM's FPIs for the case. If the performance does matter
> even so, we still can fsync the last md-file when any wal record
> other than FPI for INIT_FORK comes. (But this would be a bit
> complex..)

Hm. If a system uses a bunch of temporary relations with brin index or
other included I would not say so. For back branches we may have to do
it unconditionally using INIT_FORKNUM, but having a control flag to
have it only done for unlogged relations would leverage that.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2015-12-01 04:00:19 Removing Functionally Dependent GROUP BY Columns
Previous Message Pavel Stehule 2015-12-01 02:46:30 Re: proposal: multiple psql option -c