Re: Error with index on unlogged table

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Error with index on unlogged table
Date: 2015-12-11 07:54:45
Message-ID: CAB7nPqT=7R8F1W-ssAoYdkJkdL-QxGSTL7P3Sf4WVKOENyUBvw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 11, 2015 at 4:27 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2015-12-10 18:36:32 +0100, Andres Freund wrote:
>> On 2015-12-10 12:19:12 -0500, Robert Haas wrote:
>> > > The real problem there imo isn't that the copy_relation_data() doesn't
>> > > deal with 0 block tables, but that ATExecSetTableSpace() doesn't have a
>> > > unlogged table specific codepath like heap_create_with_catalog() has.
>> >
>> > It looks to me like somewhere we need to do log_smgrcreate(...,
>> > INIT_FORKNUM) in the unlogged table case.
>>
>> Yes.
>>
>> > RelationCreateStorage()
>> > skips this for the main forknum of an unlogged table, which seems OK,
>> > but there's nothing that even attempts it for the init fork, which
>> > does not seem OK.
>>
>> We unfortunately can't trivially delegate that work to
>> RelationCreateStorage(). E.g. heap_create() documents that only the main
>> fork is created :(
>>
>> > I guess that logic should happen in
>> > ATExecSetTableSpace just after smgrcreate(dstrel, forkNum, false).
>>
>> Looks like it's the easiest place.
>
>> > > A second problem is that the smgrimmedsync() in copy_relation_data()
>> > > isn't called for the init fork of unlogged relations, even if it needs
>> > > to.
>
> Here's a patch doing that. It's not yet fully polished, but I wanted to
> get it out, because I noticed one thing:
>
> In ATExecSetTableSpace(), for !main forks, we currently call
> smgrcreate(), but not log_smgrcreate(). Even for PERSISTENT
> relations. That seems a bit odd to me. It currently seems to be without
> further consequence because, if there's actual data in the fork, we'll
> just create the relation in _mdfd_getseg(); or we can cope with the
> relation not being there. But to me that feels wrong.
>
> It seems better to do the log_smgrcreate() for RELPERSISTENCE_PERMANENT,
> not just INIT_FORKNUM. What do you guys think?

This fixes the problem in my environment.

+        if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
+            (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+             forkNum == INIT_FORKNUM))
+            log_smgrcreate(&newrnode, forkNum);
There should be an XLogIsNeeded() check as well. Dropping the check on
RELPERSISTENCE_UNLOGGED would be fine too... Not mandatory though :)
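
To make the suggestion concrete, here is a rough standalone sketch of the
condition with both changes applied. It is not part of the patch:
fork_create_needs_wal() is a hypothetical helper, the xlog_is_needed flag
stands in for the XLogIsNeeded() macro, and the constants are local stand-ins
so that the snippet compiles outside the tree. Dropping the
RELPERSISTENCE_UNLOGGED test assumes that only unlogged relations ever have
an init fork.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the PostgreSQL definitions (assumed, for illustration). */
#define RELPERSISTENCE_PERMANENT 'p'
#define RELPERSISTENCE_UNLOGGED  'u'

typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} ForkNumber;

/*
 * Hypothetical helper: decide whether creating a fork should be WAL-logged.
 * xlog_is_needed stands in for XLogIsNeeded().
 */
static bool
fork_create_needs_wal(char relpersistence, ForkNumber forkNum,
                      bool xlog_is_needed)
{
    assert(forkNum >= MAIN_FORKNUM && forkNum <= INIT_FORKNUM);

    /* Nothing to log when WAL is not needed at all. */
    if (!xlog_is_needed)
        return false;

    /*
     * Permanent relations log every fork; otherwise only the init fork,
     * which is assumed to exist for unlogged relations only.
     */
    return relpersistence == RELPERSISTENCE_PERMANENT ||
           forkNum == INIT_FORKNUM;
}
```

In the patch itself this would simply guard the log_smgrcreate(&newrnode,
forkNum) call.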

+     * The init fork for an unlogged relation in many respects has to be
+     * treated the same as normal relation, changes need to be WAL logged and
+     * it needs to be synced to disk.
+     */
+    copying_initfork = relpersistence == RELPERSISTENCE_UNLOGGED &&
+        forkNum == INIT_FORKNUM;
Here as well, a check on INIT_FORKNUM alone would be fine.
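
As a rough illustration of what that comment implies for the copy, the
sketch below derives the two decisions (WAL-log the copied pages, sync the
fork at the end) from a simplified copying_initfork. It is my reading of the
comment, not the patch: plan_fork_copy() and CopyPlan are hypothetical names,
xlog_is_needed stands in for XLogIsNeeded(), and the constants are local
stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the PostgreSQL definitions (assumed, for illustration). */
#define RELPERSISTENCE_PERMANENT 'p'
#define RELPERSISTENCE_UNLOGGED  'u'

typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} ForkNumber;

/* Hypothetical result type: what the copy of one fork has to do. */
typedef struct CopyPlan
{
    bool use_wal;       /* WAL-log each copied page */
    bool sync_at_end;   /* force the copied fork to disk once done */
} CopyPlan;

static CopyPlan
plan_fork_copy(char relpersistence, ForkNumber forkNum, bool xlog_is_needed)
{
    CopyPlan plan;
    bool     copying_initfork;
    bool     treat_as_permanent;

    assert(forkNum >= MAIN_FORKNUM && forkNum <= INIT_FORKNUM);

    /* Simplified check: the fork number alone identifies an init fork. */
    copying_initfork = (forkNum == INIT_FORKNUM);

    /* An init fork is handled like a fork of a permanent relation. */
    treat_as_permanent = relpersistence == RELPERSISTENCE_PERMANENT ||
        copying_initfork;

    plan.use_wal = xlog_is_needed && treat_as_permanent;
    plan.sync_at_end = treat_as_permanent;
    return plan;
}
```

With this, the main fork of an unlogged table is neither logged nor synced,
while its init fork gets both.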

>> It sounds worthwhile to check that other locations rewriting tables,
>> e.g. cluster/vacuum full/reindex are safe.
>
> Seems to be ok, on a first glance.

Yeah. REINDEX relies on index_build to recreate what it should... The
others look fine as well. I have tested this just in case, and the
files produced are consistent between the standby and its master.
--
Michael
