Re: hash index on unlogged tables doesn't behave as expected

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash index on unlogged tables doesn't behave as expected
Date: 2017-07-06 11:29:59
Message-ID: CAA4eK1K2iAvnei88thQ8jco3uAYg6avkP1WGN4SJ-QxmDkSeqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 6, 2017 at 3:24 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1+SYqCmA7ioTBpJHcO-B-rf8A=N9Gr1-RP3RhwecB5E-A(at)mail(dot)gmail(dot)com>
>> On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> >
>> > It seems to me that this qualifies as an open item for 10. WAL-logging
>> > is new for hash tables. Amit, could you add one?
>> >
>>
>> Added.
>>
>> > + use_wal = RelationNeedsWAL(rel) ||
>> > + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
>> > + forkNum == INIT_FORKNUM);
>> > In access AMs, empty() routines are always only called for unlogged
>> > relations, so you could relax that to check for INIT_FORKNUM only.
>> >
>>
>> Yeah, but _hash_init() is an exposed API, so I am not sure it is a
>> good idea to make such an assumption at that level of code. Now, as
>> there is no current user which wants to use it any other way, we can
>> make such an assumption, but does it serve any purpose?
>
> Check for INIT_FORKNUM appears both accompanied and not
> accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
> is the convention here.
>
> The difference of the two is an init fork of TEMP
> relations. However I believe that init fork is by definition a
> part of an unlogged relation, it seems to me that it ought not to
> be wal-logged if we had it. From this viewpoint, the middle line
> makes sense.
>

What is the middle line? Are you okay with a current check or do you
see any problem with it or do you prefer to write it differently?

>
>
>> > Looking at the patch, I think that you are right to do the logging
>> > directly in _hash_init() and not separately in hashbuildempty().
>> > Compared to other relations the init data is more dynamic.
>> >
>> > + /*
>> > + * Force the on-disk state of init forks to always be in sync with the
>> > + * state in shared buffers. See XLogReadBufferForRedoExtended.
>> > + */
>> > + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL);
>> > + if (forknum == INIT_FORKNUM)
>> > + FlushOneBuffer(metabuf);
>> > I find those forced syncs a bit surprising. The buffer is already
>> > marked as dirty, so even if there is a concurrent checkpoint they
>> > would be flushed.
>> >
>>
>> What if there is no concurrent checkpoint activity and the server is
>> shutdown? Remember this replay happens on standby and we will just
>> try to perform Restartpoint and there is no guarantee that it will
>> flush the buffers. If the buffers are not flushed at shutdown, then
>> the Init fork will be empty on restart and the hash index will be
>> corrupt.
>
> XLogReadBufferForRedoExtended() automatically force the flush for
> a XLOG record on init fork that having FPW imaeges. And
> _hash_init seems to have emitted such a XLOG record using
> log_newpage.
>

Sure, but log_newpage is not used for metapage and bitmappage.

>> > If recovery comes again here, then they would just
>> > be replayed. I am unclear why XLogReadBufferForRedoExtended is not
>> > enough to replay a FPW of that.
>> >
>>
>> Where does FPW come into the picture for a replay of metapage or bitmappage?
>
> Since required FPW seems to be emitted and the flush should be
> done automatically, the issuer side (_hash_init) may attach wrong
> FPW images if hash_xlog_init_meta_page requires additional
> flushes. But I don't see such a flaw there. I think Michael wants
> to say such a kind of thing.
>

I am not able to understand what you want to say, but I think you
don't see any problem with the patch as such.

>
>
> By the way the comment of the function ReadBufferWithoutRelcache
> has the following sentense.
>

I don't think we need anything with respect to this patch because
ReadBufferWithoutRelcache is used in case of FPW replay of INIT_FORK
pages as well. Do you have something else in mind which I am not able
to see?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message AP 2017-07-06 11:34:55 Re: pgsql 10: hash indexes testing
Previous Message Etsuro Fujita 2017-07-06 10:50:33 Re: Add support for tuple routing to foreign partitions