Re: hash index on unlogged tables doesn't behave as expected

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hash index on unlogged tables doesn't behave as expected
Date: 2017-07-10 09:28:13
Message-ID: CAA4eK1JYyO5Hcxx4rP0jb=JmMC4qvY1YvG9UvkwNr5qrojsOPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 10, 2017 at 10:39 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1+-DUto+MyeNdLE9P9u8G3Fv6n+SOjPSqmPSw6ashhPjw(at)mail(dot)gmail(dot)com>
>> On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> > On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> >> I think we should do that as a separate patch (I can write the same as
>> >> well) because that is not new behavior introduced by this patch, ...
>> >
>> > True, although maybe hash indexes are the only way that happens today?
>> >
>>
>> No, I think it will happen for all other indexes as well. Basically,
>> we log new page for init forks of other indexes and then while
>> restoring that full page image, it will fall through the same path.
>
> (Sorry, I didn't noticed that hash metapage is not using log_newpgae)
>

The bitmappage is also not using log_newpage.

> For example, (bt|gin|gist|spgist|brin)buildempty are using
> log_newpage to log INIT_FORK metapages. I believe that the xlog
> flow from log_newpage to XLogReadBufferForRedoExtended is
> developed so that pages in init forks are surely flushed during
> recovery, so we should fix it instead adding other flushes if the
> path doesn't work. Am I looking wrong place? (I think it is
> working.)
>

You are looking at right place.

> If I'm understanding correctly here, hashbild and hashbuildempty
> should be refactored using the same structure with other *build
> and *buildempty functions. Specifically metapages initialization
> subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage
> or...) does only on-memory initialization and does not emit WAL,
> then *build and *buildempty emits WAL in their required way.
>

I have considered this way of doing as well, read my first e-mail [1]
in this thread "Another approach to fix the issue ....". It is not
that we can't change the code of hashbuildempty to make it log new
pages for all kind of pages (metapage, bitmappage and datapages), but
I am not sure if there is a value in going in that direction. If we
have to go in that direction, we need to hard code some of the values
like hashm_nmaps and hashm_mapp in metapage rather than initializing
them after bitmappage creation. Before going in that direction, I
think we should also take opinion from other people especially some
committer as we might need to maintain two different ways of doing
almost the same thing.

I am also not 100% comfortable with adding flush at two new places,
but that makes the code for fix simpler and fundamentally there is no
problem in doing so.

There is a small downside to always logging new page which is that it
will always log full page image in WAL which has the potential to
increase WAL volume if there are many unlogged tables and indexes.
Now considering the init fork generally has very less pages, it is not
a big concern, but still avoiding full page image has some value.

>
>> >>> By the way the comment of the function ReadBufferWithoutRelcache
>> >>> has the following sentense.
>> >>>
>> >>> | * NB: At present, this function may only be used on permanent relations, which
>> >>> | * is OK, because we only use it during XLOG replay. If in the future we
>> >>> | * want to use it on temporary or unlogged relations, we could pass additional
>> >>> | * parameters.
>> >>>
>> >>> and does
>> >>>
>> >>> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
>> >>> mode, strategy, &hit);
>> >>>
>> >>> This surely works since BufferAlloc recognizes INIT_FORK. But
>> >>> some adjustment may be needed around here.
>> >>
>> >> Yeah, it should probably mention that the init fork of an unlogged
>> >> relation is also OK.
>> >>
>> >
>> > I think we should do that as a separate patch (I can write the same as
>> > well) because that is not new behavior introduced by this patch, but
>> > let me know if you think that we should add such a comment in this
>> > patch itself.
>> >
>>
>> Attached a separate patch to adjust the comment atop ReadBufferWithoutRelcache.
>
> + * NB: At present, this function may only be used on permanent relations and
> + * init fork of an unlogged relation, which is OK, because we only use it
> + * during XLOG replay. If in the future we want to use it on temporary or
> + * unlogged relations, we could pass additional parameters.
>
> *I* think the target of the funcion is permanent relations and
> init forks, not unlogged relations. And I'd like to have an
> additional comment about RELPERSISTENCE_PERMANENT. Something like
> the attached.
>

Okay, let's leave this for committer to decide.

[1] - https://www.postgresql.org/message-id/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-07-10 09:29:03 Re: paths in partitions of a dummy partitioned table
Previous Message Amit Kapila 2017-07-10 07:54:20 Re: [WIP] Zipfian distribution in pgbench