Re: Failure in contrib test _int on loach

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Failure in contrib test _int on loach
Date: 2019-04-11 09:29:03
Message-ID: 89f1015d-a6c3-e341-235b-b60de2343fe6@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/04/2019 13:14, Heikki Linnakangas wrote:
> On 11/04/2019 09:10, Andrey Lepikhov wrote:
>> On 10/04/2019 20:25, Heikki Linnakangas wrote:
>>> On 09/04/2019 19:11, Anastasia Lubennikova wrote:
>>>> After introducing GistBuildNSN this code path became unreachable.
>>>> To fix it, I added new flag to detect such splits during indexbuild.
>>>
>>> Isn't it possible that the grandparent page is also split, so that we'd
>>> need to climb further up?
>>
>> Based on Anastasia's idea i prepare alternative solution to fix the bug
>> (see attachment).
>> It utilizes the idea of linear increment of LSN/NSN. WAL write process
>> is used for change NSN value to 1 for each block of index relation.
>> I hope this can be a fairly clear and safe solution.
>
> That's basically the same idea as always using the "fake LSN" during
> index build, like the original version of this patch did. It's got the
> problem that I mentioned at
> https://www.postgresql.org/message-id/090fb3cb-1ca4-e173-ecf7-47d41ebac620@iki.fi:
>
>
>> * Using "fake" unlogged LSNs for GiST index build seemed fishy. I
>> could not convince myself that it was safe in all corner cases. In a
>> recently initdb'd cluster, it's theoretically possible that the fake
>> LSN counter overtakes the real LSN value, and that could lead to
>> strange behavior. For example, how would the buffer manager behave, if
>> there was a dirty page in the buffer cache with an LSN value that's
>> greater than the current WAL flush pointer? I think you'd get "ERROR:
>> xlog flush request %X/%X is not satisfied --- flushed only to %X/%X".
>
> Perhaps the risk is theoretical; the real WAL begins at XLOG_SEG_SIZE,
> so with defaults WAL segment size, the index build would have to do
> about 16 million page splits. The index would have to be at least 150 GB
> for that. But it seems possible, and with non-default segment and page
> size settings more so.
As i see in bufmgr.c, XLogFlush() can't called during index build. In
the log_newpage_range() call we can use mask to set value of NSN (and
LSN) to 1.
>
> Perhaps we could start at 1, but instead of using a global counter,
> whenever a page is split, we take the parent's LSN value and increment
> it by one. So different branches of the tree could use the same values,
> which would reduce the consumption of the counter values.
>
> Yet another idea would be to start the counter at 1, but check that it
> doesn't overtake the WAL insert pointer. If it's about to overtake it,
> just generate some dummy WAL.
>
> But it seems best to deal with this in gistdoinsert(). I think
> Anastasia's approach of adding a flag to GISTInsertStack can be made to
> work, if we set the flag somewhere in gistinserttuples() or
> gistplacetopage(), whenever a page is split. That way, if it needs to
> split multiple levels, the flag is set on all of the corresponding
> GISTInsertStack entries.
>
> Yet another trivial fix would be just always start the tree descend from
> the root in gistdoinsert(), if a page is split. Not as efficient, but
> probably negligible in practice.
Agree

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ibrar Ahmed 2019-04-11 09:36:22 Commit message / hash in commitfest page.
Previous Message Chris Travers 2019-04-11 09:25:29 Re: Berserk Autovacuum (let's save next Mandrill)