Re: Failure in contrib test _int on loach

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Failure in contrib test _int on loach
Date: 2019-04-11 08:14:11
Message-ID: fb0bd75c-d925-a9f5-75cc-6b387bb31ebc@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/04/2019 09:10, Andrey Lepikhov wrote:
> On 10/04/2019 20:25, Heikki Linnakangas wrote:
>> On 09/04/2019 19:11, Anastasia Lubennikova wrote:
>>> After introducing GistBuildNSN this code path became unreachable.
>>> To fix it, I added new flag to detect such splits during indexbuild.
>>
>> Isn't it possible that the grandparent page is also split, so that we'd
>> need to climb further up?
>
> Based on Anastasia's idea i prepare alternative solution to fix the bug
> (see attachment).
> It utilizes the idea of linear increment of LSN/NSN. WAL write process
> is used for change NSN value to 1 for each block of index relation.
> I hope this can be a fairly clear and safe solution.

That's basically the same idea as always using the "fake LSN" during
index build, like the original version of this patch did. It's got the
problem that I mentioned at
https://www.postgresql.org/message-id/090fb3cb-1ca4-e173-ecf7-47d41ebac620@iki.fi:

> * Using "fake" unlogged LSNs for GiST index build seemed fishy. I could
> not convince myself that it was safe in all corner cases. In a recently
> initdb'd cluster, it's theoretically possible that the fake LSN counter
> overtakes the real LSN value, and that could lead to strange behavior.
> For example, how would the buffer manager behave, if there was a dirty
> page in the buffer cache with an LSN value that's greater than the
> current WAL flush pointer? I think you'd get "ERROR: xlog flush request
> %X/%X is not satisfied --- flushed only to %X/%X".

Perhaps the risk is theoretical; the real WAL begins at XLOG_SEG_SIZE,
so with defaults WAL segment size, the index build would have to do
about 16 million page splits. The index would have to be at least 150 GB
for that. But it seems possible, and with non-default segment and page
size settings more so.

Perhaps we could start at 1, but instead of using a global counter,
whenever a page is split, we take the parent's LSN value and increment
it by one. So different branches of the tree could use the same values,
which would reduce the consumption of the counter values.

Yet another idea would be to start the counter at 1, but check that it
doesn't overtake the WAL insert pointer. If it's about to overtake it,
just generate some dummy WAL.

But it seems best to deal with this in gistdoinsert(). I think
Anastasia's approach of adding a flag to GISTInsertStack can be made to
work, if we set the flag somewhere in gistinserttuples() or
gistplacetopage(), whenever a page is split. That way, if it needs to
split multiple levels, the flag is set on all of the corresponding
GISTInsertStack entries.

Yet another trivial fix would be just always start the tree descend from
the root in gistdoinsert(), if a page is split. Not as efficient, but
probably negligible in practice.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2019-04-11 08:33:13 Re: pg_rewind vs superuser
Previous Message tushar 2019-04-11 07:52:32 Re: Minimal logical decoding on standbys