Re: Failure in contrib test _int on loach

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, a(dot)lepikhov(at)postgrespro(dot)ru, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure in contrib test _int on loach
Date: 2019-04-05 16:41:19
Message-ID: 00873b28-8d7e-72ef-bb8f-0a7f5dfc64b4@postgrespro.ru
Lists: pgsql-hackers


05.04.2019 18:01, Tom Lane writes:
> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
>> On Fri, Apr 5, 2019 at 2:02 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>>> This is a strange failure:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2019-04-05%2005%3A15%3A00
>>> [ wrong answers from queries using a GIST index ]
>> There are a couple of other recent instances of this failure, on
>> francolin and whelk.
> Yeah. Given three failures in a couple of days, we can reasonably
> guess that the problem was introduced within a day or two prior to
> the first one. Looking at what's touched GIST in that time frame,
> suspicion has to fall heavily on 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b.
>
> If I had to bet, I'd bet that there's something wrong with the
> machinations described in the commit message:
>
> For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
> be marked with a valid (i.e. non-zero) LSN, so that the parent-child
> LSN-NSN interlock works correctly. We now use magic value 1 for that during
> index build. Change the fake LSN counter to begin from 1000, so that 1 is
> safely smaller than any real or fake LSN. 2 would've been enough for our
> purposes, but let's reserve a bigger range, in case we need more special
> values in the future.
>
> I'll go add this as an open issue.
>
> regards, tom lane
>

Hi,
I had already noticed the same failure on our company buildfarm and
started investigating.

You are right, the "Generate less WAL during GiST, GIN and SP-GiST
index build." patch is to blame.
Because it uses GistBuildLSN, some pages are not linked correctly,
so an index scan misses some entries that a seqscan finds.
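
To illustrate how a constant build-time LSN interacts with the LSN-NSN
interlock described in the quoted commit message, here is a minimal
sketch. It assumes a much simplified model: the type Lsn, the macro
names and the function split_detected() are hypothetical and are not
the PostgreSQL source, and the real GiST code checks more conditions
than this. It only shows the comparison itself and is not a diagnosis
of the bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a 64-bit WAL position (hypothetical type name). */
typedef uint64_t Lsn;

/* Values taken from the quoted commit message; the macro names are made up. */
#define BUILD_LSN      ((Lsn) 1)     /* magic LSN used during index build */
#define FIRST_FAKE_LSN ((Lsn) 1000)  /* fake LSN counter now starts here */

/*
 * Hypothetical model of the parent-child interlock: the child is treated
 * as "split after the parent was read" when the LSN remembered from the
 * parent is older than the child's NSN.
 */
bool
split_detected(Lsn parent_lsn, Lsn child_nsn)
{
    return parent_lsn < child_nsn;
}

/* Self-check of the three cases of interest. */
void
interlock_examples(void)
{
    /* Both values stamped during build: equal, so nothing is detected. */
    assert(!split_detected(BUILD_LSN, BUILD_LSN));

    /* Parent read during build, child split later with a fake LSN. */
    assert(split_detected(BUILD_LSN, FIRST_FAKE_LSN));

    /* Child untouched since build, parent seen later: no split reported. */
    assert(!split_detected(FIRST_FAKE_LSN, BUILD_LSN));
}
```

In this model, when every page carries the same build-time value, the
comparison can never report a split that happened during the build.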

Attached is a patch with a test that reproduces the bug
deterministically, on every run.
I'm now trying to find a way to fix the issue.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
gist_optimized_wal_intarray_test.patch text/x-patch 3.7 KB
