Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

From: Pawel Kudzia <kudzia(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date: 2021-07-12 17:24:29
Message-ID: CAJYBUS8S-k7r-VQ3Vv7Asz9KiW1u50VT5k7fq5BSgVXJ_W4KYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sun, Jul 11, 2021 at 8:42 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 05/07/2021 16:36, Pawel Kudzia wrote:
> > i have a consistent file-level backup of postgresql's /var/lib/postgresql +
> > /etc/postgresql on which i can reproduce the issue reliably. it's on a test
> > machine where we can put patched version of PG. currently this machine
> > is using Debian 13.3-1.pgdg100+1.
> >
> > set enable_seqscan=off;
> > SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1737}' )
> > AND NOT ( (attribute_name_ids||0) && '{1737}') LIMIT 10;
> >
> > returns me 2 rows while it should return none.
> >
> > also in this case lowering work_mem from 1MB to 256kB makes fixes the
> > issue - SELECT returns 0 rows instead of 2.
> >
> > i'll be happy to run patched version and send you back logs produced by it.
>
> Thanks! Here's a patch that prints lines with "GINBUG:" prefix in the
> log. It should apply cleanly to PostgreSQL v13.
>
> This should help to confirm whether the bugs in shimTriConsistentFn that
> Tom pointed out are the root cause of this issue. It should also tell us
> whether the rows are being returned from the pending-inserts list or the
> regular part of the GIN index.
>
> We might need a few iterations to test different theories, but we'll get
> there...
>

Heikki - thank you for the patch! it applied cleanly.

i've run these two commands:

SET enable_seqscan=off;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1737}' )
AND NOT ( (attribute_name_ids||0) && '{1737}') LIMIT 10;

and got this in the output:

2021-07-12 07:41:05 UTC LOG: GINBUG: startScanKey called: excludeOnly
0 nentries 1

2021-07-12 07:41:05 UTC STATEMENT: SELECT entity_id FROM entity WHERE
( attribute_name_ids && '{1737}' ) AND NOT ( (attribute_name_ids||0)
&& '{1737}') LIMIT 10;
2021-07-12 07:41:05 UTC LOG: GINBUG: called triConsistentFn(1): 1 0

the "STATEMENT" line is repeated 79586 times, the "GINBUG: called
triConsistentFn" - 79585 times.

there's nothing else in the log besides information about server startup.

i'll be happy to help with the follow up checks.

> Ps. Sorry for the delay, I didn't see you reply until now, it went into
> spam folder

no worries!

--
regards,
Pawel Kudzia

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2021-07-12 18:16:45 BUG #17102: Running "create or replace language plperl" gives error
Previous Message Tom Lane 2021-07-12 17:20:56 printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)