Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

From: Pawel Kudzia <kudzia(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date: 2021-07-22 09:07:16
Message-ID: CAJYBUS-K1-NKg7vZ-05vu=9-OjVmS2v360Z+XtHiYRbDDe05jg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jul 22, 2021 at 9:25 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 20/07/2021 18:58, Pawel Kudzia wrote:
> > On Tue, Jul 20, 2021 at 5:34 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >> Pawel: Can you test once again with the attached amcheck version?
> >
> > patch v4-0001-Amcheck-for-GIN-13stable.patch applied to
> > postgresql-13_13.3.orig.tar.bz2
> > attached - log and result of the check.
> >
> > executing "select
> > gin_index_parent_check('entity_attribute_name_ids_gin');" took
> > considerably longer this time and it ended with :
> >
> > [524767.920498] postgres[29716]: segfault at fffffffffffffffe ip
> > 00007f214c20a2d8 sp 00007ffe2a5fd040 error 5 in
> > amcheck.so[7f214c209000+6000]
> > [524767.920521] Code: 00 00 41 0f b7 4e 04 41 8b 56 0c 41 8b 76 10 49
> > 89 c1 48 8d 04 40 c1 e1 10 45 0f b7 46 08 48 8d 7c 43 fa 41 0f b7 46
> > 06 09 c1 <0f> b7 47 04 50 0f b7 07 0f b7 7f 02 c1 e0 10 09 f8 50 0f b7
> > 43 04
>
> Hmph, another set of bugs in the amcheck patch. It didn't handle empty
> entry tree items, nor empty posting tree pages. And there was another
> bug, which caused it to scan entry tree pages multiple times; that's why
> the same WARNING was printed multiple times.
>
> Fixed those bugs, new patch version attached. Pawel, can you test this
> again, please? At this point, I'm pretty sure this isn't going to reveal
> any more information about the original problem, but at least we're
> ironing out bugs from the 'amcheck' patch..

thank you for the next patch Heikki!
no crash this time! i'm sending a message in a separate mail since i'm
not sure if it'll pass through attachment size limit that's applied
for the list.

> I'm grasping at straws here, but here's one more thing we could try: the
> query returned these incorrect tuples:
>
> ctid | entity_id
> --------------+-----------
> (4002784,1) | 38048120
> (4002869,14) | 95333744
> (2 rows)
>
> We know those entries are in the GIN index with key '1373', when they
> shouldn't be. But I wonder if the correct keys for those tuples are
> present? Pawel, can you try this, please:
>
> -- persuade planner to use a bitmap index scan
> set enable_seqscan=off;
> set enable_tidscan=off;
> -- avoid lossy bitmaps
> set work_mem='1GB';
>
> -- Explain analyze first to check it's using a bitmap index scan and
> -- no lossy pages.
> explain analyze
> SELECT ctid, entity_id FROM entity WHERE attribute_name_ids &&
> '{38048120}' and ctid='(4002784,1)';
> SELECT ctid, entity_id FROM entity WHERE attribute_name_ids &&
> '{38048120}' and ctid='(4002784,1)';
>
> explain analyze
> SELECT ctid, entity_id FROM entity WHERE attribute_name_ids &&
> '{95333744}' and ctid='(4002869,14)';
> SELECT ctid, entity_id FROM entity WHERE attribute_name_ids &&
> '{95333744}' and ctid='(4002869,14)';

data=# explain analyze SELECT ctid, entity_id FROM entity WHERE
attribute_name_ids && '{38048120}' and ctid='(4002784,1)';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on entity (cost=72.70..3366.69 rows=1 width=14)
(actual time=42.995..42.998 rows=0 loops=1)
Recheck Cond: (attribute_name_ids && '{38048120}'::integer[])
Filter: (ctid = '(4002784,1)'::tid)
-> Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..72.70 rows=33055 width=0) (actual time=42.989..42.990
rows=0 loops=1)
Index Cond: (attribute_name_ids && '{38048120}'::integer[])
Planning Time: 18.912 ms
Execution Time: 43.515 ms
(7 rows)

data=# SELECT ctid, entity_id FROM entity WHERE attribute_name_ids &&
'{38048120}' and ctid='(4002784,1)';
ctid | entity_id
------+-----------
(0 rows)

data=# explain analyze SELECT ctid, entity_id FROM entity WHERE
attribute_name_ids && '{95333744}' and ctid='(4002869,14)';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on entity (cost=72.70..3366.69 rows=1 width=14)
(actual time=2.552..2.554 rows=0 loops=1)
Recheck Cond: (attribute_name_ids && '{95333744}'::integer[])
Filter: (ctid = '(4002869,14)'::tid)
-> Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..72.70 rows=33055 width=0) (actual time=2.543..2.544 rows=0
loops=1)
Index Cond: (attribute_name_ids && '{95333744}'::integer[])
Planning Time: 0.193 ms
Execution Time: 2.594 ms
(7 rows)

data=# SELECT ctid, entity_id FROM entity WHERE attribute_name_ids &&
'{95333744}' and ctid='(4002869,14)';
ctid | entity_id
------+-----------
(0 rows)

>
> - Heikki

--
regards,
Pawel Kudzia

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pawel Kudzia 2021-07-22 09:08:28 Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Previous Message Artur Zakirov 2021-07-22 09:04:07 Re: BUG #15293: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events