Re: [BUG] Error in BRIN summarization

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [BUG] Error in BRIN summarization
Date: 2020-07-27 15:21:06
Message-ID: 4e07c304-76c8-2645-d4a4-e5706072b7d7@postgrespro.ru
Lists: pgsql-hackers

On 23.07.2020 20:39, Anastasia Lubennikova wrote:
> One of our clients caught the error 'failed to find parent tuple for
> heap-only tuple at (50661,130) in table "tbl"' in PostgreSQL v12.
>
> Steps to reproduce (REL_12_STABLE):
>
> 1) Create table with primary key, create brin index, fill table with
> some initial data:
>
> create table tbl (id int primary key, a int) with (fillfactor=50);
> create index idx on tbl using brin (a) with (autosummarize=on);
> insert into tbl select i, i from generate_series(0,100000) as i;
>
> 2) Run script test_brin.sql using pgbench:
>
>  pgbench postgres -f ../review/brin_test.sql  -n -T 120
>
> The script is a bit messy because I was trying to reproduce a
> problematic workload, and I didn't manage to simplify it.
> The idea is that it inserts new values into the table to produce
> unindexed pages and also updates some values to trigger HOT updates on
> these pages.
>
> 3) Open psql session and run brin_summarize_new_values
>
> select brin_summarize_new_values('idx'::regclass::oid); \watch 2
>
> After a short while, psql will report the ERROR.
>
> This error is caused by a problem with the bounds of the root_offsets
> array. It occurs if a new HOT tuple was inserted after we've collected
> root_offsets, and thus we don't have a root_offset for the tuple's
> offnum. Concurrent insertions are possible, because
> brin_summarize_new_values() only holds ShareUpdateExclusiveLock on the
> table and no lock (only a pin) on the page.
>
> The draft fix is in the attachments. It saves root_offsets_size and
> checks that we only access valid entries.
> The patch also adds some debug messages, just to ensure that the
> problem was caught.
>
> TODO:
>
> - check if heapam_index_validate_scan() has the same problem
> - code cleanup
> - test other PostgreSQL versions
>
> [1]
> https://www.postgresql.org/message-id/flat/CA%2BTgmoYgwjmmjK24Qxb_vWAu8_Hh7gfVFcr3%2BR7ocdLvYOWJXg%40mail.gmail.com
>

Here is the updated version of the fix.
The problem can be reproduced on all supported versions, so I suggest
backpatching it.
The code changed slightly in v12, so there are two patches: one for
versions 9.5 to 11 and another for versions 12 to master.
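
For readers who have not opened the patches, the idea of the bounds check
can be sketched roughly as below. This is only an illustration, not the
attached code: the RootOffsetsSnapshot struct, the lookup_root_offset()
helper and the MAX_TUPLES_PER_PAGE constant are hypothetical stand-ins,
and returning InvalidOffsetNumber for a tuple that appeared after the
snapshot was taken is an assumption about how a caller might be told to
handle that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's definitions, so the sketch compiles alone. */
typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define MAX_TUPLES_PER_PAGE 291		/* illustrative upper bound only */

/*
 * Hypothetical snapshot of the root offsets, collected at some earlier
 * point. root_offsets_size records how many entries were valid then.
 */
typedef struct RootOffsetsSnapshot
{
	OffsetNumber root_offsets[MAX_TUPLES_PER_PAGE];
	size_t		root_offsets_size;
} RootOffsetsSnapshot;

/*
 * Return the root offset for offnum (1-based), or InvalidOffsetNumber if
 * offnum lies beyond what the snapshot covers, e.g. because the tuple was
 * inserted concurrently after the snapshot was collected.
 */
static OffsetNumber
lookup_root_offset(const RootOffsetsSnapshot *snap, OffsetNumber offnum)
{
	assert(snap != NULL);

	if (offnum == InvalidOffsetNumber || offnum > snap->root_offsets_size)
		return InvalidOffsetNumber;

	return snap->root_offsets[offnum - 1];
}
```

With this shape, the caller can tell a stale snapshot from a genuinely
missing parent tuple and decide what to do instead of reading past the
valid part of the array.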

As for heapam_index_validate_scan(), I've tried to reproduce the same
error with CREATE INDEX CONCURRENTLY, but haven't found any problem with it.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment                          Content-Type  Size
brin_summarize_fix_REL_12_v1.patch  text/x-patch  4.6 KB
brin_summarize_fix_REL9_5_v1.patch  text/x-patch  4.5 KB
