Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-07 18:00:00
Message-ID: 08c2445c-c3c9-ba45-18d3-6399707d8306@gmail.com
Lists: pgsql-bugs

31.10.2021 22:20, Dmitry Dolgov wrote:
>>
>> I suspect this is the same bug as #17245. Could you check if it's fixed by
>> https://www.postgresql.org/message-id/CAH2-WzkN5aESSLfK7-yrYgsXxYUi__VzG4XpZFwXm98LUtoWuQ%40mail.gmail.com
>>
>> The crash is somewhere in pg_class, which is also manually VACUUMed by the
>> test, which could trigger the issue we found in the other thread. The likely
>> reason the loop in the repro is needed is that that'll push one of the indexes
>> on pg_class over the 512kb/min_parallel_index_scan_size boundary to start
>> using parallel vacuum.
> I've applied both patches from Peter, the fix itself and
> index-points-to-LP_UNUSED-item assertions. Now it doesn't crash on
> pg_unreachable, but hits those extra assertions in the second patch:
Yes, the committed fix for bug #17245 doesn't help here.
I've also noticed that a server crash is not the only possible
outcome. You can also get unexpected errors like:
ERROR:  relation "errtst_parent" already exists
ERROR:  relation "tmp_idx1" already exists
ERROR:  relation "errtst_child_plaindef" already exists
or
ERROR:  could not open relation with OID 1033921
STATEMENT:  DROP TABLE errtst_parent;
in server.log (with no crash).
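
For anyone checking whether their own run hit the non-crash variant of the problem, here is a minimal Python sketch that scans log lines for the two kinds of unexpected errors quoted above. It is not part of the reproducing script; the function name scan_log and the category labels are made up for illustration, and the regular expressions are taken only from the messages shown here.

```python
import re

# Patterns derived from the error messages quoted in this message.
# The category names ("already_exists", "missing_oid") are hypothetical labels.
PATTERNS = {
    "already_exists": re.compile(r'ERROR:\s+relation "([^"]+)" already exists'),
    "missing_oid": re.compile(r"ERROR:\s+could not open relation with OID (\d+)"),
}


def scan_log(lines):
    """Return a dict mapping each category to the relation names or OIDs found."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits[name].append(match.group(1))
    return hits
```

It accepts any iterable of lines, so an open log file can be passed directly, e.g. scan_log(open(path)) where path points at the server log of the test instance (the location depends on the setup).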
These strange errors and the crash inside index_delete_sort_cmp() can be
seen starting from commit dc7420c2.
On the previous commit (b8443eae) the reproducing script completes
without a crash or errors (triple-checked).
Bug #17257 probably has the same root cause, but the patch [1]
applied to REL_14_STABLE (b0f6bd48) doesn't prevent the crash.
Initially I thought that the infinite loop in vacuum was a problem
in its own right, so I decided to report it separately, but the two
bugs may be too closely related to be discussed apart.

Best regards,
Alexander

[1]
https://www.postgresql.org/message-id/CAEze2Wj7O5tnM_U151Baxr5ObTJafwH%3D71_JEmgJV%2B6eBgjL7g%40mail.gmail.com
