Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, jianingy(dot)yang(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table
Date: 2019-03-07 11:36:02
Message-ID: CA+HiwqHtGpjSB8KUQdn_Hv3_DfjPqVrnMd5juVQRTDQRak2-Vg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Mar 7, 2019 at 11:17 AM Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> The problem starts when ALTER TABLE users ALTER COLUMN is executed.
> create table users(user_id int, name varchar(64), unique (user_id, name))
> partition by list(user_id);
>
> create table users_000 partition of users for values in(0);
> create table users_001 partition of users for values in(1);
> select relname, relfilenode from pg_class where relname like 'users%';
>  relname                    │ relfilenode
> ────────────────────────────┼─────────────
>  users                      │       16441
>  users_000                  │       16446
>  users_000_user_id_name_key │       16449
>  users_001                  │       16451
>  users_001_user_id_name_key │       16454
>  users_user_id_name_key     │       16444
> (6 rows)
>
> alter table users alter column name type varchar(127);
> select relname, relfilenode from pg_class where relname like 'users%';
>  relname                    │ relfilenode
> ────────────────────────────┼─────────────
>  users                      │       16441
>  users_000                  │       16446
>  users_000_user_id_name_key │       16444 <=== duplicated
>  users_001                  │       16451
>  users_001_user_id_name_key │       16444 <=== duplicated
>  users_user_id_name_key     │       16444 <=== duplicated
> (6 rows)

I checked why the indexes of users_000 and users_001 end up reusing
users_user_id_name_key's relfilenode. On the surface, it's because
DefineIndex(<parent's-index-to-be-recreated>) carries oldNode =
<old-parent-index's-relfilenode> in its IndexStmt, which is passed
down recursively to DefineIndex(<child-indexes-to-be-recreated>). This
DefineIndex() chain runs because of ATPostAlterTypeCleanup() on the
parent rel. The surface problem could be solved in DefineIndex() by
simply resetting oldNode in each child IndexStmt before recursing, but
that means the child indexes are recreated with new relfilenodes. That
does solve the immediate problem of relfilenodes being wrongly
duplicated, which is what leads to madness such as the
SMgrRelationHash corruption seen in the original bug report.
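To make the oldNode propagation concrete, here is a toy Python model. It is not PostgreSQL code: IndexStmt, define_index and the relfilenode counter below are hypothetical simplifications. It mimics a recursive DefineIndex() that derives each child statement from the parent's, oldNode included. With reset_child_old_node=False every index lands on the parent's old relfilenode, as in the quoted pg_class output; with True, the parent keeps its old node and the children get fresh ones, which is the surface fix described above.

```python
from dataclasses import dataclass, field, replace
from itertools import count
from typing import Dict, List, Optional

# Hypothetical allocator for fresh relfilenodes (the values are arbitrary).
_fresh_relfilenodes = count(20000)


@dataclass
class IndexStmt:
    """Toy stand-in for IndexStmt: only the fields this sketch needs."""
    idxname: str
    old_node: Optional[int] = None  # relfilenode to reuse, if any
    child_idxnames: List[str] = field(default_factory=list)


def define_index(stmt: IndexStmt, catalog: Dict[str, int],
                 reset_child_old_node: bool) -> None:
    """Toy DefineIndex(): record the index's relfilenode, then recurse into partitions."""
    if stmt.old_node is not None:
        catalog[stmt.idxname] = stmt.old_node  # reuse the old storage
    else:
        catalog[stmt.idxname] = next(_fresh_relfilenodes)  # allocate new storage
    for child_name in stmt.child_idxnames:
        # The child statement is derived from the parent's, so old_node is copied too.
        child = replace(stmt, idxname=child_name, child_idxnames=[])
        if reset_child_old_node:
            child.old_node = None  # surface fix: children get new relfilenodes
        define_index(child, catalog, reset_child_old_node)


if __name__ == "__main__":
    parent = IndexStmt("users_user_id_name_key", old_node=16444,
                       child_idxnames=["users_000_user_id_name_key",
                                       "users_001_user_id_name_key"])
    for reset in (False, True):
        catalog: Dict[str, int] = {}
        define_index(parent, catalog, reset_child_old_node=reset)
        print(f"reset_child_old_node={reset}: {catalog}")
```

The model also shows why the reset is only a symptom fix: it avoids the duplicates, but the children's existing storage is thrown away instead of being reused.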

But the root problem seems to be that ATPostAlterTypeCleanup() on
child tables doesn't set up their own
DefineIndex(<child-index-to-be-rewritten>) step. That's because the
parent's ATPostAlterTypeCleanup() has already dropped the child copies
of the UNIQUE constraint through dependencies (followed by a
CommandCounterIncrement()). So ATExecAlterColumnType() on the child
relations can't find the constraint on each individual child to turn
into that child's own DefineIndex(<child-index-to-be-rewritten>) step.
If we manage to handle each relation's ATPostAlterTypeCleanup()
independently, the children's recreated indexes will be able to reuse
their old relfilenodes and everything will be fine. But maybe that
will require a significant overhaul of how post-alter-type cleanup is
done?
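The ordering problem behind the root cause can be sketched with another toy model. Again this is hypothetical Python, not the actual ALTER TABLE code; plan_index_rebuilds and its arguments are made up for illustration. Relations are visited parent first, and each visit queues a rebuild step only if that relation's own constraint is still present. When the parent's cleanup runs before the children are visited, the cascade has already removed the children's constraints, so only the parent gets a step. If each relation's step is collected independently, every index keeps its own old relfilenode. The relfilenodes used are the pre-ALTER values from the quoted pg_class output.

```python
from typing import Dict, List, Tuple


def plan_index_rebuilds(rels: List[str], constraints: Dict[str, int],
                        cleanup_before_children: bool) -> List[Tuple[str, int]]:
    """Return (relation, old index relfilenode) rebuild steps; rels[0] is the parent."""
    remaining = dict(constraints)  # copy so the caller's mapping is untouched
    steps: List[Tuple[str, int]] = []
    for i, rel in enumerate(rels):
        if rel in remaining:
            # This relation's own rebuild step, reusing its own old relfilenode.
            steps.append((rel, remaining[rel]))
        if i == 0 and cleanup_before_children:
            # Parent's cleanup drops its constraint; the drop cascades
            # to the children's copies before the children are visited.
            remaining.clear()
    return steps


if __name__ == "__main__":
    rels = ["users", "users_000", "users_001"]
    old = {"users": 16444, "users_000": 16449, "users_001": 16454}
    print(plan_index_rebuilds(rels, old, cleanup_before_children=True))
    print(plan_index_rebuilds(rels, old, cleanup_before_children=False))
```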

Thanks,
Amit
