Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Michael Paquier <michael(at)paquier(dot)xyz>, jianingy(dot)yang(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table
Date: 2019-03-07 02:17:11
Message-ID: ea06533c-8514-1ec7-112a-2581f03bc070@lab.ntt.co.jp
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2019/03/07 9:03, Michael Paquier wrote:
> On Wed, Mar 06, 2019 at 03:06:53PM +0000, PG Bug reporting form wrote:
>> 1. create a partition table with the following constraints
>> a. with a unique key on partition key and a varchar type field
>> b. using hash partition
>> 2. alter the length of the varchar type field
>> 3. drop the partition table in a transaction
>> 4. crash
>
> I can reproduce the failure easily, not on HEAD but with
> REL_11_STABLE:

Same here. I could reproduce it with 11.0.

> (gdb) bt
> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1 0x00007f585729b535 in __GI_abort () at abort.c:79
> #2 0x000055eef597e60a in errfinish (dummy=0) at elog.c:555
> #3 0x000055eef5980c50 in elog_finish (elevel=22, fmt=0x55eef5a41408 "cannot abort transaction %u, it was already committed") at elog.c:1376
> #4 0x000055eef5479647 in RecordTransactionAbort (isSubXact=false) at xact.c:1580
> #5 0x000055eef547a6c0 in AbortTransaction () at xact.c:2602
> #6 0x000055eef547aef4 in AbortCurrentTransaction () at xact.c:3104
>
> That's worth an investigation, SMgrRelationHash is getting messed up
> which causes the transaction commit to fail where it should not.

Looking at what was causing the SMgrRelationHash corruption, it seems that
the pendingDeletes list contained several entries with the same relfilenode value.

The problem starts when ALTER TABLE users ALTER COLUMN ... TYPE is executed:

create table users(user_id int, name varchar(64), unique (user_id, name))
partition by list(user_id);

create table users_000 partition of users for values in(0);
create table users_001 partition of users for values in(1);
select relname, relfilenode from pg_class where relname like 'users%';
          relname           │ relfilenode
────────────────────────────┼─────────────
 users                      │       16441
 users_000                  │       16446
 users_000_user_id_name_key │       16449
 users_001                  │       16451
 users_001_user_id_name_key │       16454
 users_user_id_name_key     │       16444
(6 rows)

alter table users alter column name type varchar(127);
select relname, relfilenode from pg_class where relname like 'users%';
          relname           │ relfilenode
────────────────────────────┼─────────────
 users                      │       16441
 users_000                  │       16446
 users_000_user_id_name_key │       16444  <=== duplicated
 users_001                  │       16451
 users_001_user_id_name_key │       16444  <=== duplicated
 users_user_id_name_key     │       16444  <=== duplicated
(6 rows)
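For reference, all steps of the report, including dropping the table inside a transaction, can be collected into one script. This is a sketch, not the reporter's actual DDL: it uses hash partitioning as in step 1b (the table and partition names are reused from the example above), and it assumes that "the partition table" in step 3 means the partitioned parent users. On an affected 11.x build the crash is expected while the final COMMIT is being processed, going by the backtrace above; on a build without the bug (HEAD did not reproduce it) the DROP simply commits.

```sql
-- Hypothetical reproduction script following the steps of BUG #15672.
-- Step 1: hash-partitioned table with a unique key on the partition
-- key plus a varchar column.
create table users (user_id int, name varchar(64), unique (user_id, name))
  partition by hash (user_id);
create table users_000 partition of users
  for values with (modulus 2, remainder 0);
create table users_001 partition of users
  for values with (modulus 2, remainder 1);

-- Step 2: widen the varchar column.  On affected builds the partition
-- indexes end up sharing the parent index's relfilenode after this.
alter table users alter column name type varchar(127);

-- Step 3: drop the partitioned table inside a transaction.
begin;
drop table users;
-- Step 4: on affected 11.x builds the crash is expected while this
-- commit is processed; elsewhere the table is simply gone afterwards.
commit;
```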

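So after the ALTER, both partition indexes share relfilenode 16444 with the
parent index users_user_id_name_key, which would explain why dropping the
table schedules deletion of the same relfilenode more than once. I ran out
of time to dig further...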
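A quick way to check a database for this state is to look for relfilenode values claimed by more than one pg_class row. The query is a sketch under a couple of assumptions: relfilenode 0 is excluded because mapped catalogs (and, on newer releases, relations without storage) use it, and rows are grouped by reltablespace because a relfilenode only has to be unique within a tablespace of a given database. On an affected build, in the state shown above, it would be expected to report 16444 together with the three users*_user_id_name_key indexes.

```sql
-- List relfilenode values shared by more than one relation in the
-- current database; a healthy catalog should return no rows.
select reltablespace,
       relfilenode,
       array_agg(relname order by relname) as relations
from pg_class
where relfilenode <> 0          -- 0: mapped catalogs / no storage
group by reltablespace, relfilenode
having count(*) > 1;
```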

Thanks,
Amit
