From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, jianingy(dot)yang(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table |
Date: | 2019-03-27 02:40:12 |
Message-ID: | d51e28a8-a725-b7ef-a6cf-8186e775b852@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On 2019/03/08 19:22, Amit Langote wrote:
> On 2019/03/07 20:36, Amit Langote wrote:
>> On Thu, Mar 7, 2019 at 11:17 AM Amit Langote
>> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> The problem start when ALTER TABLE users ALTER COLUMN is executed.
>>> create table users(user_id int, name varchar(64), unique (user_id, name))
>>> partition by list(user_id);
>>>
>>> create table users_000 partition of users for values in(0);
>>> create table users_001 partition of users for values in(1);
>>> select relname, relfilenode from pg_class where relname like 'users%';
>>> relname │ relfilenode
>>> ────────────────────────────┼─────────────
>>> users │ 16441
>>> users_000 │ 16446
>>> users_000_user_id_name_key │ 16449
>>> users_001 │ 16451
>>> users_001_user_id_name_key │ 16454
>>> users_user_id_name_key │ 16444
>>> (6 rows)
>>>
>>> alter table users alter column name type varchar(127);
>>> select relname, relfilenode from pg_class where relname like 'users%';
>>> relname │ relfilenode
>>> ────────────────────────────┼─────────────
>>> users │ 16441
>>> users_000 │ 16446
>>> users_000_user_id_name_key │ 16444 <=== duplicated
>>> users_001 │ 16451
>>> users_001_user_id_name_key │ 16444 <=== duplicated
>>> users_user_id_name_key │ 16444 <=== duplicated
>>> (6 rows)
>>
>> I checked why users_000's and user_0001's indexes end up reusing
>> users_user_id_name_key's relfilenode. At the surface, it's because
>> DefineIndex(<parent's-index-to-be-recreated>) is carrying oldNode =
>> <old-parents-index's-relfilenode> in IndexStmt, which is recursively
>> passed down to DefineIndex(<child-indexes-to-be-recreated>). This
>> DefineIndex() chain is running due to ATPostAlterTypeCleanup() on the
>> parent rel. This surface problem may be solved in DefineIndex() by
>> just resetting oldNode in each child IndexStmt before recursing, but
>> that means child indexes are recreated with new relfilenodes. That
>> solves the immediate problem of relfilenodes being wrongly duplicated,
>> that's leading to madness such as SMgrRelationHash corruption being
>> seen in the original bug report.
>
> This doesn't happen in HEAD, because in HEAD we got 807ae415c5, which
> changed things so that partitioned relations always have their relfilenode
> set to 0. So, there's no question of parent's relfilenode being passed to
> children and hence being duplicated.
>
> But that also means child indexes are unnecessarily rewritten, that is,
> with new relfilenodes.
>
>> But, the root problem seems to be that ATPostAlterTypeCleanup() on
>> child tables isn't setting up their own
>> DefineIndex(<child-index-to-be-rewritten>) step. That's because the
>> parent's ATPostAlterTypeCleanup() dropped child copies of the UNIQUE
>> constraint due to dependencies (+ CCI). So, ATExecAlterColumnType()
>> on child relations isn't able to find the constraint on the individual
>> child relations to turn into their own
>> DefineIndex(<child-index-to-be-rewritten>). If we manage to handle
>> each relation's ATPostAlterTypeCleanup() independently, child's
>> recreated indexes will be able to reuse their old relfilenodes and
>> everything will be fine. But maybe that will require significant
>> overhaul of how this post-alter-type-cleanup occurs?
>
> We could try to solve this dividing ATPostAlterTypeCleanup processing into
> two functions:
>
> 1 The first one runs right after ATExecAlterColumnType() is finished for a
> given table (like it does today), which then runs ATPostAlterTypeParse to
> generate commands for constraints and/or indexes to re-add. This part
> won't drop the old constraints/indexes just yet, so child
> constraints/indexes will remain for ATExecAlterColumnType to see when
> executed for the children.
>
> 2. Dropping the old constraints/indexes is the responsibility of the 2nd
> part, which runs just before executing ATExecReAddIndex or
> ATExecAddConstraint (with is_readd = true), so that the new constraints
> don't collide with the existing ones.
>
> This arrangement allows step 1 to generate the commands to recreate even
> the child indexes such that the old relfilenode can be be preserved by
> setting IndexStmt.oldNode.
>
> Attached patch is a very rough sketch, which fails some regression tests,
> but I ran out of time today.
I'm thinking of adding this to open items under Older Bugs. Attached the
patch that I had posted on -bugs, but it's only a rough sketch as
described above, not a full fix.
Link to the original bug report:
https://www.postgresql.org/message-id/flat/15672-b9fa7db32698269f%40postgresql.org
Thanks,
Amit
Attachment | Content-Type | Size |
---|---|---|
wip-post-alter-type-cleanup-divide.patch | text/plain | 5.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-03-27 02:56:20 | Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table |
Previous Message | Tom Lane | 2019-03-26 22:53:56 | Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-03-27 02:56:20 | Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table |
Previous Message | Michael Paquier | 2019-03-27 02:36:50 | Re: Shouldn't current_schema() be at least PARALLEL RESTRICTED? |