Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

From: Hao Wu <hawu(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Date: 2020-08-31 07:00:19
Message-ID: MWHPR0501MB3913B107E322843EF340E8FBA4510@MWHPR0501MB3913.namprd05.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hacker,

I tested the patch provided by Alvaro. It seems not well in REPEATABLE READ.

gpadmin=*# \d+ tpart
Partitioned table "public.tpart"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
i | integer | | | | plain | |
j | integer | | | | plain | |
Partition key: RANGE (i)
Partitions: tpart_1 FOR VALUES FROM (0) TO (100),
tpart_2 FOR VALUES FROM (100) TO (200)

begin isolation level repeatable read;
BEGIN
gpadmin=*# select * from tpart;
i | j
-----+-----
10 | 10
50 | 50
110 | 110
120 | 120
150 | 150
(5 rows)
-- Here, run `ALTER TABLE tpart DROP PARTITION tpart_2 CONCURRENTLY`
-- but only complete the first transaction.

-- the tuples from tpart_2 are gone.
gpadmin=*# select * from tpart;
i | j
----+----
10 | 10
50 | 50
(2 rows)

gpadmin=*# \d+ tpart_2
Table "public.tpart_2"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
i | integer | | | | plain | |
j | integer | | | | plain | |
Partition of: tpart FOR VALUES FROM (100) TO (200)
Partition constraint: ((i IS NOT NULL) AND (i >= 100) AND (i < 200))
Access method: heap

-- the part tpart_2 is not showed as DETACHED
gpadmin=*# \d+ tpart
Partitioned table "public.tpart"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
i | integer | | | | plain | |
j | integer | | | | plain | |
Partition key: RANGE (i)
Partitions: tpart_1 FOR VALUES FROM (0) TO (100),
tpart_2 FOR VALUES FROM (100) TO (200)

-- next, the insert failed. It's OK.
gpadmin=*# insert into tpart values(130,130);
ERROR: no partition of relation "tpart" found for row
DETAIL: Partition key of the failing row contains (i) = (130).

Is this an expected behavior?

Regards,
Hao Wu

________________________________
From: Robert Haas <robertmhaas(at)gmail(dot)com>
Sent: Thursday, August 27, 2020 11:46 PM
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

On Wed, Aug 26, 2020 at 7:40 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> To mark it detached means to set pg_inherits.inhdetached=true. That
> column name is a bit of a misnomer, since that column really means "is
> in the process of being detached"; the pg_inherits row goes away
> entirely once the detach process completes. This mark guarantees that
> everyone will see that row because the detaching session waits long
> enough after committing the first transaction and before deleting the
> pg_inherits row, until everyone else has moved on.

OK. Do you just wait until the XID of the transaction that set
inhdetached is all-visible, or how do you do it?

> The other point is that the partition directory code can be asked to
> include detached partitions, or not to. The executor does the former,
> and the planner does the latter. This allows a transient period during
> which the partition descriptor returned to planner and executor is
> different; this makes the situation equivalent to what would have
> happened if the partition was attached during the operation: in executor
> we would detect that there is an additional partition that was not seen
> by the planner, and we already know how to deal with that situation by
> your handling of the ATTACH code.

Ah ha! That is quite clever and I don't think that I would have
thought of it. So all the plans that were created before you set
inhdetached=true have to be guaranteed to be invaliated or gone
altogether before you can actually delete the pg_inherits row. It
seems like it ought to be possible to ensure that, though I am not
surely of the details exactly. Is it sufficient to wait for all
transactions that have locked the table to go away? I'm not sure
exactly how this stuff interacts with the plan cache.

> There is one fly in the ointment though, which is that if you cancel the
> wait and then immediately do the ALTER TABLE DETACH FINALIZE without
> waiting for as long as the original execution would have waited, you
> might end up killing the partition ahead of time. One solution to this
> would be to cause the FINALIZE action to wait again at start. This
> would cause it to take even longer, but it would be safer. (If you
> don't want it to take longer, you can just not cancel it in the first
> place.) This is not a problem if the server crashes in between (which
> is the scenario I had in mind when doing the FINALIZE thing), because of
> course no transaction can continue to run across a crash.

Yeah, it sounds like this will require some solution, but I agree that
just waiting "longer" seems acceptable.

--
Robert Haas
EnterpriseDB: https://urldefense.proofpoint.com/v2/url?u=http-3A__www.enterprisedb.com&d=DwIBaQ&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=tqYUKh-fXcYPWSaF4E-D6A&m=SEDl-6dEISo7BA0qWuv1-idQUVtO0M6qz7hcfwlrF3I&s=pZ7Dx6xrJOYkKKMlXR4wpJNZv-W10wQkMfXdEjtIXJY&e=
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-08-31 07:28:20 Re: doc review for v13
Previous Message Michael Paquier 2020-08-31 06:52:44 Re: please update ps display for recovery checkpoint