Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2018-11-09 14:50:19
Message-ID: CA+TgmoZG7x1repg-bBT5qTd9UBWhNbRXiRuvDXrRgd6N=Cowrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 8, 2018 at 3:59 PM David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 9 November 2018 at 05:34, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I suspect the only good way of fixing this problem is using a single
> > snapshot to perform both the scan of pg_inherits and the subsequent
> > pg_class lookups. That way, you know that you are seeing the state of
> > the whole partitioning hierarchy as it existed at some particular
> > point in time -- every commit is either fully reflected in the
> > constructed PartitionDesc or not reflected at all. Unfortunately,
> > that would mean that we can't use the syscache to perform the lookups,
> > which might have unhappy performance consequences.
>
> I do have a patch sitting around that moves the relpartbound into a
> new catalogue table named pg_partition. This gets rid of the usage of
> pg_inherits for partitioned tables. I wonder if that problem is easier
> to solve with that. It also solves the issue with long partition keys
> and lack of toast table on pg_class.

Yeah, I thought about that, and it does make some sense. Not sure if
it would hurt performance to have to access another table, but maybe
it comes out in the wash if pg_inherits is gone? Seems like a fair
amount of code rearrangement just to get around the lack of a TOAST
table on pg_class, but maybe it's worth it.

I had another idea, too. I think we might be able to reuse the
technique Noah invented in 4240e429d0c2d889d0cda23c618f94e12c13ade7.
That is:

- make a note of SharedInvalidMessageCounter before doing any of the
relevant catalog lookups
- do them
- AcceptInvalidationMessages()
- if SharedInvalidMessageCounter has changed, discard all the data we
collected and retry from the top

I believe that is sufficient to guarantee that whatever we construct
will have a consistent view of the catalogs which is the most recent
available view as of the time we do the work. And with this approach
I believe we can continue to use syscache lookups to get the data
rather than having to use actual index scans, which is nice.

Then again, with your approach I'm guessing that one index scan would
get us the list of children and their partition bound information.
That would be even better -- the syscache lookup per child goes away
altogether; it's just a question of deforming the pg_partition tuples.

Way back at the beginning of the partitioning work, I mulled over the
idea of storing the partition bound information in a new column in
pg_inherits rather than in pg_class. I wonder why exactly I rejected
that idea, and whether I was wrong to do so. One possible advantage
of that approach over a pg_partition table is that is that client code
which queries pg_inherits will have to be adjusted if we stop using
it, and some of those queries are going to get more complicated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Kuzmenkov 2018-11-09 15:00:41 Uninterruptible long planning of a query with too many WHERE clauses
Previous Message Tomas Vondra 2018-11-09 13:16:31 Re: shared-memory based stats collector