Re: Proposal: Local indexes for partitioned table

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Maksim Milyutin <m(dot)milyutin(at)postgrespro(dot)ru>
Subject: Re: Proposal: Local indexes for partitioned table
Date: 2017-10-10 15:22:28
Message-ID: 20171010152228.x327jg7g62khyboj@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > 2. create one index for each existing partition. These would be
> > identical to what would happen if you created the index directly on
> > each partition, except that there is an additional dependency to the
> > parent's abstract index.
> One thing I'm a bit worried about is how to name these subordinate
> indexes. They have to have names because that's how pg_class works,
> and those names can't all be the same, again because that's how
> pg_class works. There's no problem right away when you first create
> the partitioned index, because you can just pick names out of a hat
> using whatever name-generating algorithm seems best. However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names. If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.
> These are pretty annoying corner-case bugs because they're not likely
> to come up very often. Most people won't notice or care if the index
> name changes. But I don't think it's acceptable to just ignore the
> problem.

I agree it's a problem that needs to be addressed directly.

> An idea I had was to treat the abstract index - to use your
> term - sort of the way we treat an extension. Normally, when you
> create an index on a partitioned table, it cascades, but for dump and
> restore purpose, we tag on some syntax that says "well, don't actually
> create the subordinate indexes, i'll tell you about those later".
> Then for each subordinate index we issue a separate CREATE INDEX
> command followed by ALTER INDEX abstract_index ATTACH PARTITION
> concrete_index or something of that sort. That means you can't
> absolutely count on the parent index to have all of the children it's
> supposed to have but maybe that's OK.

Hmm ... yeah, ATTACH and DETACH sound acceptable to me. On DETACH, the
abstract index should be marked indisvalid=false unless a substitute
index already exists; and on ATTACH when indisvalid=false we verify that
all local indexes exist, and if so we can flip indisvalid. That way, we
can continue to rely on the parent index always having all its children
when the flag is set.

I'm not clear on a syntax that creates the main index and hopes to later
have the sub-indexes created. Another approach is to do it the other
way around, i.e. create the children first, then once they're all in
place create the main one normally, which merely verifies that all the
requisite children exist. This is related to what I proposed to occur
when a regular table is joined as a partition of the partitioned table:
we run a verification that an index matching the parent's abstract
indexes exists, and if not we raise an error. (Alternatively we could
allow the case, and mark the abstract index as indisvalid=false, but
that seems to violate POLA).

> Another thing that would let you do is CREATE INDEX CONCURRENTLY
> replacement_concrete_index; ALTER INDEX abstract_index DETACH
> PARTITION old_concrete_index, ATTACH PARTITION
> replacement_concrete_index; DROP INDEX CONCURRENTLY
> old_concrete_index, which seems like a thing someone might want to do.

Yeah, this is a point I explicitly mentioned, and this proposal seems
like a good way.

Álvaro Herrera
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2017-10-10 15:22:51 Re: 10.0: Logical replication doesn't execute BEFORE UPDATE OF <columns> trigger
Previous Message Alvaro Herrera 2017-10-10 14:14:44 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple