Re: 回复:how to create index concurrently on partitioned table

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: 回复:how to create index concurrently on partitioned table
Date: 2020-08-12 04:54:38
Message-ID: 20200812045438.GA11663@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 09, 2020 at 06:44:23PM -0500, Justin Pryzby wrote:
> On Sun, Aug 09, 2020 at 02:00:09PM +0900, Michael Paquier wrote:
>> For now, I would recommend to focus first on 0001 to add support for
>> partitioned tables and indexes to REINDEX. CIC is much more
>> complicated btw, but I am not entering in the details now.
>>
>> + /* Avoid erroring out */
>> if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> {
>> This comment does not help, and actually this becomes incorrect as
>> reindex for this relkind becomes supported once 0001 is done.
>
> I made a minimal change to avoid forgetting to eventually change
> that part.

Why not changing it then? We already filter out per relkind in all
the code paths calling reindex_relation(), be it in indexcmds.c for
schema-level reindex or even tablecmds.c, so I have switched this part
to an elog().

>> - We should *not* handle directly partitioned index and/or table in
>> ReindexRelationConcurrently() to not complicate the logic where we
>> gather all the indexes of a table/matview. So I think that the list
>> of partition indexes/tables to work on should be built directly in
>> ReindexIndex() and ReindexTable(), and then this should call the
>> second part of ReindexMultipleTables() refactored in the previous
>> point.
>
> I think I addressed these mostly as you intended.

Mostly. I have been hacking on this patch, and basically rewrote it
as the attached. The handling of the memory context used to keep the
list of partitions intact across transactions was rather clunky: the
context was not reset when we are done, and we would call more APIs
than necessary while switching to it, like find_all_inheritors() which
could do much more allocations. I have fixed that by minimizing the
areas where the private context is used, switching to it only when
saving a new OID in the list of partitions, or a session lock (see
below for this part).

While on it, I found that the test coverage was not enough, so I have
extended the set of tests to make sure any concurrent and
non-concurrent operation for partitioned tables and indexes change the
correct set of relfilenodes for each operation. I have written some
custom functions to minimize the duplication (the whole thing cannot
be grouped as those commands cannot run in a transaction block).

Speaking of which, the patch missed that REINDEX INDEX/TABLE should
not run in a transaction block when working on a partitioned
relation. And the documentation needs to be clear about the
limitation of each operation, so I have written more about all that.
The patch also has commented out areas with slashes or such, and I
have added some elog() and some asserts to make sure that we don't
cross any areas that should not work with partitioned relations.

While hacking on this patch, I have found an old bug in the REINDEX
logic: we build a list of relations to reindex in
ReindexMultipleTables() for schema and database reindexes, but it
happens that we don't recheck if the relations listed actually exists
or not, so dropping a relation during a large reindex can cause
sparse failures because of relations that cannot be found anymore. In
the case of this thread, the problem is different though (the proposed
patch was full of holes regarding that) and we need to use session
locks on the parent *table* partitions (not the indexes) to avoid any
issues within the first transaction building the list of relations to
work on, similarly to REINDEX CONCURRENTLY. So I fixed this problem
this way. For the schema and database cases, I think that we would
need to do something similar to VACUUM, aka have an extra code path to
skip relations not defined. I'll leave that for another thread.

One last thing. I think that the patch is in a rather good shape, but
there is one error message I am not happy with when running some
commands in a transaction block. Say, this sequence:
CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
CREATE INDEX parent_index ON parent_tab (id);
BEGIN;
REINDEX INDEX parent_index; -- error
ERROR: 25001: REINDEX INDEX cannot run inside a transaction block
LOCATION: PreventInTransactionBlock, xact.c:3386

This error can be confusing, because we don't tell directly that the
relation involved here is partitioned, and REINDEX INDEX/TABLE are
fine when doing their stuff on non-partitions. For other code paths,
we have leveraged such errors to use the grammar specific to
partitions, for example "CREATE TABLE .. PARTITION OF" or such as
these don't cause translation issues, but we don't have a specific
syntax of REINDEX for partitioned relations, and I don't think that we
need more grammar just for that. The simplest idea I have here is to
just use an error callback to set an errcontext(), saying roughly:
"while reindexing partitioned table/index %s" while we go through
PreventInTransactionBlock(). I have done nothing about that yet but
adding an errcallback is simple enough. Perhaps somebody has a
different idea here?
--
Michael

Attachment Content-Type Size
reindex-part-v6.patch text/x-diff 27.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Asim Praveen 2020-08-12 05:06:43 Re: SyncRepLock acquired exclusively in default configuration
Previous Message Tom Lane 2020-08-12 04:08:10 Re: posgres 12 bug (partitioned table)