Re: reindex partitioned indexes: refactor ReindexRelationConcurrently ?

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: reindex partitioned indexes: refactor ReindexRelationConcurrently ?
Date: 2020-11-02 08:18:23
Message-ID: 20201102081823.GD15770@paquier.xyz
Lists: pgsql-hackers

On Mon, Nov 02, 2020 at 01:00:06AM -0600, Justin Pryzby wrote:
> The reason is that concurrent REINDEX must wait for long-running transactions,
> and if we call it in a loop, then we wait for long-running transactions N times.
> I can imagine scenarios where it's easy for a DBA to schedule maintenance to
> do reindex concurrently and restart processes to allow the reindex to proceed.
> But it might be infeasible to restart processes every 5min for 3 hours to allow
> reindex to proceed on each partition.
>
> ReindexMultipleTables avoids doing that to avoid deadlocks, which makes great
> sense for REINDEX SCHEMA/DATABASE/SYSTEM. But I wonder if that reasoning
> doesn't apply to partitioned tables.
>
> I think the usual scenario is to have 100-1000 partitions, and 1-10 indexes per
> partition. It seems to me that at least all partitions of a given index should
> be processed simultaneously.

ReindexPartitions(), as currently shaped, has the advantage of
minimizing the number of ccnew and ccold indexes to handle in parallel.
With your suggestion, a failure could leave behind potentially hundreds
of built-but-still-invalid indexes or invalid-but-not-yet-dropped
indexes, depending on the phase in which the whole REINDEX operation
fails. So I would say no to your proposal: I prefer keeping the
approach where we limit the remnants of a failed operation to a bare
minimum (i.e. one index for REINDEX INDEX, and one set of indexes on a
single relation for REINDEX TABLE).
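
To make the trade-off concrete, here is a rough back-of-the-envelope
sketch in Python. It is not PostgreSQL code: the names Tradeoff,
one_at_a_time and all_at_once are hypothetical, and it assumes a
simplified model in which each concurrent pass counts as a single wait
on long-running transactions and every index in the pass in flight can
be left invalid by a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tradeoff:
    wait_rounds: int           # passes that must wait on old transactions
    worst_case_leftovers: int  # invalid indexes left behind by one failure


def one_at_a_time(partitions: int) -> Tradeoff:
    """Model: one concurrent pass per partition index (assumed simplification)."""
    return Tradeoff(wait_rounds=partitions, worst_case_leftovers=1)


def all_at_once(partitions: int) -> Tradeoff:
    """Model: a single concurrent pass covering every partition index."""
    return Tradeoff(wait_rounds=1, worst_case_leftovers=partitions)


if __name__ == "__main__":
    n = 500  # hypothetical partition count, within the 100-1000 range quoted above
    print("one at a time:", one_at_a_time(n))
    print("all at once:  ", all_at_once(n))
```

Under this model the two approaches simply swap the costs: looping
waits N times but leaves at most one invalid index behind, while a
single pass waits once but can leave N of them.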
--
Michael
