Re: REINDEX backend filtering

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX backend filtering
Date: 2020-12-15 11:21:55
Message-ID: CAOBaU_aPfmc9bJRTeCyiuwKRpQP9J8bj2GW6N3vmof_zvoPb_A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote:
> > Now that we have the infrastructure to track indexes that might be corrupted
> > due to changes in collation libraries, I think it would be a good idea to offer
> > an easy way for users to reindex all indexes that might be corrupted.
>
> Yes. It would be a good thing.
>
> > The filter is also implemented so that you could cumulate multiple filters, so
> > it could be easy to add more filtering, for instance:
> >
> > REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;
> >
> > to only rebuild indexes depending on outdated libc collations, or
> >
> > REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;
> >
> > to only rebuild indexes depending on a specific version of libc.
>
> Deciding on the grammar to use depends on the use cases we would like
> to satisfy. From what I heard on this topic, the goal is to reduce
> the amount of time necessary to reindex a system so as REINDEX only
> works on indexes whose dependent collation versions are not known or
> works on indexes in need of a collation refresh (like a reindexdb
> --all --collation -j $jobs). What would be the benefit in having more
> complexity with library-dependent settings while we could take care
> of the use cases that matter the most with a simple grammar? Perhaps
> "not_current" is not the best match as a keyword, we could just use
> "collation" and handle that as a boolean. As long as we don't need
> new operators in the grammar rules..

I'm not sure what the usual DBA pattern is here. If the reindexing
runtime is really critical, I'm assuming that at least some people
will dig into library details to see which collations actually broke
in the last upgrade, and will want to reindex only those and force
the version for the rest of the indexes. And obviously, they
probably won't wait until they have multiple collation version
dependencies before taking care of that. In that case the filters
that would matter would be one to only keep indexes with an outdated
collation version, and an additional one for a specific collation
name. Or we could have the COLLATION keyword without an additional
argument mean all outdated collations, and COLLATION
'collation_name' to specify a specific one. This is maybe a bit
ugly, and would probably require a different approach for reindexdb.
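
To make the COLLATION-with-optional-argument alternative concrete, a
rough sketch could look like this (hypothetical syntax, none of it is
implemented, and 'fr_FR' is only a placeholder collation name):

-- rebuild every index depending on an outdated collation version
REINDEX (COLLATION) DATABASE mydb;

-- rebuild only the indexes depending on the named collation
REINDEX (COLLATION 'fr_FR') DATABASE mydb;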
