Re: REINDEX filtering in the backend

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX filtering in the backend
Date: 2019-08-28 05:02:08
Message-ID: 20190828050208.GG1965@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 11, 2019 at 11:14:20PM +0200, Julien Rouhaud wrote:
> I didn't want to spend too much time enjoying bison and adding new
> unreserved keywords, so for now I just implemented this syntax to
> start a discussion for this feature in the next commitfest:
>
> REINDEX ( FILTER = COLLATION ) [...]
>
> The FILTER clause can be used multiple times, each one is OR-ed with
> the ReindexStmt's option, so we could easily add a LIBC, ICU and other
> filters, also making COLLATION (or more realistically a better new
> keyword) an alias for (LIBC | ICU) for instance.

I would prefer keeping the interface simple with only COLLATION, so as
only collation sensitive indexes should be updated, including icu and
libc ones. Or would it be useful to have the filtering for both as
libicu can break things similarly to glibc in an update still a
breakage on one or the other would not happen at the same time? I
don't know enough of libicu regarding that, eager to learn. In which
case, wouldn't it be better to support that from the start?

> The filtering is done at table level (with and without the
> concurrently option), so SCHEMA, DATABASE and SYSTEM automatically
> benefit from it. If this clause is used with a REINDEX INDEX, the
> statement errors out, as I don't see a valid use case for providing a
> single index name and asking to possibly filter it at the same time.

Supporting that case would not be that much amount of work, no?

> I also added some minimal documentation and regression tests. I'll
> add this patch to the next commitfest.
>
> [1] https://www.postgresql.org/message-id/7140716c-679e-a0b9-a273-b201329d8891%402ndquadrant.com

+ if ((stmt->options & REINDEXOPT_ALL_FILTERS) != 0)
+ elog(ERROR, "FILTER clause is not compatible with REINDEX INDEX");
[...]
+ discard indexes whose ordering does not depend on a collation. Note that
+ the FILTER option is not compatible with <command>REINDEX
+ SCHEMA</command>.

Why do you have both limitations? I think that it would be nice to be
able to do both, generating an error for REINDEX INDEX if the index
specified is not compatible, and a LOG if the index is not filtered
out when a list is processed. Please note that elog() cannot be used
for user-facing failures, only for internal ones.

+REINDEX (VERBOSE, FILTER = COLLATION) TABLE reindex_verbose;
+-- One column, not depending on a collation
In order to make sure that a reindex has been done for a given entry
with the filtering, an idea is to save the relfilenode before the
REINDEX and check it afterwards. That would be nice to make sure that
only the wanted indexes are processed, but it is not possible to be
sure of that based on your tests, and some tests should be done on
relations which have collation-sensitive as well as
collation-insensitive indexes.

+ index = index_open(indexOid, AccessShareLock);
+ numAtts = index->rd_index->indnatts;
+ index_close(index, AccessShareLock);
Wouldn't it be better to close that after doing the scan?

Nit: I am pretty sure that this should be indented.

Could you add tests with REINDEX CONCURRENTLY?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-08-28 05:09:51 Re: REINDEX filtering in the backend
Previous Message Michael Paquier 2019-08-28 03:30:23 Re: Re: Email to hackers for test coverage