Re: REINDEX backend filtering

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX backend filtering
Date: 2021-03-14 14:57:37
Message-ID: 20210314145737.enqhi2jxszws3ywi@nol
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 14, 2021 at 08:54:11PM +0900, Michael Paquier wrote:
> On Sun, Mar 14, 2021 at 04:10:07PM +0800, Julien Rouhaud wrote:
>
> + bool outdated_filter = false;
> Wouldn't it be better to rename that "outdated" instead for
> consistency with the other options?

I agree.

> In ReindexRelationConcurrently(), there is no filtering done for the
> index themselves. The operation is only done on the list of indexes
> fetched from the parent relation. Why? This means that a REINDEX
> (OUTDATED) INDEX would actually rebuild an index even if this index
> has no out-of-date collations, like a catalog. I think that's
> confusing.
>
> Same comment for the non-concurrent case, as of the code paths of
> reindex_index().

Yes, I'm not sure what we should do in that case. I thought I put a comment
about that but it apparently disappeared during some rewrite.

Is there really a use case for reindexing a specific index and at the same time
asking for possibly ignoring it? I think we should just forbid REINDEX
(OUTDATED) INDEX. What do you think?

> Would it be better to inform the user which indexes are getting
> skipped in the verbose output if REINDEXOPT_VERBOSE is set?

I was thinking that users would be more interested in the list of indexes being
processed rather than the full list of indexes and a mention of whether it was
processed or not. I can change that if you prefer.

> + <para>
> + Check if the specified index has any outdated dependency. For now only
> + dependency on collations are supported.
> + </para></entry>
> [...]
> + <term><literal>OUTDATED</literal></term>
> + <listitem>
> + <para>
> + This option can be used to filter the list of indexes to rebuild and only
> + process indexes that have outdated dependencies. Fow now, the only
> + handle dependency is for the collation provider version.
> + </para>
> Do we really need to be this specific in this part of the
> documentation with collations?

I think it's important to document what this option really does, and I don't
see a better place to document it.

> The last sentence of this paragraph
> sounds weird. Don't you mean instead to write "the only dependency
> handled currently is the collation provider version"?

Indeed, I'll fix!

> +\set VERBOSITY terse \\ -- suppress machine-dependent details
> +-- no suitable index should be found
> +REINDEX (OUTDATED) TABLE reindex_coll;
> What are those details?

That just the same comment as the previous occurence in the file, I kept it for
consistency.

> And wouldn't it be more stable to just check
> after the relfilenode of the indexes instead?

Agreed, I'll add additional tests for that.

> " ORDER BY sum(ci.relpages)"
> Schema qualification here, twice.

Well, this isn't actually mandatory, per comment at the top:

/*
* The queries here are using a safe search_path, so there's no need to
* fully qualify everything.
*/

But I think it's a better style to fully qualify objects, so I'll fix.

> + rel = try_relation_open(indexOid, AccessShareLock);
> +
> + if (rel == NULL)
> + PG_RETURN_NULL();
> Let's introduce a try_index_open() here.

Good idea!

> What's the point in having both index_has_outdated_collation() and
> index_has_outdated_collation()?

Did you mean index_has_outdated_collation() and
index_has_outadted_dependency()? It's just to keep things separated, mostly
for future improvements on that infrastructure. I can get rid of that function
and put back the code in index_has_outadted_dependency() if that's overkill.

> It seems to me that 0001 should be split into two patches:
> - One for the backend OUTDATED option.
> - One for pg_index_has_outdated_dependency(), which only makes sense
> in-core once reindexdb is introduced.

I thought it would be better to add the backend part in a single commit, and
then built the client part on top of that in a different commit. I can
rearrange things if you want, but in that case should
index_has_outadted_dependency() be in a different patch as you mention or
simply merged with 0002 (the --oudated option for reindexdb)?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2021-03-14 15:00:00 Re: More time spending with "delete pending"
Previous Message Alvaro Herrera 2021-03-14 14:54:46 Re: pgbench - add pseudo-random permutation function