
From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX backend filtering
Date: 2021-03-02 04:01:52
Message-ID: 20210302040152.qmtcrv7sgu2lz54v@nol
Lists: pgsql-hackers

On Fri, Feb 26, 2021 at 11:17:26AM +0100, Magnus Hagander wrote:
> On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > It means that you'll have to distribute the work on a per-table basis
> > rather than a per-index basis. The time spent to find out that a
> > table doesn't have any impacted index should be negligible compared to
> > the cost of running a reindex. This obviously won't help that much if
> > you have a lot of tables but only one being gigantic.
>
> Yeah -- or at least a couple of large and many small, which I find to
> be a very common scenario. Or the case of some tables having many
> affected indexes and some having few.
>
> You'd basically want to order the operation by table on something like
> "total size of the affected indexes on table x" -- which may very well
> put a smaller table with many indexes earlier in the queue. But you
> can't do that without having access to the filter....

So a long-running reindex due to gigantic and/or numerous indexes on a single
table (or a few of them) is not something we can solve, but an inefficient
reindex due to a poor correlation between table size and the size of the
indexes that need reindexing can be addressed.

I would still prefer to go with a backend implementation, so that all client
tools can benefit from it by default. We could simply export the current
index_has_outdated_collation(oid) function in SQL, and tweak reindexdb to order
tables by the cumulated size of such indexes as you mentioned above. Would
that work for you?

Also, given Thomas' proposal in a nearby email, this function would be renamed
to index_has_outdated_dependencies(oid) or something like that.
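
To illustrate, here's an untested sketch of the query a client tool could run
to build that ordering, assuming the function ends up exposed to SQL under
that name:

    -- hypothetical: list tables by the cumulated size of their indexes
    -- that have outdated dependencies, biggest first
    SELECT i.indrelid::regclass AS table_name,
           sum(pg_relation_size(i.indexrelid)) AS affected_size
    FROM pg_index i
    WHERE index_has_outdated_dependencies(i.indexrelid)
    GROUP BY i.indrelid
    ORDER BY affected_size DESC;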

> > But even if we put the logic in the client, this still won't help as
> > reindexdb doesn't support multiple jobs with an index list:
> >
> > /*
> >  * Index-level REINDEX is not supported with multiple jobs as we
> >  * cannot control the concurrent processing of multiple indexes
> >  * depending on the same relation.
> >  */
> > if (concurrentCons > 1 && indexes.head != NULL)
> > {
> >     pg_log_error("cannot use multiple jobs to reindex indexes");
> >     exit(1);
> > }
>
> That sounds like it would be a fixable problem though, in principle.
> It could/should probably still limit all indexes on the same table to
> be processed in the same connection for the locking reasons of course,
> but doing an order by the total size of the indexes like above, and
> ensuring that they are grouped that way, doesn't sound *that* hard. I
> doubt it's that important in the current use case of manually listing
> the indexes, but it would be useful for something like this.

Yeah, I don't think that the --index option will be very useful in the case of
outdated dependencies, as there will likely be too many indexes to process. We
can still try to improve reindexdb to be able to process index lists with
parallel connections, but I would rather keep that separate from this patch.
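
For what it's worth, the per-table grouping you describe looks simple enough
to compute; here's an untested sketch (with made-up index names) that batches
an index list per parent table, biggest batch first:

    -- hypothetical: group a user-supplied index list by parent table so
    -- that all indexes of a given table go to the same connection
    SELECT i.indrelid::regclass AS table_name,
           array_agg(i.indexrelid::regclass) AS indexes,
           sum(pg_relation_size(i.indexrelid)) AS total_size
    FROM pg_index i
    WHERE i.indexrelid IN ('some_idx'::regclass, 'other_idx'::regclass)
    GROUP BY i.indrelid
    ORDER BY total_size DESC;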
