Re: REINDEX backend filtering

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX backend filtering
Date: 2021-02-26 10:17:26
Message-ID: CABUevEwVqyBcSgewSy+v+kWukR-szWssKv4C-cGswBttkLuxMQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> >
> > I don't agree with the conclusion though.
> >
> > The most common case of that will be in the case of an upgrade. In
> > that case I want to reindex all of those indexes as quickly as
> > possible. So I'll want to parallelize it across multiple sessions
> > (like reindexdb -j 4 or whatever). But doesn't putting the filter in
> > the grammar prevent me from doing exactly that? Each of those 4 (or
> > whatever) sessions would have to guess which would go where and then
> > speculatively run the command on that, instead of being able to
> > directly distribute the workload?
>
> It means that you'll have to distribute the work on a per-table basis
> rather than a per-index basis. The time spent to find out that a
> table doesn't have any impacted index should be negligible compared to
> the cost of running a reindex. This obviously won't help that much if
> you have a lot of tables but only one being gigantic.

Yeah -- or at least a couple of large tables and many small ones, which
I find to be a very common scenario. Or the case of some tables having
many affected indexes and some having few.

You'd basically want to order the operations by table on something like
"total size of the affected indexes on table x" -- which may very well
put a smaller table with many indexes earlier in the queue. But you
can't do that without having access to the filter...
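
As a rough sketch, assuming the filter were queryable from the client,
that ordering could come from a catalog query along these lines -- the
commented-out WHERE clause is just a placeholder for whatever "affected"
predicate the feature ends up exposing, nothing like it exists today:

    SELECT i.indrelid::regclass AS table_name,
           pg_size_pretty(sum(pg_relation_size(i.indexrelid))) AS affected_index_size
    FROM pg_index i
    -- WHERE <index is affected, e.g. built with an outdated collation version>
    GROUP BY i.indrelid
    ORDER BY sum(pg_relation_size(i.indexrelid)) DESC;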

> But even if we put the logic in the client, this still won't help as
> reindexdb doesn't support multiple jobs with an index list:
>
>     /*
>      * Index-level REINDEX is not supported with multiple jobs as we
>      * cannot control the concurrent processing of multiple indexes
>      * depending on the same relation.
>      */
>     if (concurrentCons > 1 && indexes.head != NULL)
>     {
>         pg_log_error("cannot use multiple jobs to reindex indexes");
>         exit(1);
>     }

That sounds like it would be a fixable problem though, in principle.
It could/should probably still limit all indexes on the same table to
being processed on the same connection, for the locking reasons of course,
but ordering by the total size of the indexes like above, and
ensuring that they are grouped that way, doesn't sound *that* hard. I
doubt it's that important in the current use case of manually listing
the indexes, but it would be useful for something like this.
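
Something along these lines on the client side could give you both the
grouping and the ordering, so each group of indexes is handed to a
single connection, largest group first (the index names are of course
just made-up examples):

    SELECT i.indrelid::regclass AS parent_table,
           array_agg(i.indexrelid::regclass) AS indexes,
           sum(pg_relation_size(i.indexrelid)) AS total_size
    FROM pg_index i
    WHERE i.indexrelid IN ('some_index'::regclass, 'some_other_index'::regclass)
    GROUP BY i.indrelid
    ORDER BY total_size DESC;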

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
