Re: REINDEX backend filtering

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX backend filtering
Date: 2021-02-26 09:50:25
Message-ID: CABUevEzrXx7p7_9gJiDeL6CJBrATKZGF7MyzbdpVbe44ytK+-g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 21, 2021 at 4:13 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:
> > > Is this really a common enough operation that we need it in the main grammar?
> > >
> > > Having the functionality, definitely, but what if it was "just" a
> > > function instead? So you'd do something like:
> > > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
> > > \gexec
> > >
> > > Or even a function that returns the REINDEX commands directly (taking
> > > a parameter to turn on/off concurrency for example).
> > >
> > > That also seems like it would be easier to make flexible, and just as
> > > easy to plug into reindexdb?
> >
> > Having control in the grammar to choose which index to reindex for a
> > table is very useful when it comes to parallel reindexing, because
> > it is a no-brainer in terms of knowing which index to distribute to one
> > job or another. In short, with this grammar you can just issue a set
> > of REINDEX TABLE commands that we know will not conflict with each
> > other. You cannot get that level of control with REINDEX INDEX as it
> > may be possible that one or more commands conflict if they work on an
> > index of the same relation because it is required to also take a lock on
> > the parent table. Of course, we could decide to implement a
> > redistribution logic in all frontend tools that need such things, like
> > reindexdb, but that's not something I think we should leave to the
> > client. Backend-side filtering is IMO much simpler, less code,
> > and more elegant.
>
> Maybe additional filtering capabilities are not something that will be
> required frequently, but I'm pretty sure that reindexing only indexes
> that might be corrupt is something that will be required often. So I
> agree, having all that logic in the backend makes everything easier
> for users, having the choice of the tools they want to issue the query
> while still having all features available.

I agree with that scenario -- in that the most common case will be
exactly that of reindexing only indexes that might be corrupt.

I don't agree with the conclusion though.

The most common instance of that will be an upgrade. In that case I
want to reindex all of those indexes as quickly as possible, so I'll
want to parallelize it across multiple sessions (like reindexdb -j 4
or whatever). But doesn't putting the filter in the grammar prevent me
from doing exactly that? Each of those 4 (or whatever) sessions would
have to guess which indexes would go where and then speculatively run
the command on that, instead of being able to directly distribute the
workload?
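
To make "directly distribute" concrete, here is a rough client-side
sketch. It is not from the patch and not how reindexdb is implemented.
It assumes the client already has a list of (table, index, size)
tuples for the indexes that need rebuilding, however that list was
obtained. The function name plan_reindex_jobs and the size-based
balancing are purely illustrative. All indexes of one table are kept
in the same session, to sidestep the parent-table lock conflicts
Michael mentions above.

```python
import heapq
from collections import defaultdict


def plan_reindex_jobs(indexes, jobs):
    """Split REINDEX INDEX commands across `jobs` sessions.

    `indexes` is an iterable of (table, index, size) tuples. This input
    shape is an assumption for illustration, not an existing API.
    Identifiers are used as-is; real code would need to quote them.
    """
    # Group indexes by parent table so one table never spans two sessions.
    by_table = defaultdict(list)
    for table, index, size in indexes:
        by_table[table].append((index, size))

    # Hand out the largest tables first (simple greedy balancing).
    groups = sorted(
        by_table.items(),
        key=lambda kv: (-sum(size for _, size in kv[1]), kv[0]),
    )

    # Min-heap of (accumulated size, session number).
    heap = [(0, j) for j in range(jobs)]
    heapq.heapify(heap)
    plan = [[] for _ in range(jobs)]

    for _table, idxs in groups:
        load, j = heapq.heappop(heap)  # least-loaded session so far
        for index, size in idxs:
            plan[j].append(f"REINDEX INDEX {index}")
            load += size
        heapq.heappush(heap, (load, j))
    return plan


if __name__ == "__main__":
    demo = [
        ("t1", "t1_a", 100),
        ("t1", "t1_b", 50),
        ("t2", "t2_a", 120),
        ("t3", "t3_a", 10),
    ]
    for session, commands in enumerate(plan_reindex_jobs(demo, 2)):
        print(session, commands)
```

With a plan like this each session knows up front what to run, which
is the part that seems hard to get if the filtering only happens
inside the REINDEX command itself.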

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
