Re: Add parallelism and glibc dependent only options to reindexdb

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-01 16:28:13
Message-ID: CAOBaU_ZuLr8YY=Uso+q3k7pOtTYfSLMM+JVsCxGQCnOp=aK=aQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 1, 2019 at 4:10 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
> > - 0003 is where the actual fancy work begins, with the addition of a
> > --jobs option to reindexdb. The main issue to discuss here is that
> > when reindexing tables, you basically are not going to have any
> > conflicts between the objects manipulated. However, if you wish to
> > reindex a set of indexes, things get trickier, as it is necessary to
> > list items per table so that multiple connections do not conflict
> > with each other by working on several indexes of the same table.
> > What this patch does is select the set of indexes that need to be
> > worked on (see the addition of cell in ParallelSlot), and then do a
> > kind of pre-planning of each item into the connection slots, so that
> > each connection knows from the beginning which items it needs to
> > process. This is quite different from vacuumdb, where a new item is
> > handed out from a single list only when a connection becomes free.
> > I'd personally prefer that we keep the facility in parallel.c purely
> > execution-dependent, with no pre-planning. This would require keeping
> > within reindexdb.c an array of lists, with one list per connection
> > instead, which feels more natural.
>
> Couldn't we make this enormously simpler and less bug-prone by just
> dictating that --jobs applies only to reindex-table operations?

That would also mean that we'd have to fall back to reindexing at the
table level, even if we only want to reindex the indexes that depend on
glibc. I'm afraid that this will often add a huge penalty.
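To make the per-table constraint concrete, here is a minimal C sketch of the pre-planning approach described in the quoted 0003 summary: indexes are grouped by owning table and each group is assigned as a whole to one connection slot, so two connections never work on indexes of the same table. The IndexItem type and assign_by_table() function are hypothetical and not taken from the patch; the sketch assumes the index list is already sorted by table, supports at most 64 slots, and balances slots greedily by index count only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one index to rebuild; not a reindexdb type. */
typedef struct
{
    const char *table;          /* owning table */
    const char *index;          /* index name */
} IndexItem;

/*
 * Assign indexes to connection slots so that all indexes of a given table
 * land on the same slot, meaning two connections never rebuild indexes of
 * the same table at the same time.  Assumes items[] is sorted by table.
 * Each table's group goes to the slot holding the fewest indexes so far
 * (simple greedy balancing).  slot_of[i] receives the slot for items[i].
 */
static void
assign_by_table(const IndexItem *items, size_t nitems,
                int nslots, int *slot_of)
{
    size_t      load[64] = {0}; /* per-slot index count */
    size_t      i = 0;

    assert(nslots > 0 && nslots <= 64);
    while (i < nitems)
    {
        size_t      end = i;
        int         best = 0;

        /* Find the end of the run of indexes belonging to items[i].table. */
        while (end < nitems && strcmp(items[end].table, items[i].table) == 0)
            end++;

        /* Pick the least-loaded slot for the whole group. */
        for (int s = 1; s < nslots; s++)
            if (load[s] < load[best])
                best = s;

        for (size_t j = i; j < end; j++)
            slot_of[j] = best;
        load[best] += end - i;
        i = end;
    }
}
```

For example, with three indexes on t1, one on t2 and two on t3 spread over two slots, t1 goes to slot 0 and both t2 and t3 go to slot 1. Balancing by count ignores index size, so slots can end up uneven in practice; the vacuumdb-style alternative avoids pre-planning altogether by handing out work only as connections become free.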

> > - 0004 is the part where the concurrent additions really matter, as
> > this consists of applying an extra filter to the selected indexes so
> > that only the glibc-sensitive indexes are chosen for processing.
>
> I think you'd be better off to define and document this as "reindex
> only collation-sensitive indexes", without any particular reference
> to a reason why somebody might want to do that.

Shouldn't we still document that indexes based on ICU collations would be excluded?
I also realize that I totally forgot to update reindexdb.sgml. Sorry
about that, I'll fix it in the next version.
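The naming question (glibc-dependent versus collation-sensitive) comes down to which indexes the filter keeps. Below is a minimal C sketch, assuming each index's collatable key columns have already been resolved to a provider and a collation name; the CollProvider, KeyCollation and IndexInfoSketch types and filter_collation_sensitive() are hypothetical, not reindexdb code. With include_icu set to false it keeps only indexes using a libc collation other than C or POSIX (whose byte-order comparison does not depend on glibc locale data); with include_icu set to true it models the broader "collation-sensitive" definition, which also keeps ICU-backed indexes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical collation provider tags for this sketch. */
typedef enum
{
    PROVIDER_LIBC,
    PROVIDER_ICU
} CollProvider;

/* Resolved collation of one collatable index key column (hypothetical). */
typedef struct
{
    CollProvider provider;
    const char *collname;       /* e.g. "en_US.utf8" or "C" */
} KeyCollation;

/* Hypothetical index descriptor: only what the filter needs. */
typedef struct
{
    const char *name;
    const KeyCollation *keys;   /* collatable key columns only */
    size_t      nkeys;          /* 0 when no key column is collatable */
} IndexInfoSketch;

/* True if at least one key column's ordering depends on collation data. */
static bool
index_is_collation_sensitive(const IndexInfoSketch *idx, bool include_icu)
{
    assert(idx != NULL);
    for (size_t k = 0; k < idx->nkeys; k++)
    {
        const KeyCollation *c = &idx->keys[k];

        if (c->provider == PROVIDER_ICU)
        {
            /* ICU ordering comes from the ICU library, not from glibc. */
            if (include_icu)
                return true;
            continue;
        }
        /* C and POSIX compare bytes, independently of glibc locale data. */
        if (strcmp(c->collname, "C") == 0 || strcmp(c->collname, "POSIX") == 0)
            continue;
        return true;
    }
    return false;
}

/* Store pointers to the selected indexes in out[]; return how many. */
static size_t
filter_collation_sensitive(const IndexInfoSketch *in, size_t n,
                           bool include_icu, const IndexInfoSketch **out)
{
    size_t      m = 0;

    for (size_t i = 0; i < n; i++)
        if (index_is_collation_sensitive(&in[i], include_icu))
            out[m++] = &in[i];
    return m;
}
```

Whichever definition is chosen, the documentation would need to say explicitly whether ICU-backed indexes are selected, since the two modes differ exactly on those.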
