Re: Add parallelism and glibc dependent only options to reindexdb

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-11 09:48:20
Message-ID: CAOBaU_bVFgyt17J4+BFXeAxJfHZQShixruEs_gdJ4A=XRvjxJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 11, 2019 at 6:04 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote:
> > On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> >> Looking good!
> >
> > Thanks!
>
> Confirmed. The last set is much easier to go through.
>
> >> I'm not sure about the "Consume" word in ConsumeIdleSlot;
> >> maybe "Reserve"? "Obtain"? "Get"?
> >
> > Yes, Consume is maybe a little bit weird. I wanted to point out the
> > make it clear that this function is actually removing a slot from the
> > free list, especially since there's a (historical) get_idle_slot(). I
> > like Reserve, but Obtain and Get are probably too ambiguous.
>
> The refactoring patch is getting in shape. Now reindex_one_database()
> is the only place setting and manipulating the slots. I am wondering
> if we should have a wrapper which disconnects all the slots (doing
> conn = NULL after the disconnectDatabase() call does not matter).

You already mentioned that in a previous mail. I was afraid it'd be
overkill, but it'll make caller code easier, so let's do it.

> Get* would be my choice, because we look at the set of parallel slots,
> and get an idle one.

That's what the former GetIdleSlot (that I renamed to get_idle_slot as
it's not static) is doing. ConsumeIdleSlot() actually get an idle
slot and mark it as being used. That's probably some leakage of
internal implementation, but having a GetIdleParallelSlot (or
ParallelSlotGetIdle) *and* a get_idle_slot sounds like a bad idea, and
I don't have a better idea on how to rename get_idle_slot. If you
really prefer Get* and you're fine with a static get_idle_slot, that's
fine by me.

> It would be nice to have more consistency in the
> names for the routines, say:
> - ParallelSlotInit() instead of SetupParallelSlots (still my
> suggestion is not perfect either as that sounds like one single slot,
> but we have a set of these).
> - ParallelSlotGetIdle() instead of ConsumeIdleSlot(). Still that's
> more a wait-then-get behavior.
> - ParallelSlotWaitCompletion() instead of WaitForSlotsCompletion()
> - ParallelSlotDisconnect, as a wrapper for the calls to
> DisconnectDatabase().

I don't have an opinion on whether to use parallel slot as prefix or
postfix, so I'm fine with postfixing.

> + /*
> + * Database-wide parallel reindex requires special processing. If
> + * multiple jobs were asked, we have to reindex system catalogs first,
> + * as they can't be processed in parallel.
> + */
> + if (process_type == REINDEX_DATABASE)
>
> In patch 0002, a parallel database REINDEX first processes the catalog
> relations in a serializable fashion, and then all the other relations
> in parallel, which is right Could we have schema-level reindexes also
> process things in parallel with all the relations from all the schemas
> listed? This would be profitable in particular for callers listing
> multiple schemas with an unbalanced number of tables in each

It would also be beneficial for a parallel reindex of a single schema.

> and we'd
> need to be careful of the same where pg_catalog is listed. Actually,
> your patch breaks if we do a parallel run with pg_catalog and another
> schema, no?

It can definitely cause problems if you ask for pg_catalog and other
schema, same as if you ask a parallel reindex of some catalog tables
(possibly with other tables). There's a --system switch for that
need, so I don't know if documenting the limitation to avoid some
extra code to deal with it is a good enough solution?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Adrien Nayrat 2019-07-11 09:55:53 Re: Feature: Add Greek language fulltext search
Previous Message Adrien Nayrat 2019-07-11 09:48:11 Re: Detailed questions about pg_xact_commit_timestamp