Re: Add parallelism and glibc dependent only options to reindexdb

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-11 04:04:33
Message-ID: 20190711040433.GG4500@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 10, 2019 at 09:44:14PM +0200, Julien Rouhaud wrote:
> On Wed, Jul 10, 2019 at 4:15 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>> Looking good!
>
> Thanks!

Confirmed. The last set is much easier to go through.

>> I'm not sure about the "Consume" word in ConsumeIdleSlot;
>> maybe "Reserve"? "Obtain"? "Get"?
>
> Yes, Consume is maybe a little bit weird. I wanted to point out the
> make it clear that this function is actually removing a slot from the
> free list, especially since there's a (historical) get_idle_slot(). I
> like Reserve, but Obtain and Get are probably too ambiguous.

The refactoring patch is getting in shape. Now reindex_one_database()
is the only place setting and manipulating the slots. I am wondering
if we should have a wrapper which disconnects all the slots (doing
conn = NULL after the disconnectDatabase() call does not matter).
Get* would be my choice, because we look at the set of parallel slots,
and get an idle one. It would be nice to have more consistency in the
names for the routines, say:
- ParallelSlotInit() instead of SetupParallelSlots (still my
suggestion is not perfect either as that sounds like one single slot,
but we have a set of these).
- ParallelSlotGetIdle() instead of ConsumeIdleSlot(). Still that's
more a wait-then-get behavior.
- ParallelSlotWaitCompletion() instead of WaitForSlotsCompletion()
- ParallelSlotDisconnect, as a wrapper for the calls to
DisconnectDatabase().

>> Placement: I think it's good if related functions stay together, or
>> there is some other rationale for placement within the file. I have two
>> favorite approaches: one is to put all externally callable functions at
>> top of file, followed by all the static helpers in the lower half of the
>> file. The other is to put each externally accessible immediately
>> followed by its specific static helpers. If you choose one of those,
>> that means that SetupParallelSlots should either move upwards, or move
>> downwards. The current ordering seems a dartboard kind of thing where
>> the thrower is not Green Arrow.
>
> I usually prefer to group exported functions together and static ones
> together, as I find it more maintainable in the long term, so upwards
> it'll be.

That's mainly a matter of taste. Depending on the code path in the
tree, sometimes the two approaches from above are used. We have some
other files where the static routines are listed first at the top,
followed by the exported ones at the bottom as it saves from some
declarations of the functions at the top of the file. Keeping the
footprint of the author is not that bad either, and that depends also
on the context. For this one, as the static functions are linked to
the exported ones in a close manner, I would keep each set grouped
though.

+ /*
+ * Database-wide parallel reindex requires special processing. If
+ * multiple jobs were asked, we have to reindex system catalogs first,
+ * as they can't be processed in parallel.
+ */
+ if (process_type == REINDEX_DATABASE)

In patch 0002, a parallel database REINDEX first processes the catalog
relations in a serializable fashion, and then all the other relations
in parallel, which is right Could we have schema-level reindexes also
process things in parallel with all the relations from all the schemas
listed? This would be profitable in particular for callers listing
multiple schemas with an unbalanced number of tables in each, and we'd
need to be careful of the same where pg_catalog is listed. Actually,
your patch breaks if we do a parallel run with pg_catalog and another
schema, no?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2019-07-11 04:28:04 Re: Implementing Incremental View Maintenance
Previous Message Dilip Kumar 2019-07-11 03:47:41 Re: POC: Cleaning up orphaned files using undo logs