Re: Add parallelism and glibc dependent only options to reindexdb

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-09 07:24:46
Message-ID: 20190709072446.GC17321@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>> I'll resubmit the parallel patch using per-table only
>> approach
>
> Attached.

I have done a lookup of this patch set with a focus on the refactoring
part, and the split is a bit confusing.

+void
+DisconnectDatabase(ParallelSlot *slot)
+{
+ char errbuf[256];
common.c has already an API to connect to a database. It would be
more natural to move the disconnection part also to common.c and have
the caller of DisconnectDatabase reset the slot connection by itself?
disconnectDatabase() (lower case for the first character) would make
the set more consistent. We could also have a wrapper like say
DiscardSlot() which does this work, but that seems like an overkill
for a single slot if one API could do the cleanup of the full set.

$ git grep select_loop
scripts_parallel.c: /* We must reconstruct the fd_set for each
call to select_loop */
scripts_parallel.c: i = select_loop(maxFd, &slotset, &aborting);
scripts_parallel.c:select_loop(int maxFd, fd_set *workerset, bool
*aborting)
scripts_parallel.h:extern int select_loop(int maxFd, fd_set
*workerset, bool *aborting);

select_loop is only present in scripts_parallel.c, so it can remain
static.

+ slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) *
concurrentCons);
+ init_slot(slots, conn);
+ if (parallel)
+ {
+ for (i = 1; i < concurrentCons; i++)
+ {
+ conn = connectDatabase(dbname, host, port,
username, prompt_password,
+
progname, echo, false, true);
+ init_slot(slots + i, conn);
+ }
+ }

This comes from 0002 and could be more refactored as vacuumdb does the
same thing. Based on 0001, init_slot() is called now in vacuumdb.c
and initializes a set of slots while connecting to a given database.
In short, in input we have a set of parameters and the ask to open
connections with N slots, and the return result is an pg_malloc'd
array of slots ready to be used. We could just call that
ParallelSlotInit() (if you have a better name feel free).

+ /*
+ * Get the connection slot to use. If in parallel mode, here we wait
+ * for one connection to become available if none already is. In
+ * non-parallel mode we simply use the only slot we have, which we
+ * know to be free.
+ */
+ if (parallel)
This business also is duplicated in both reindexdb.c and vacuumdb.c.

+bool
+GetQueryResult(PGconn *conn, const char *progname)
+{
This also does not stick with the parallel stuff, as that's basically
only getting a query result. We could stick that into common.c.

Patch 2 has no documentation. The option as well as the restrictions
in place need to be documented properly.

Here is a small idea about the set of routines we could have for the
parallel stuff, with only three of them needed to work on the parallel
slots and get free connections:
- Initialization of the full slot set.
- Cleanup and disconnection of the slots.
- Fetch an idle connection and wait for one until available.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-07-09 07:28:52 Re: Postgres 11: Table Partitioning and Primary Keys
Previous Message David G. Johnston 2019-07-09 06:59:32 Re: Postgres 11: Table Partitioning and Primary Keys