Re: Add parallelism and glibc dependent only options to reindexdb

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Add parallelism and glibc dependent only options to reindexdb
Date: 2019-07-17 07:59:02
Message-ID: 20190717075902.GA20614@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 16, 2019 at 02:03:16PM +0200, Julien Rouhaud wrote:
> After more thinking about schema and multiple jobs, I think that
> erroring out is quite user unfriendly, as it's entirely ok to ask
> for
> multiple indexes and multiple object that do support parallelism in
> a
> single call. So I think it's better to remove the error, ignore the
> given --jobs options for indexes and document this behavior.

No objections to that. I still need to study a bit more 0002 though
to come to a clear conclusion.

Actually, from patch 0002:
+ free_slot = ParallelSlotsGetIdle(slots, concurrentCons, progname);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ run_reindex_command(conn, process_type, objname, progname, echo,
+ verbose, concurrently, true);
The same connection gets reused, shouldn't the connection come from
the free slot?

On top of that quick lookup, I have done an in-depth review on 0001 to
bring it to a committable state, fixing a couple of typos, incorrect
comments (description of ParallelSlotsGetIdle was for example
incorrect) on the way. Other things include that connectDatabase
should have an assertion for a non-NULL connection, calling pg_free()
on the slots terminate is more consistent as pg_malloc is used first.
A comment at the top of processQueryResult still referred to
vacuuming of a missing relation. Most of the patch was in a clean
state, with a clear interface for parallel slots, the place of the new
routines also makes sense, so I did not have much to do :)

Another thing I have noticed is that we don't really need to pass down
progname across all those layers as we finish by using pg_log_error()
when processing results, so more simplifications can be done. Let's
handle all that in the same patch as we are messing with the area.
connectDatabase() and connectMaintenanceDatabase() still need it
though as this is used in the connection string, so
ParallelSlotsSetup() also needs it. This part is not really your
fault but as I am looking at it, it does not hurt to clean up what we
can. Attached is an updated version of 0001 that I am comfortable
with. I'd like to commit that with the cleanups included and then
let's move to the real deal with 0002.
--
Michael

Attachment Content-Type Size
scripts-refactor-parallel.patch text/x-diff 29.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-07-17 08:04:24 Re: pg_receivewal documentation
Previous Message Fabien COELHO 2019-07-17 07:55:39 Re: refactoring - share str2*int64 functions