Re: Choosing parallel_degree

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>, James Sewell <james(dot)sewell(at)lisasoft(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
Subject: Re: Choosing parallel_degree
Date: 2016-04-04 14:02:19
Message-ID: CADkLM=fBK6nrRgiEXqrsSk3hYjeZEuWSLTm5BMwvPwde6gGhdg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 4, 2016 at 2:55 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Sun, Apr 3, 2016 at 4:37 PM, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com> wrote:
> >
> > On 22/03/2016 07:58, Julien Rouhaud wrote:
> > > On 21/03/2016 20:38, Julien Rouhaud wrote:
> > >> On 21/03/2016 05:18, James Sewell wrote:
> > >>> OK cool, thanks.
> > >>>
> > >>> Can we remove the minimum size limit when the per table degree setting
> > >>> is applied?
> > >>>
> > >>> This would help for tables with 2 - 1000 pages combined with a high CPU
> > >>> cost aggregate.
> > >>>
> > >>
> > >> Attached v4 implements that. It also makes sure that the chosen
> > >> parallel_degree won't be more than the relation's estimated number of
> > >> pages.
> > >>
> > >
> > > And I just realized that it would prevent forcing parallelism on a
> > > partitioned table; v5 attached removes the check on the estimated
> > > number of pages.
> > >
> > >
>
> Few comments:
> 1.
> + limited according to the <xref linkend="gux-max-parallel-degree">
>
> A. typo.
> /gux-max-parallel-degree/guc-max-parallel-degree
> /worker/workers
> B. + <para>
> + Number of workers wanted for this table. The number of worker will be
> + limited according to the <xref linkend="gux-max-parallel-degree">
> + parameter.
> + </para>
>
> How about writing the above as:
> Sets the degree of parallelism for an individual relation. The requested
> number of workers will be limited by <xref linkend="guc-max-parallel-degree">
>
> 2.
> + {
> + {
> + "parallel_degree",
> + "Number of parallel processes per executor node wanted for this relation.",
> + RELOPT_KIND_HEAP,
> + AccessExclusiveLock
> + },
> + -1, 1, INT_MAX
> + },
>
> I think here min and max values should be same as for max_parallel_degree
> (I have verified that for some of the other reloption parameters, min and
> max are same as their guc values); Is there a reason to keep them different?
>
> 3.
> @@ -1291,7 +1300,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
>
> Comment on top of this function says:
> /*
> * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
> * autovacuum)
> */
>
> I think it is better to include parallel_degree in above comment along
> with fillfactor and autovacuum.
>
>
> 4.
> /*
> + * RelationGetMaxParallelDegree
> + * Returns the relation's parallel_degree. Note multiple eval of argument!
> + */
> +#define RelationGetParallelDegree(relation, defaultmpd) \
> + ((relation)->rd_options ? \
> + ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultmpd))
> +
>
> There are minor inconsistencies in the above macro definition.
>
> a. RelationGetMaxParallelDegree - This should be RelationGetParallelDegree.
> b. defaultmpd - it is better to name it as defaultpd
>
>
> >
> >
> > The feature freeze is now very close. If this GUC is still wanted,
> > should I add this patch to the next commitfest?
> >
>
> I am hoping that this will be committed to 9.6, but I think it is good to
> register it in next CF.
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>

I'm late to the party on this thread, and most of the discussion seems to
be about setting the parallel degree per table, which I think is wise.

What I haven't seen is any discussion of setting the parallel degree relative
to the number of CPUs on the machine. Clearly we don't need it right away,
but when we do, I'm happy to report that CPU discovery is as easy as

    (int) sysconf(_SC_NPROCESSORS_ONLN)

Source: https://github.com/moat/pmpp/blob/distribute_in_c/src/num_cpus.c, from
an extension I will be very happy to see declared obsolete.

Even then, the value would probably be consulted only once, at server startup,
and used to compute whatever GUCs and system settings stay in effect until the
next restart.
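
To make the idea concrete, here is a rough sketch of what a startup-time
computation might look like. Everything in it is an assumption for
illustration only: detect_online_cpus() and suggest_parallel_degree() are
hypothetical names, not existing PostgreSQL functions, and "leave one CPU for
the leader, then clamp to a cap" is just one possible policy. Note also that
_SC_NPROCESSORS_ONLN is a common extension (Linux, the BSDs, macOS) rather
than something POSIX guarantees, so a real patch would need a fallback.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: number of CPUs currently online.
 * Falls back to 1 if sysconf() reports an error (-1) or a nonsensical value.
 */
static int
detect_online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return (n < 1) ? 1 : (int) n;
}

/*
 * Hypothetical policy: reserve one CPU for the leader process and never
 * exceed "cap" (standing in for a limit such as max_parallel_degree).
 */
static int
suggest_parallel_degree(int ncpus, int cap)
{
    int degree = ncpus - 1;

    assert(cap >= 0);

    if (degree < 0)
        degree = 0;
    if (degree > cap)
        degree = cap;
    return degree;
}
```

Calling suggest_parallel_degree(detect_online_cpus(), cap) once at startup
would give a value that stays fixed until restart, which matches the "consult
it once" behaviour described above.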
