Re: A reloption for partitioned tables - parallel_workers

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Seamus Abshere <seamus(at)abshere(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A reloption for partitioned tables - parallel_workers
Date: 2021-02-16 06:05:50
Message-ID: CA+HiwqFHZ6Zyt5W7Ld-+6joBs1ii9Q1w_u5eNOc9=0m_6jeXAw@mail.gmail.com
Lists: pgsql-hackers

Hi Seamus,

Please see my reply below.

On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <seamus(at)abshere(dot)net> wrote:
> On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:
> > On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <seamus(at)abshere(dot)net> wrote:
> > > It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > >
> > > Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:
>
> > You may see by inspecting the callers of compute_parallel_worker()
> > that it never gets called on a partitioned table, only its leaf
> > partitions. Maybe you could try calling compute_parallel_worker()
> > somewhere in add_paths_to_append_rel(), which has this code to figure
> > out parallel_workers to use for a parallel Append path for a given
> > partitioned table:
> >
> >     /* Find the highest number of workers requested for any subpath. */
> >     foreach(lc, partial_subpaths)
> >     {
> >         Path       *path = lfirst(lc);
> >
> >         parallel_workers = Max(parallel_workers, path->parallel_workers);
> >     }
> >     Assert(parallel_workers > 0);
> >
> >     /*
> >      * If the use of parallel append is permitted, always request at least
> >      * log2(# of children) workers. We assume it can be useful to have
> >      * extra workers in this case because they will be spread out across
> >      * the children. The precise formula is just a guess, but we don't
> >      * want to end up with a radically different answer for a table with N
> >      * partitions vs. an unpartitioned table with the same data, so the
> >      * use of some kind of log-scaling here seems to make some sense.
> >      */
> >     if (enable_parallel_append)
> >     {
> >         parallel_workers = Max(parallel_workers,
> >                                fls(list_length(live_childrels)));
> >         parallel_workers = Min(parallel_workers,
> >                                max_parallel_workers_per_gather);
> >     }
> >     Assert(parallel_workers > 0);
> >
> >     /* Generate a partial append path. */
> >     appendpath = create_append_path(root, rel, NIL, partial_subpaths,
> >                                     NIL, NULL, parallel_workers,
> >                                     enable_parallel_append,
> >                                     -1);
> >
> > Note that the 'rel' in this code refers to the partitioned table for
> > which an Append path is being considered, so compute_parallel_worker()
> > using that 'rel' would use the partitioned table's
> > rel_parallel_workers as you are trying to do.
>
> Here we go, my first patch... solves https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com

Thanks for sending the patch here.

It seems you haven't made enough changes for the reloptions code to
recognize parallel_workers as valid for partitioned tables, because
even with the patch applied, I get this:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
alter table rp set (parallel_workers = 1);
ERROR: unrecognized parameter "parallel_workers"

You need this:

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 029a73325e..9eb8a0c10d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024

which tells the reloptions parsing code that parallel_workers is
acceptable for both heap and partitioned relations.

Other comments on the patch:

1. This function calling style, where the first argument is not placed
on the same line as the function name itself, is not very common in
the Postgres codebase:

+   /* First see if there is a root-level setting for parallel_workers */
+   parallel_workers = compute_parallel_worker(
+       rel,
+       -1,
+       -1,
+       max_parallel_workers_per_gather
+   );

This makes the new code look very different from the rest of the
codebase, so it's better to stick to the existing style.
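
For reference, the same call written in the usual style would look
something like this (the surrounding indentation here is only
illustrative):

    /* First see if there is a root-level setting for parallel_workers */
    parallel_workers = compute_parallel_worker(rel, -1, -1,
                                               max_parallel_workers_per_gather);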

2. It might be a good idea to use this opportunity to add a function,
say compute_append_parallel_workers(), that encapsulates the logic
quoted above for choosing an Append path's parallel_workers. The
patch would then simply add the new compute_parallel_worker() call at
the top of that function, as sketched below.
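
For illustration only, here is a rough sketch of what such a helper
might look like; the function name, signature, and exact placement in
allpaths.c are just assumptions on my part, not something from your
patch:

static int
compute_append_parallel_workers(RelOptInfo *rel, List *live_childrels,
                                List *partial_subpaths)
{
    int         parallel_workers;
    ListCell   *lc;

    /* First see if there is a root-level setting for parallel_workers. */
    parallel_workers = compute_parallel_worker(rel, -1, -1,
                                               max_parallel_workers_per_gather);

    /* Find the highest number of workers requested for any subpath. */
    foreach(lc, partial_subpaths)
    {
        Path       *path = lfirst(lc);

        parallel_workers = Max(parallel_workers, path->parallel_workers);
    }

    /*
     * If parallel append is permitted, request at least log2(# of
     * children) workers, capped at max_parallel_workers_per_gather.
     */
    if (enable_parallel_append)
    {
        parallel_workers = Max(parallel_workers,
                               fls(list_length(live_childrels)));
        parallel_workers = Min(parallel_workers,
                               max_parallel_workers_per_gather);
    }

    return parallel_workers;
}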

3. I think we should consider the Append parent relation's
parallel_workers ONLY if it is a partitioned relation, because it
doesn't make a lot of sense for other types of parent relations. So
the new code should look like this:

if (IS_PARTITIONED_REL(rel))
    parallel_workers = compute_parallel_worker(rel, -1, -1,
                                               max_parallel_workers_per_gather);

4. Maybe it also doesn't make sense to consider the parent relation's
parallel_workers if Parallel Append is disabled
(enable_parallel_append = off). That's because with a simple
(non-parallel) Append running under Gather, all launched parallel
workers process the same partition before moving to the next one.
OTOH, one's intention of setting parallel_workers on the parent
partitioned table would most likely be to distribute workers across
partitions, which is only possible with parallel Append
(enable_parallel_append = on). So, the new code should look like
this:

if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
    parallel_workers = compute_parallel_worker(rel, -1, -1,
                                               max_parallel_workers_per_gather);

BTW, please consider bottom-posting, as I've done in this reply,
because it makes it easier to follow discussions involving patch
reviews, which can take many emails to reach a conclusion.

--
Amit Langote
EDB: http://www.enterprisedb.com
