Re: A reloption for partitioned tables - parallel_workers

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Seamus Abshere <seamus(at)abshere(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A reloption for partitioned tables - parallel_workers
Date: 2021-04-06 07:46:26
Message-ID: CA+HiwqHMmU=GwUvpWEScQJzDgSF9-voZco8C5ttX_BzYqEuB6w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:
> > On Fri, 19 Mar 2021 at 02:07, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > > Attached a new version rebased over c8f78b616, with the grouping
> > > relation partitioning enhancements as a separate patch 0001. Sorry
> > > about the delay.
> >
> > I had a quick look at this and wondered if the partitioned table's
> > parallel workers shouldn't be limited to the sum of the parallel
> > workers of the Append's subpaths?
> >
> > It seems a bit weird to me that the following case requests 4 workers:
> >
> > # create table lp (a int) partition by list(a);
> > # create table lp1 partition of lp for values in(1);
> > # insert into lp select 1 from generate_series(1,10000000) x;
> > # alter table lp1 set (parallel_workers = 2);
> > # alter table lp set (parallel_workers = 4);
> > # set max_parallel_workers_per_gather = 8;
> > # explain select count(*) from lp;
> >                                          QUERY PLAN
> > -------------------------------------------------------------------------------------------
> >  Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
> >    ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
> >          Workers Planned: 4
> >          ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
> >                ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
> > (5 rows)
> >
> > I can see a good argument that there should only be 2 workers here.
>
> Good point, I agree.
>
> > If someone sets the partitioned table's parallel_workers high so that
> > they get a large number of workers when no partitions are pruned
> > during planning, do they really want the same number of workers in
> > queries where a large number of partitions are pruned?
> >
> > This problem gets a bit more complex in generic plans where the
> > planner can't prune anything but run-time pruning prunes many
> > partitions. I'm not so sure what to do about that, but the problem
> > does exist today to a lesser extent with the current method of
> > determining the append parallel workers.
>
> Also a good point. That would require changing the actual number of
> parallel workers at execution time, but that is tricky.
> If we go with your suggestion above, we'd have to disambiguate whether
> the number of workers is set because a partition is large enough
> to warrant a parallel scan (then it shouldn't be reduced if the executor
> prunes partitions) or if it is because of the number of partitions
> (then it should be reduced).

Maybe what we really want is a separate parallel_append_workers reloption
for partitioned tables, instead of piggybacking on parallel_workers?
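
Just to illustrate the idea, here is a rough sketch, not a patch: the
helper and the exact field handling are made up, loosely following
add_paths_to_append_rel() in allpaths.c.  It honors the table's setting
but caps the request at the sum of the subpaths' workers:

/*
 * Rough sketch only: cap the Append's worker count at the sum of its
 * subpaths' workers, while still letting an explicit setting on the
 * partitioned table take precedence up to that cap.
 */
static int
compute_append_parallel_workers(RelOptInfo *rel, List *partial_subpaths)
{
    int         parallel_workers = 0;
    int         subpath_worker_sum = 0;
    ListCell   *lc;

    foreach(lc, partial_subpaths)
    {
        Path       *subpath = (Path *) lfirst(lc);

        /* Today's behaviour: take the maximum over the subpaths. */
        parallel_workers = Max(parallel_workers, subpath->parallel_workers);
        subpath_worker_sum += subpath->parallel_workers;
    }

    /* An explicit setting on the partitioned table takes precedence ... */
    if (rel->rel_parallel_workers > 0)
        parallel_workers = rel->rel_parallel_workers;

    /* ... but never request more workers than the subpaths can use. */
    parallel_workers = Min(parallel_workers, subpath_worker_sum);

    /* And stay within the per-Gather limit. */
    return Min(parallel_workers, max_parallel_workers_per_gather);
}

With David's example above, that would plan 2 workers instead of 4;
generic plans where run-time pruning removes partitions would of course
still need separate handling.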

> I don't know if Seamus is still working on that; if not, we might
> mark it as "returned with feedback".

I have to agree, given the time left.

> Perhaps Amit's patch 0001 should go in independently.

Perhaps, but maybe we should wait until something really needs that.

--
Amit Langote
EDB: http://www.enterprisedb.com
