Re: increased max_parallel_workers_per_gather results in fewer workers?

From: Sebastian Dressler <sebastian(at)swarm64(dot)com>
To: Philip Semanchuk <philip(at)americanefficient(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: increased max_parallel_workers_per_gather results in fewer workers?
Date: 2020-06-04 17:45:53
Message-ID: 5FFAB96B-056C-423F-A65A-81BA6C09C7EB@swarm64.com
Lists: pgsql-performance

Hi Philip,

On 4. Jun 2020, at 18:41, Philip Semanchuk <philip(at)americanefficient(dot)com> wrote:
[...]

Also, there are more configuration settings related to parallel queries you might want to look into. Most notably:

parallel_setup_cost
parallel_tuple_cost
min_parallel_table_scan_size

The last one in particular is a typical dealbreaker; you can try setting it to 0 to begin with. Good starting values for the other two are 500 and 0.1, respectively.
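For experimenting, these can be changed per session rather than in postgresql.conf. A minimal sketch, using the rough starting values suggested above (not tuned recommendations):

```sql
-- Session-scoped experiment only; these are the rough starting points
-- mentioned above, not tuned production values.
SET min_parallel_table_scan_size = 0;   -- remove the size-based worker cap
SET parallel_setup_cost = 500;          -- default is 1000
SET parallel_tuple_cost = 0.1;          -- default is 0.1
```

Running EXPLAIN (ANALYZE) on the query before and after these SETs in the same session shows whether the planner's worker choice changes.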

Aha! By setting min_parallel_table_scan_size=0, Postgres uses the 6 workers I expect, and the execution time decreases nicely.

I posted a clumsily-anonymized plan for the “bad” scenario here --
https://gist.github.com/osvenskan/ea00aa71abaa9697ade0ab7c1f3b705b

There are 3 sort nodes in the plan. When I get the “bad” behavior, the sorts have one worker; when I get the good behavior, they have multiple workers (e.g. 6).

I also think what Luis pointed out earlier might be a good option for you, i.e. setting

parallel_leader_participation = off;

And by the way, with leader participation enabled, this 1 worker actually turns into 2 processes working on the query in total, since the leader also participates in the scan.

This brings up a couple of questions —
1) I’ve read that this is Postgres’ formula for the max # of workers it will consider for a table —

max_workers = log3(table size / min_parallel_table_scan_size)

Does that use the raw table size, or does the planner use statistics to estimate the size of the subset of the table that will be read before allocating workers?

"table size" here is the table's size in pages, i.e. the relation size divided by the 8 kB block size. This value comes from the planner's statistics.
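Putting the formula and the page-based size together, the cap can be sketched with a quick calculation. Illustrative only: the real logic lives in the planner (compute_parallel_worker), which adds one worker each time the table triples past the threshold, so it works out to 1 + floor(log3(...)); the default min_parallel_table_scan_size is 8 MB, i.e. 1024 pages:

```sql
-- Illustrative only: a 1 GB table is 131072 pages of 8 kB; with the
-- default 8 MB threshold (1024 pages) the size-based cap works out to
--   1 + floor(log3(131072 / 1024)) = 1 + floor(log3(128)) = 5 workers
SELECT 1 + floor(log(3, 131072::numeric / 1024)) AS max_workers_by_size;
```

The result is still limited by max_parallel_workers_per_gather and by how many workers are available under max_parallel_workers.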

2) There are 9 tables in this query ranging in size from 72 kB to 17 GB. Does Postgres decide on a table-by-table basis to allocate multiple workers, or…?

AFAIK, it decides per table initially, but the final result depends on the chosen Gather node.

Thank you so much for the suggestion, I feel un-stuck now that I have an idea to experiment with.

You are welcome. We are actually about to publish a blog post with some more suggestions on parallelism.

Cheers,
Sebastian
