Parallel stats in execution plans

From: David Conlin <dc345(at)cantab(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Parallel stats in execution plans
Date: 2019-01-16 11:31:08
Message-ID: 58120fc3-968e-4eb9-040f-aeaca9b0c9d5@cantab.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi folks -

I'm having trouble understanding what some of the stats mean in the
execution plan output when parallel workers are used. I've tried to read
up about it, but I haven't been able to find anything that explains what
I'm seeing. Apologies in advance if there's documentation I've been too
stupid to find.

I've run the following query. The "towns" table is a massive table that
I created in order to get some big numbers on a parallel query - don't
worry, this isn't a real query I want to make faster, just a silly
example I'd like to understand.

EXPLAIN (ANALYZE, FORMAT JSON, BUFFERS, VERBOSE)
SELECT name, code, article
FROM towns
ORDER BY nameASC,
codeDESC;

The output looks like this:

[
{
"Plan": {
"Node Type": "Gather Merge", "Parallel Aware": false, "Startup Cost":
1013948.54, "Total Cost": 1986244.55, "Plan Rows": 8333384, "Plan
Width": 77, "Actual Startup Time": 42978.838, "Actual Total Time":
60628.982, "Actual Rows": 10000010, "Actual Loops": 1, "Output": ["name", "code", "article"], "Workers Planned": 2, "Workers Launched": 2, "Shared Hit Blocks": 29,
"Shared Read Blocks": 47641, "Shared Dirtied Blocks": 0, "Shared Written
Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local
Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks":
91342, "Temp Written Blocks": 91479, "Plans": [
{
"Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware":
false, "Startup Cost": 1012948.52, "Total Cost": 1023365.25, "Plan
Rows": 4166692, "Plan Width": 77, "Actual Startup Time": 42765.496,
"Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual Loops":
3, "Output": ["name", "code", "article"], "Sort Key": ["towns.name", "towns.code DESC"], "Sort Method": "external merge", "Sort Space Used": 283856, "Sort
Space Type": "Disk", "Shared Hit Blocks": 170, "Shared Read Blocks":
142762, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local
Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0,
"Local Written Blocks": 0, "Temp Read Blocks": 273289, "Temp Written
Blocks": 273700, "Workers": [
{
"Worker Number": 0, "Actual Startup Time": 42588.662, "Actual Total
Time": 48456.662, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit
Blocks": 72, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 89067, "Temp Written Blocks": 89202 }, {
"Worker Number": 1, "Actual Startup Time": 42946.705, "Actual Total
Time": 48799.414, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit
Blocks": 69, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 92880, "Temp Written Blocks": 93019 }
], "Plans": [
{
"Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel
Aware": true, "Relation Name": "towns", "Schema": "public", "Alias":
"towns", "Startup Cost": 0.00, "Total Cost": 184524.92, "Plan Rows":
4166692, "Plan Width": 77, "Actual Startup Time": 0.322, "Actual Total
Time": 8305.886, "Actual Rows": 3333337, "Actual Loops": 3, "Output": ["name", "code", "article"], "Shared Hit Blocks": 96, "Shared Read Blocks": 142762, "Shared Dirtied
Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local
Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0,
"Temp Read Blocks": 0, "Temp Written Blocks": 0, "Workers": [
{
"Worker Number": 0, "Actual Startup Time": 0.105, "Actual Total Time":
8394.629, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit
Blocks": 35, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 0, "Temp Written Blocks": 0 }, {
"Worker Number": 1, "Actual Startup Time": 0.113, "Actual Total Time":
8139.382, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit
Blocks": 32, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 0, "Temp Written Blocks": 0 }
]
}
]
}
]
}, "Planning Time": 22.898, "Triggers": [
], "Execution Time": 61133.161 }
]

Or a more slimmed-down version, with just the confusing fields:

[
{
"Plan": {
"Node Type": "Gather Merge", "Parallel Aware": false,"Actual Total
Time": 60628.982, "Actual Rows": 10000010, "Actual Loops": 1,"Workers
Planned": 2, "Workers Launched": 2,"Plans": [
{
"Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware":
false,"Actual Total Time": 48526.168, "Actual Rows": 3333337, "Actual
Loops": 3,"Workers": [
{
"Worker Number": 0,"Actual Total Time": 48456.662, "Actual Rows":
3277980, "Actual Loops": 1}, {
"Worker Number": 1,"Actual Total Time": 48799.414, "Actual Rows":
3385130, "Actual Loops": 1}
], "Plans": [
{
"Node Type": "Seq Scan","Parallel Aware": true,"Actual Total Time":
8305.886, "Actual Rows": 3333337, "Actual Loops": 3,"Workers": [
{
"Worker Number": 0,"Actual Total Time": 8394.629, "Actual Rows":
3277980, "Actual Loops": 1}, {
"Worker Number": 1,"Actual Total Time": 8139.382, "Actual Rows":
3385130, "Actual Loops": 1}
]
}
]
}
]
},"Execution Time": 61133.161 }
]

The things I'm struggling to understand are:

* How the time values combine with parallelism. For example, each
execution of the sort node takes an average of 48.5s, over three
loops. This makes a total running time of 145.5s. Even if this was
perfectly distributed between the two workers, I would expect this
to take 72.75s, which is more than the total execution time, so it
can't take this long.
*  How the row numbers combine with those in the "Workers" subkey.
For example, in the Sort node, worker #0 produces 3,277,980 rows,
while worker #1 produces 3,385,130 rows. The Sort node as a whole
produces 3,333,337 rows per loop, for a total of 10,000,010 (the
value in the gather merge node). I would have expected the number of
rows produced by the two workers to sum to the number produced by
the Sort node as a whole, either per loop or in total.
* How the "Actual Loops" values combine with those in the "Workers"
subkey. For example, the "Sort" node has 3 loops, but each of the
workers inside it have 1 loop. I would have expected either:
o each of the workers to have done 3 loops (since the sort is
executed 3 times), or
o the number of loops in the two workers to sum to three (if the
three executions of the sort are distributed across the two workers)

Other info about my setup:

* Postgres version: "PostgreSQL 10.4, compiled by Visual C++ build
1800, 64-bit"
* I installed Postgres using the EnterpriseDB one-click installer
* OS: Windows 10, v1803
* I'm using Jetbrains Datagrip to connect to Postgres
* No errors are logged.
* Altered config settings:
*

application_name PostgreSQL JDBC Driver session
client_encoding UTF8 client
DateStyle ISO, MDY client
default_text_search_config pg_catalog.english configuration file
dynamic_shared_memory_type windows configuration file
extra_float_digits 3 session
lc_messages English_United States.1252 configuration file
lc_monetary English_United States.1252 configuration file
lc_numeric English_United States.1252 configuration file
lc_time English_United States.1252 configuration file
listen_addresses * configuration file
log_destination stderr configuration file
logging_collector on configuration file
max_connections 100 configuration file
max_stack_depth 2MB environment variable
port 5432 configuration file
shared_buffers 128MB configuration file
TimeZone UTC client

Thanks in advance, I'm sure I've done something silly or misunderstood
something obvious but I can't work out what it is for the life of me.

Dave

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Laurenz Albe 2019-01-17 09:49:37 Re: No matching tables have ever been vacuumed
Previous Message Etsuro Fujita 2019-01-16 11:21:32 Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0