Partial path row estimates

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Partial path row estimates
Date: 2019-11-16 01:02:01
Message-ID: CA+hUKG+rwqQH=E5+SQ2PZO4pWSU22pVaceqMJPrdPJmqjAdyaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

I don't like this:

-> Parallel Hash (... rows=416667 ...) (... rows=333333 ...)

I think the logic in get_parallel_divisor() only makes sense for
queries like this (output with nearby patch to show leader
contribution):

postgres=# set parallel_tuple_cost = 0;
SET
postgres=# set parallel_setup_cost = 0;
SET
postgres=# explain (analyze, verbose) select * from t;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..8591.67 rows=1000000 width=4) (actual
time=0.460..411.944 rows=1000000 loops=1)
Output: i
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on public.t (cost=0.00..8591.67 rows=416667
width=4) (actual time=0.066..136.777 rows=333333 loops=3)
Output: i
Leader: actual time=0.022..8.061 rows=30058 loops=1
<--- poor contribution
Worker 0: actual time=0.150..201.007 rows=502574 loops=1
Worker 1: actual time=0.027..201.263 rows=467368 loops=1
Planning Time: 0.071 ms
Execution Time: 495.700 ms
(11 rows)

For anything that consumes its entire input before emitting a tuple,
for example Partial Aggregate, Sort and Hash, it doesn't make sense to
mangle the child paths' row estimates. For example:

-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4)
(actual time=263.402..263.403 rows=333333 loops=3)
Output: t2.i
Buckets: 131072 Batches: 16 Memory Usage: 3520kB
Leader: actual time=266.861..266.861 rows=341888 loops=1
<--- equal contribution
Worker 0: actual time=261.521..261.522 rows=322276 loops=1
Worker 1: actual time=261.824..261.825 rows=335836 loops=1

get_parallel_divisor() effectively assumes that every tuple emitted by
the path will cause a tuple to reach the Gather node, at which point
the gather distraction quotient needs to be estimated, so it does that
up front. Whether that really happens depends on where the path
finishes up being used.

I wonder if it would be better to get rid of that logic completely,
and instead tweak the gather path's run cost to account for the
processing asymmetry. In cases where the leader contributes very
little, you could argue that it's not OK to ignore leader distraction
in child paths (and therefore use avg(cardinality), not
max(cardinality) over all processes in, say, a nestloop), but on the
other hand, for cases that use if for something important like
estimating memory consumption (hash, sort, agg) that's exactly what
you want anyway because they're greedy.

With this approach, I suspect gather merge doesn't need to do anything
at all, because there we expect the leader to be forced to contribute
equally by the ordering requirement.

Here's an experimental patch to show what I mean. Not tested much,
just trying out the idea.

Attachment Content-Type Size
tweak-partial-cardinality.patch application/octet-stream 2.7 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2019-11-16 01:02:09 Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Previous Message vignesh C 2019-11-16 00:10:30 Re: dropdb --force