Re: pgsql: Support partition pruning at execution time

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Support partition pruning at execution time
Date: 2018-04-10 18:32:47
Message-ID: 20180410183247.ioo3ijpwfbbbq26u@alvherre.pgsql
Lists: pgsql-committers pgsql-hackers

Robert Haas wrote:
>
> I don't have a strong opinion on that. Since we know how many tuples
> were processed by each worker, knowing how many heap fetches we have
> on a per-worker basis seems like a good thing to have, too. On the
> other hand, maybe EXPLAIN (ANALYZE, VERBOSE) would give us that
> anyway, since it knows about displaying per-worker instrumentation
> (look for /* Show worker detail */). If it doesn't, then that's
> probably bad, because I'm pretty sure that when I installed that code
> the stuff that got displayed for worker instrumentation pretty much
> matched the stuff that got displayed for overall instrumentation.

So, after some experimentation I was able to produce this output with
the previously proposed patch:
                                                                     QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Gather Merge (cost=1000.51..27083147.02 rows=22950230 width=12) (actual time=1198.906..1198.906 rows=0 loops=1)
   Output: tbl1.col1, tprt_1.col1, tprt_1.col1
   Workers Planned: 5
   Workers Launched: 5
   Buffers: shared hit=267516
   ->  Nested Loop (cost=0.43..24318369.79 rows=4590046 width=12) (actual time=1183.291..1183.291 rows=0 loops=6)
         Output: tbl1.col1, tprt_1.col1, tprt_1.col1
         Join Filter: (tbl1.col1 = tprt_1.col1)
         Rows Removed by Join Filter: 1200012
         Buffers: shared hit=1836184
         Worker 0: actual time=1174.526..1174.526 rows=0 loops=1
           Buffers: shared hit=245599
         Worker 1: actual time=1178.845..1178.845 rows=0 loops=1
           Buffers: shared hit=270694
         Worker 2: actual time=1185.703..1185.703 rows=0 loops=1
           Buffers: shared hit=313607
         Worker 3: actual time=1189.595..1189.595 rows=0 loops=1
           Buffers: shared hit=383714
         Worker 4: actual time=1176.965..1176.965 rows=0 loops=1
           Buffers: shared hit=355054
         ->  Parallel Index Only Scan using tprt1_idx on public.tprt_1 (cost=0.43..63100.29 rows=360004 width=4) (actual time=0.041..209.113 rows=300003 loops=6)
               Output: tprt_1.col1
               Heap Fetches: 1800018
               Buffers: shared hit=36166
               Worker 0: actual time=0.035..192.255 rows=240750 loops=1
                 Buffers: shared hit=4849
               Worker 1: actual time=0.031..184.332 rows=265307 loops=1
                 Buffers: shared hit=5387
               Worker 2: actual time=0.046..272.412 rows=307509 loops=1
                 Buffers: shared hit=6098
               Worker 3: actual time=0.035..212.866 rows=376064 loops=1
                 Buffers: shared hit=7650
               Worker 4: actual time=0.036..201.076 rows=348048 loops=1
                 Buffers: shared hit=7006
         ->  Seq Scan on public.tbl1 (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.002 rows=4 loo
Time: 1205,533 ms (00:01,206)

You're right that per-worker heap fetches would probably be better, but
that seems like a larger project than I'm willing to tackle at this
point in the devel cycle -- mainly because the "worker detail" code has
no node-specific handling, so there's a bunch of new code to be written
there. This amount of detail seems about right to me: I'm not sure you
really care all that much about how many fetches each worker had to do;
what you care about is how many there were in the table as a whole.
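
(For the record, the "/* Show worker detail */" block in ExplainNode only
deals in the generic Instrumentation fields -- roughly like this; a
from-memory, heavily abridged sketch, not the literal source:)

/* Show worker detail (sketch -- abridged, indentation handling omitted) */
if (es->analyze && es->verbose && planstate->worker_instrument)
{
    WorkerInstrumentation *w = planstate->worker_instrument;
    int         n;

    for (n = 0; n < w->num_workers; ++n)
    {
        Instrumentation *instrument = &w->instrument[n];
        double      nloops = instrument->nloops;

        if (nloops <= 0)
            continue;           /* this worker never ran the node */

        /*
         * Only the generic counters are available here: timing, rows,
         * loops and buffer usage.  Anything node-specific, such as an
         * index-only scan's heap fetches, would need new per-node code
         * in this loop.
         */
        appendStringInfo(es->str,
                         "Worker %d: actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
                         n,
                         1000.0 * instrument->startup / nloops,
                         1000.0 * instrument->total / nloops,
                         instrument->ntuples / nloops,
                         nloops);
        if (es->buffers)
            show_buffer_usage(es, &instrument->bufusage);
    }
}

As I understand it, the node-level totals already fold the workers'
counters in (the leader aggregates each worker's Instrumentation into the
node's own when the Gather shuts down), which is why the Heap Fetches
figure above covers the whole table.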

Another query (without verbose) gives this:

QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Gather (cost=20000001002.10..20249583898.07 rows=80390828 width=12) (actual time=1554.752..6633.272 rows=3600036 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop (cost=20000000002.10..20241543815.27 rows=80390828 width=12) (actual time=1547.552..5832.355 rows=3600036 loops=1)
         Join Filter: (tbl1.col1 = tprt_1.col1)
         Rows Removed by Join Filter: 21600216
         Buffers: shared hit=75688 read=54346
         ->  Merge Append (cost=2.10..371288.65 rows=6305163 width=4) (actual time=0.758..1980.024 rows=6300063 loops=1)
               Sort Key: tprt_1.col1
               Buffers: shared hit=75688 read=54345
               ->  Index Only Scan using tprt1_idx on tprt_1 (cost=0.43..77500.44 rows=1800018 width=4) (actual time=0.211..387.916 rows=1800018 loops=1)
                     Heap Fetches: 1800018
                     Buffers: shared hit=22028 read=14035
               ->  Index Only Scan using tprt2_idx on tprt_2 (cost=0.43..121147.34 rows=2700027 width=4) (actual time=0.180..628.350 rows=2700027 loops=1)
                     Heap Fetches: 2700027
                     Buffers: shared hit=40597 read=26251
               ->  Index Only Scan using tprt3_idx on tprt_3 (cost=0.42..29722.56 rows=900009 width=4) (actual time=0.153..184.429 rows=900009 loops=1)
                     Heap Fetches: 900009
                     Buffers: shared hit=6538 read=7029
               ->  Index Only Scan using tprt4_idx on tprt_4 (cost=0.15..82.41 rows=2550 width=4) (actual time=0.023..0.023 rows=0 loops=1)
                     Heap Fetches: 0
                     Buffers: shared hit=1
               ->  Index Only Scan using tprt5_idx on tprt_5 (cost=0.15..82.41 rows=2550 width=4) (actual time=0.018..0.018 rows=0 loops=1)
                     Heap Fetches: 0
                     Buffers: shared hit=1
               ->  Index Only Scan using tprt6_idx on tprt_6 (cost=0.42..29734.56 rows=900009 width=4) (actual time=0.165..191.343 rows=900009 loops=1)
                     Heap Fetches: 900009
                     Buffers: shared hit=6523 read=7030
         ->  Materialize (cost=10000000000.00..10000000048.25 rows=2550 width=4) (actual time=0.000..0.000 rows=4 loops=6300063)
               Buffers: shared read=1
               ->  Seq Scan on tbl1 (cost=10000000000.00..10000000035.50 rows=2550 width=4) (actual time=0.033..0.036 rows=4 loops=1)
                     Buffers: shared read=1
 Planning Time: 4.344 ms
 Execution Time: 7008.095 ms
(35 filas)

Looks alright to me.

> In any case part of the point is that Instrumentation is supposed to
> be a generic structure that contains things that are for the most part
> common to all node types. So what MERGE did to that structure looks
> like an inadvisable kludge to me. I'd get rid of that and do it a
> different way rather than propagate the idea that nfilteredX is
> scratch space that can mean something different in every separate
> node.

Agreed. My proposed patch adds another generic counter "tuples2", which
can be used at least by ON CONFLICT.
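
To make that concrete, the shape I have in mind is roughly this (a sketch
of the idea, not the patch text; ON CONFLICT, for one, would bump the same
counter with its own meaning):

/* instrument.h: one more general-purpose counter next to ntuples;
 * each node type decides what, if anything, it means */
double      ntuples2;

/* executor.h: trivial helper to bump it only when instrumentation is on */
#define InstrCountTuples2(node, delta) \
    do { \
        if (((PlanState *) (node))->instrument) \
            ((PlanState *) (node))->instrument->ntuples2 += (delta); \
    } while (0)

/* nodeIndexonlyscan.c: count each heap fetch where it happens */
InstrCountTuples2(node, 1);

/* explain.c: report the node's accumulated total, workers included,
 * assuming the parallel instrumentation aggregation learns to sum
 * ntuples2 as well */
if (es->analyze)
    ExplainPropertyFloat("Heap Fetches", NULL,
                         planstate->instrument->ntuples2, 0, es);

The display side stays per-node -- each node type decides whether and how
to label its ntuples2 in EXPLAIN -- but the counting and the worker
aggregation become generic.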

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
