Sort push down through a nested loop, for 9.7

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Sort push down through a nested loop, for 9.7
Date: 2016-04-09 20:01:28
Message-ID: CAMkU=1zg3nFP8EJ4PRi5K+NyzCD-QCthXNHoru8U+0zTK9nQbg@mail.gmail.com
Lists: pgsql-hackers

I was testing to see whether the newer changes in 9.6 fixed some planning
issues I've seen in prior versions. They do not fix the ones of
interest to me, but while looking into it I noticed what seems to be a
missing optimization (missing in all versions).

If the first child of a nested loop produces naturally ordered output
(from an index), the planner is smart enough to realize that this
ordering passes up and applies to the entire nested loop node:

explain select * from pgbench_accounts a join pgbench_accounts b using (bid) order by a.aid;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..150007137.29 rows=10000000000 width=194)
   Join Filter: (a.bid = b.bid)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.29..4247.29 rows=100000 width=97)
   ->  Materialize  (cost=0.00..3140.00 rows=100000 width=97)
         ->  Seq Scan on pgbench_accounts b  (cost=0.00..2640.00 rows=100000 width=97)

But if a naturally ordering index is not available, it is not smart
enough to push the sort down through the nested loop to the first
child:

set enable_hashjoin TO off;

explain select * from pgbench_accounts a join pgbench_accounts b using (bid) order by a.filler;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Sort  (cost=5707453952.44..5732453952.44 rows=10000000000 width=275)
   Sort Key: a.filler
   ->  Nested Loop  (cost=0.00..150005530.00 rows=10000000000 width=275)
         Join Filter: (a.bid = b.bid)
         ->  Seq Scan on pgbench_accounts a  (cost=0.00..2640.00 rows=100000 width=97)
         ->  Materialize  (cost=0.00..3140.00 rows=100000 width=97)
               ->  Seq Scan on pgbench_accounts b  (cost=0.00..2640.00 rows=100000 width=97)

Sorting 100000 rows would be a lot faster than sorting 10000000000 rows.
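The plan I would hope for, sketched by hand (this is not output from any
server, and the costs are omitted), would have the sort pushed down to the
nested loop's first child, roughly:

 Nested Loop
   Join Filter: (a.bid = b.bid)
   ->  Sort
         Sort Key: a.filler
         ->  Seq Scan on pgbench_accounts a
   ->  Materialize
         ->  Seq Scan on pgbench_accounts b

Since a nested loop preserves the order of its outer child, this shape would
satisfy the ORDER BY without sorting the join output at all.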

Is this one of the cases where the CPU cycles needed to detect the
opportunity during planning are just not worthwhile, considering how
rarely the opportunity arises?

Cheers,

Jeff
