Re: WIP: Upper planner pathification

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: WIP: Upper planner pathification
Date: 2016-03-01 14:30:26
Message-ID: 8783.1456842626@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> I tried to look into patch and I had a question (one for now): why LimitPath
> doesn't contain actual limit/offset value? I saw a lot of subqueries with LIMIT
> 1 which could be transformed into EXISTS subquery.

Oh, yeah, I intended to change that but didn't get to it yet. Consider
it done.

> Me too. I applied the patch and can confirm that 'make test' doesn't fail on
> FreeBSD 10.2. Now I will try to run kind of TPC-H with and without patch.

I do not think the patch will make a lot of performance difference as-is;
its value is more in what it will let us do later. There are a couple of
regression test cases that change plans for the better, but it's sort of
accidental. Those cases look like

select d.* from d left join (select * from b group by b.id, b.c_id) s
on d.a = s.id;

and what happens in HEAD is that the subquery chooses a hashagg plan
and then the upper query decides a mergejoin would be a good idea ...
so it has to sort the output of the hashagg. With the patch, what
comes back from the subquery is a Path for the hashagg and a Path
for doing the GROUP BY with Sort/Uniq. The second path is more expensive,
but it survives the add_path tournament because it can produce sorted
output. Then the outer level discovers that it can use that to do its
mergejoin without a separate sort step, and that way is cheaper overall.
So instead of

! -> Sort
! Sort Key: s.id
! -> Subquery Scan on s
! -> HashAggregate
! Group Key: b.id
! -> Seq Scan on b

we get

! -> Group
! Group Key: b.id
! -> Index Scan using b_pkey on b

which is noticeably cheaper, and not just because we got rid of the
Subquery Scan node. So that's nice --- but it's more or less accidental,
because the outer level isn't telling the inner level that this sort order
might be interesting.

Once this infrastructure is in place, I want to look at passing down more
information to recursive subquery_planner calls so that we're not leaving
this kind of optimization to chance. But the patch is big enough already,
so that (and a lot of other things) are getting left for later.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2016-03-01 14:37:28 Re: Publish autovacuum informations
Previous Message Craig Ringer 2016-03-01 14:28:40 Re: TAP / recovery-test fs-level backups, psql enhancements etc