Re: Using indices for UNION.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: peter_e(at)gmx(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Using indices for UNION.
Date: 2013-11-19 10:24:15
Message-ID: 20131119.192415.119467380.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for looking in detail for this patch, and giving
thoughtful advices.

> I'm aware that you said you were going to refactor this, but I took a
> quick look through it anyway. I don't think mere refactoring is going
> to make me happy with it :-(.

Ouch.. Anyway I've refactored this patch altogether with the
another 'unique index' patch and re-splitted into three
parts. One is the 'unique-index stuff' and the second is 'Adding
pathkeys and unique info into struct Plan' patch and the third is
'maily-in-prepunion stuff'. They will be sent in other messages
although for the scarce chance to be picked up :-)

> The basic problem here, as well as with some of the hackery that's
> been proposed recently in planner.c, is that we can't really get much
> further with improving this layer of the planner until we bite the
> bullet and convert this layer to work with Paths not finished Plans.

It is right but hard to go. I would be able to put my little
power on the issue but the planner hardly seems to be get
refactored gradually.

> Look at what you're proposing here: do a complete planning cycle on
> the subquery (underneath which both ordered and unordered Paths will
> be considered, and one or the other will get thrown away). Then do
> *another* complete planning cycle with ordered output forced. Then
> compare costs of the results.

Exactly.

> This is hugely inefficient, and it produces far-from-ideal
> results too, because you're forced to make a decision right
> there on which subquery plan to use; you can't make the
> globally optimal choice after considering costs of all the
> subqueries.

Umm. Yes, I know I've done it in somewhat brute way and it costs
a little too much if the subqueries of UNION is rather large, but
I suppose it comparably small to the whole execution time for
most cases.

Putting it aside, from the viewpoint of global-optimizations,
current planner also doesn't do such optimizations. Moreover it
doesn't consider sorted plans possibly available and
effective. Ignoring the additional planning cost (:-), for the
UNION queries with LIMITS on large partitioned tables with
apropriate indexes (mmm. I suppose it is not so narrow gate..),
the query runtime should be extremely shortend.

> What we need to do is fix things so we can get multiple Paths
> out of the subquery planning step, some ordered and some not.
> Then we could construct Paths for both the brute force and
> merge-append styles of computing the UNION result, and finally
> choose the cheapest Path at the top level.

Agreed with it at a glance. It sounds quite reasonable. I've been
a bit worried about that the paths and plans seem to be fused in
some extent in grouping_planner, and prepunion
(plan_set_operations) is jumping over path-generation phase to
make plans directly. (And about that I pushed it more on the
way..)

> The same goes for hacking around in grouping_planner. That function
> is well past the point of collapsing of its own weight; we need
> to invest some effort in refactoring before we can afford to add
> much more complexity there. And I think the logical way to refactor
> is to rewrite the code in a Path-generation-and-comparison style.
> (Actually, grouping_planner would need to be fixed first, since
> that's what would have to produce the Paths we're hoping to compare
> in prepunion.)

It seems necessary but also seems far far away and hard to
go.

> I had hoped to make some progress on that rewrite this past summer,
> but ended up with no time to work on it :-(. There's going to be
> a lot of boring infrastructure work before we see much in the way
> of user-visible improvement, I'm afraid, so it's kind of hard to
> make time for it.

Anyway, because I've happened to pass too close by the issue, I
will consider on that for some time (although I would hardly come
up with spiffy solution in a short time.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-11-19 10:26:16 Re: Call flow of btinsert(PG_FUNCTION_ARGS)
Previous Message Karsten Hilbert 2013-11-19 10:22:47 pg_upgrade ?deficiency