Re: New design for FK-based join selectivity estimation

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New design for FK-based join selectivity estimation
Date: 2016-06-13 18:52:06
Message-ID: CANP8+jLNturi9VDqaw4HtVxBEOGs-ssT8wk0opf_Dm1EqF9kig@mail.gmail.com
Lists: pgsql-hackers

On 13 June 2016 at 19:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > So a simple change is to make RelationGetFKeyList() only retrieve FKs with
> > nKeys>1. Rename to RelationGetMultiColumnFKeyList(). That greatly reduces
> > the scope for increased planning time.
>
> FWIW, I don't particularly agree with that. It makes the relcache's fkey
> storage extremely specific to this one use-case, a decision I expect we'd
> regret later.

Hmm, clearly I thought that earlier also; that earlier thinking may be
influencing you. My commits had the concept of generic FK info and then a
specific optimization. So, in 9.6, the main part of the planning time
problem was caused by stored info that would never be used.

What changes my mind here is 1) the point in the dev cycle, and 2) the fact
that the list of FKs doesn't take into account whether the constraints are
deferrable, deferred or immediate, and whether they are valid or invalid.
ISTM likely that we would care about those things in the future if we
believe that info is generic.

But then each new usage of the info will have the same planning time
problem to consider if it chooses to extend the amount of info held.

An optimization in 9.6 shouldn't be rejected because later changes might
undo it; that is surely a problem for those later changes to resolve.
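
To make the nKeys>1 idea concrete, here is a minimal sketch of the
filtering step. FKeySketch and filter_multicolumn_fkeys() are hypothetical
names used only for illustration; this is not the actual relcache structure
or the committed code, just the shape of the filter under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for a cached FK entry.
 * Assumption for illustration only; not the real relcache struct.
 */
typedef struct FKeySketch
{
    unsigned int confrelid;   /* referenced table (illustrative id) */
    int          nkeys;       /* number of columns in the constraint */
} FKeySketch;

/*
 * Copy into "out" only the entries with more than one key column.
 * Returns the number of entries copied; "out" must have room for n.
 */
static size_t
filter_multicolumn_fkeys(const FKeySketch *in, size_t n, FKeySketch *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        /* Single-column FKs are skipped, so they are never stored. */
        if (in[i].nkeys > 1)
            out[kept++] = in[i];
    }
    return kept;
}
```

With three input entries having 1, 2 and 3 key columns, only the last two
would be kept, so anything downstream sees a shorter list.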

> And the planner needs to filter the fkey list anyway,
> because it only wants fkeys that link to tables that are also in the
> current query. Thus, my recommendation was that we should allow
> RelationGetFKeyList to return a pointer directly to the cached info list
> and require the planner to immediately copy (only) the entries that it
> needs for the current query.
>
> Another point here is that I'm now unconvinced that restricting the logic
> to consider only multi-column fkeys is really what we want. It looks to
> me like the code can also improve estimates in the case where there are
> multiple single-column FKs linking to the same target relation. That
> might not be too common for two-table queries, but I bet it happens a
> lot in three-or-more-table queries.
>

Is it realistic that we consider that at this point? Certainly not for
myself, at least.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
