Re: Improve selectivity estimate for range queries

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Yuzuko Hosoya" <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve selectivity estimate for range queries
Date: 2018-12-21 16:50:28
Message-ID: 28533.1545411028@sss.pgh.pa.us
Lists: pgsql-hackers

"Yuzuko Hosoya" <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp> writes:
> From: Kyotaro HORIGUCHI [mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp]
>> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp> wrote in
>> <008701d4983d$02e731c0$08b59540$(at)lab(dot)ntt(dot)co(dot)jp>
>>> To handle such cases I've thought up of an idea based on a previous
>>> commit[1] which solved a similar problem related to
>>> DEFAULT_NUM_DISTINCT. Just like this modification, I added a flag
>>> which shows whether the selectivity

>> The commit [1] added almost no complexity, but this does. It affects
>> many functions, introducing tighter coupling between the
>> operator-selectivity functions and the clause-selectivity functions.
>> isdefault travels alone too far just to bind together remote
>> functions. We might need a more principled way to handle that.

Yeah, I don't think that this approach is a reasonable tradeoff to
eliminate a narrow case. In particular, what's been done here to
clause_selectivity is totally unprincipled and poorly thought out:
you can't add an output parameter that's set in only some cases,
and there's no good way to decide how to set it in many cases.
For instance, what would you do for an AND or OR where some of
the branches are default estimates and some are not?
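
To make that concrete, here's a hypothetical sketch (the function name
and signature are invented for illustration; this is not the patch's
code):

    #include <stdbool.h>

    /*
     * Combining an AND of two branch estimates: there is no principled
     * value for a single "was this a default estimate?" out-parameter
     * when the branches disagree.
     */
    static double
    and_selectivity(double s1, bool isdefault1,
                    double s2, bool isdefault2,
                    bool *isdefault_out)
    {
        double      s = s1 * s2;    /* usual independence assumption */

        /*
         * If exactly one branch was a default guess, neither choice is
         * right: OR-ing the flags marks a mostly-real estimate as a
         * guess, while AND-ing them hides that half the inputs were
         * guessed.
         */
        *isdefault_out = isdefault1 || isdefault2;
        return s;
    }

Whichever combination rule you pick, the flag misrepresents mixed
inputs, and real clause trees nest this problem arbitrarily deep.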

>> Around [1], there was a discussion about introducing the notion of
>> uncertainty to selectivity. isdefault is a kind of uncertainty
>> value indicating '0/100% uncertain'. So my feeling is that it would
>> be better for Selectivity to have a certainty component, and then to
>> build a selectivity arithmetic involving uncertainty, though I don't
>> have a clear picture.

> I see. Indeed, if we changed Selectivity so that it includes
> information about default/non-default, the problems I mentioned
> would be solved. But I'm afraid that this modification would have
> a big impact, since there are lots of functions that calculate
> selectivity. I don't have a concrete idea either, so I will think
> about it. Besides that, I'd like to ask the community whether such
> a modification would be acceptable or not.

The planner definitely has a lot of problems that could be addressed
with some sort of uncertainty framework ... but it'd be a huge
undertaking, which is why nobody's tried yet.
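
For what it's worth, the core of such a framework might look something
like the sketch below. The type and function names are invented here
purely to make the shape of the idea concrete; nothing like this exists
in the planner today:

    /*
     * A selectivity that carries a certainty component, plus one
     * possible combination rule for AND. The "right" arithmetic for
     * certainty is exactly the open question.
     */
    typedef struct UncertainSel
    {
        double      sel;        /* estimated fraction of rows, in [0, 1] */
        double      certainty;  /* 0 = pure default guess, 1 = from stats */
    } UncertainSel;

    static UncertainSel
    usel_and(UncertainSel a, UncertainSel b)
    {
        UncertainSel r;

        r.sel = a.sel * b.sel;                   /* independence assumption */
        r.certainty = a.certainty * b.certainty; /* one arbitrary choice */
        return r;
    }

Even this toy version shows where the work is: every function that
produces or combines a selectivity would have to be taught a
defensible rule for the certainty component.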

A smaller-footprint way to fix the immediate problem might be to
change the values of DEFAULT_INEQ_SEL and friends so that they're
even less likely to be matched by accident. That is, instead of
0.3333333333333333, use 0.333333333333342 or some such. It might
seem that that's just moving the problem around, but I think it
might be possible to show that such a value couldn't be computed
by scalarltsel given a histogram with no more than 10000 members.
(I haven't tried to actually prove that, but it seems intuitive
that the set of possible results would be quantized with no more
than about 5 digits precision.)
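
As a rough sanity check of that intuition, one can brute-force the
simplest case. This assumes (a simplification of mine) that a
histogram-derived result with no within-bucket interpolation is a
plain fraction k/n:

    #include <stdio.h>
    #include <stdbool.h>

    /*
     * Confirm that no fraction k/n with n <= 10000 reproduces the
     * proposed sentinel value exactly, using the same kind of exact
     * double comparison the planner would make.
     */
    int
    main(void)
    {
        const double sentinel = 0.333333333333342;
        bool        collision = false;

        for (int n = 1; n <= 10000; n++)
            for (int k = 0; k <= n; k++)
                if ((double) k / (double) n == sentinel)
                {
                    printf("collision: %d/%d\n", k, n);
                    collision = true;
                }

        if (!collision)
            printf("no k/n with n <= 10000 matches the sentinel\n");
        return 0;
    }

This says nothing about interpolated results, of course, but it
supports the idea that an oddball constant is far harder to produce
by accident than exactly one-third.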

regards, tom lane
