Re: distinct estimate of a hard-coded VALUES list

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: distinct estimate of a hard-coded VALUES list
Date: 2016-08-20 20:19:14
Message-ID: CAMkU=1wewMLQnzmr-6Y=jpgA0+zMws6DFySgoNx3JQ-_Z+cfng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > So even though it knows that 6952 values have been shoved in the bottom,
> it
> > thinks only 200 are going to come out of the aggregation. This seems
> like
> > a really lousy estimate. In more complex queries than the example one
> > given it leads to poor planning choices.
>
> > Is the size of the input list not available to the planner at the point
> > where it estimates the distinct size of the input list? I'm assuming
> that
> > if it is available to EXPLAIN than it is available to the planner. Does
> it
> > know how large the input list is, but just throw up its hands and use 200
> > as the distinct size anyway?
>
> It does know it, what it doesn't know is how many duplicates there are.
>

Does it know whether the count comes from a parsed query-string list/array,
rather than being an estimate from something else? If it came from a join,
I can see why it would be dangerous to assume they are mostly distinct.
But if someone throws 6000 things into a query string and only 200 distinct
values among them, they have no one to blame but themselves when it makes
bad choices off of that.

Would it work to check vardata->rel->rtekind == RTE_VALUES
in get_variable_numdistinct to detect that?

> If we do what I think you're suggesting, which is assume the entries are
> all distinct, I'm afraid we'll just move the estimation problems somewhere
> else.
>

Any guesses as to where? (other than the case of someone doing something
silly with their query strings?)

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-08-20 20:58:25 Re: distinct estimate of a hard-coded VALUES list
Previous Message Tom Lane 2016-08-20 19:46:54 Re: Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)