From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: distinct estimate of a hard-coded VALUES list
Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On 08/22/2016 07:42 PM, Alvaro Herrera wrote:
>> Also, if we patch it this way and somebody has a slow query because of a
>> lot of duplicate values, it's easy to solve the problem by
>> de-duplicating. But with the current code, people who have the
>> opposite problem have no way to work around it.
> I certainly agree it's better when a smart user can fix his query plan
> by deduplicating the values than when we end up generating a poor plan
> due to assuming some users are somewhat dumb.
Well, that seems to be the majority opinion, so attached is a patch
to do it. Do we want to sneak this into 9.6, or wait?
> I wonder how expensive would it be to actually count the number of
> distinct values - there certainly are complex data types where the
> comparisons are fairly expensive, but I would not expect those to be
> used in explicit VALUES lists.
You'd have to sort the values before de-duping, and deal with VALUES
expressions that aren't simple literals. And you'd have to do it a
lot --- by my count, get_variable_numdistinct is invoked six times
on the Var representing the VALUES output in Jeff's example query.
Maybe you could cache the results somewhere, but that adds even
more complication. Given the lack of prior complaints about this,
it's hard to believe it's worth the trouble.
regards, tom lane