Skip site navigation (1) Skip section navigation (2)

Re: Bad estimation for "where field not in"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad estimation for "where field not in"
Date: 2012-03-01 21:18:00
Message-ID: 17820.1330636680@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-performance
Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> writes:
> On Thu, Mar 1, 2012 at 6:40 PM, Daniele Varrazzo
> <daniele(dot)varrazzo(at)gmail(dot)com> wrote:
>> Is this a known planner shortcoming or something unexpected, to be
>> escalated to -bugs? Server version is 9.0.1.

> The relevant code is in scalararraysel() function. It makes the
> assumption that element wise comparisons are completely independent,
> while the exact opposite is true. This has been this way since
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=290166f93404d8759f4bf60ef1732c8ba9a52785
> introduced it to version 8.2.

> At least for equality and inequality ops it would be good to rework
> the logic to aggregate with
> s1 = s1 + s2 and s1 = s1 + s2 - 1 correspondingly.

Yeah, I was about to make a similar proposal.  In principle, when
working with a constant array, we could de-dup the array elements
and then arrive at an exact result ... but that seems like expensive
overkill, and in particular it'd be penalizing intelligently-written
queries (which wouldn't have dups in the array to start with) to benefit
badly-written ones.  So it seems like the right thing is for
scalararraysel to (1) check if the operator is equality or inequality,
and if so (2) just assume the array elements are all different and so
the probabilities sum directly.  If the operator is something else
it's probably best to stick with the existing logic.  We could probably
also protect ourselves a bit more by noting if the sum gives an
impossible result (probability > 1 or < 0) and falling back to the
normal calculation in that case.

			regards, tom lane

In response to

pgsql-performance by date

Next:From: Tom LaneDate: 2012-03-01 21:29:24
Subject: Re: [planner] Ignore "order by" in subselect if parrent do count(*)
Previous:From: Marti RaudseppDate: 2012-03-01 20:51:52
Subject: Re: Large insert and delete batches

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group