Re: Planner question - "bit" data types

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Karl Denninger <karl(at)denninger(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner question - "bit" data types
Date: 2009-09-08 01:12:54
Message-ID: 603c8f070909071812h2b2f6c47o1c432212189ed4c7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Sep 7, 2009 at 8:51 PM, Karl Denninger<karl(at)denninger(dot)net> wrote:
> Robert Haas wrote:
>
> On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl(at)denninger(dot)net> wrote:
>
>
> There was a previous thread and I referenced it. I don't have the other one
> in my email system any more to follow up to it.
>
> I give up; the attack-dog crowd has successfully driven me off.  Ciao.
>
>
> Perhaps I'm biased by knowing some of the people involved, but I don't
> think anyone on this thread has been anything but polite. It would
> certainly be great if PostgreSQL could properly estimate the
> selectivity of expressions like this without resorting to nasty hacks,
> but it can't, and unfortunately, there's really no possibility of that
> changing any time soon. Even if someone implements a fix today, the
> soonest it will appear in a production release is June 2010. So, any
> suggestion for improvement is going to be in the form of suggesting
> that you modify the schema in some way. I know that's not really what
> you're looking for, but unfortunately it's the best we can do.
>
> As far as I can tell, it is not correct to say that you referenced the
> previous thread. I do not see any such reference.
>
> ...Robert
>
>
>
> I was asking about modifying the schema.
>
> The current schema is an integer being used as a bitmask.  If the planner
> knows how to handle a type of "bit(X)" (and will at least FILTER rather than
> NESTED LOOP it on a select, as happens for an Integer used in this fashion),
> that change is easier than splitting it into individual boolean fields.

Well, the first several replies seem to address that question - I
think we all agree that won't help. I'm not sure what you mean by "at
least FILTER rather than NESTED LOOP it on a select". However,
typically, the time when you get a nested loop is when the planner
believes that the loop will be executed very few times (in other
words, the outer side will return very few rows). It probably isn't
the case that the planner COULDN'T choose to execute the query in some
other way; rather, the planner believes that the nested loop is faster
because of a (mistaken) belief about how many rows the
bitmap-criterion will actually match. All the suggestions you've
gotten upthread are tricks to enable the planner to make a better
estimate, which will hopefully cause it to choose a better plan.

As a general statement, selectivity estimation problems are very
painful to work around and often involve substantial application
redesign. In all honesty, I think you've run across one of the easier
variants. As painful as it is to hear the word easy applied to a
problem that's killing you, there actually IS a good solution to this
problem: use individual boolean fields. I know that's not what you
want to do, but it's better than "sorry, you're hosed, no matter how
you do this it ain't gonna work". And I do think there are a few in
the archives that fall into that category.

Good luck, and sorry for the bad news.

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Brian Cox 2009-09-08 01:22:37 Re: Planner question - "bit" data types
Previous Message Karl Denninger 2009-09-08 00:51:43 Re: Planner question - "bit" data types