Re: Planner question - "bit" data types

From: Karl Denninger <karl(at)denninger(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner question - "bit" data types
Date: 2009-09-08 02:05:59
Message-ID: 4AA5BC07.7090806@denninger.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Robert Haas wrote:
> On Mon, Sep 7, 2009 at 8:51 PM, Karl Denninger<karl(at)denninger(dot)net> wrote:
>
>> Robert Haas wrote:
>>
>> On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl(at)denninger(dot)net> wrote:
>>
>>
>> There was a previous thread and I referenced it. I don't have the other one
>> in my email system any more to follow up to it.
>>
>> I give up; the attack-dog crowd has successfully driven me off. Ciao.
>>
>>
>> Perhaps I'm biased by knowing some of the people involved, but I don't
>> think anyone on this thread has been anything but polite. It would
>> certainly be great if PostgreSQL could properly estimate the
>> selectivity of expressions like this without resorting to nasty hacks,
>> but it can't, and unfortunately, there's really no possibility of that
>> changing any time soon. Even if someone implements a fix today, the
>> soonest it will appear in a production release is June 2010. So, any
>> suggestion for improvement is going to be in the form of suggesting
>> that you modify the schema in some way. I know that's not really what
>> you're looking for, but unfortunately it's the best we can do.
>>
>> As far as I can tell, it is not correct to say that you referenced the
>> previous thread. I do not see any such reference.
>>
>> ...Robert
>>
>>
>>
>> I was asking about modifying the schema.
>>
>> The current schema is an integer being used as a bitmask. If the planner
>> knows how to handle a type of "bit(X)" (and will at least FILTER rather than
>> NESTED LOOP it on a select, as happens for an Integer used in this fashion),
>> that change is easier than splitting it into individual boolean fields.
>>
>
> Well, the first several replies seem to address that question - I
> think we all agree that won't help. I'm not sure what you mean by "at
> least FILTER rather than NESTED LOOP it on a select". However,
> typically, the time when you get a nested loop is when the planner
> believes that the loop will be executed very few times (in other
> words, the outer side will return very few rows). It probably isn't
> the case that the planner COULDN'T choose to execute the query in some
> other way; rather, the planner believes that the nested loop is faster
> because of a (mistaken) belief about how many rows the
> bitmap-criterion will actually match. All the suggestions you've
> gotten upthread are tricks to enable the planner to make a better
> estimate, which will hopefully cause it to choose a better plan.
>
> As a general statement, selectivity estimation problems are very
> painful to work around and often involve substantial application
> redesign. In all honesty, I think you've run across one of the easier
> variants. As painful as it is to hear the word easy applied to a
> problem that's killing you, there actually IS a good solution to this
> problem: use individual boolean fields. I know that's not what you
> want to do, but it's better than "sorry, you're hosed, no matter how
> you do this it ain't gonna work". And I do think there are a few in
> the archives that fall into that category.
>
> Good luck, and sorry for the bad news.
>
> ...Robert
>
The individual boolean fields don't kill me and in terms of some of the
application issues they're actually rather easy to code for.

The problem with re-coding for them is extensibility (by those who
install and administer the package); a mask leaves open lots of extra
bits for "site-specific" use, where hard-coding booleans does not, and
since the executable is a binary it instantly becomes a huge problem for
everyone but me.

It does appear, however, that a bitfield doesn't evaluate any
differently than does an integer used with a mask, so there you have
it..... it is what it is, and if I want this sort of selectivity in the
search I have no choice.

-- Karl

Attachment Content-Type Size
karl.vcf text/x-vcard 124 bytes

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2009-09-08 02:22:47 Re: Planner question - "bit" data types
Previous Message Brian Cox 2009-09-08 01:22:37 Re: Planner question - "bit" data types