
Re: number of rows estimation for bit-AND operation

From: Slava Moudry <smoudry(at)4info(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Scott Marlowe<scott(dot)marlowe(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: number of rows estimation for bit-AND operation
Date: 2009-08-20 22:59:43
Message-ID: 622F69662CFE9F4182958973F99F3F151515E74E32@EXVMBX017-12.exch017.msoutlookonline.net
Lists: pgsql-performance
Hi,
Yes, I thought about putting the bit-flags in separate fields.
Unfortunately, I expect to have quite a lot of these, and space is an issue when you are dealing with billions of records in a fact table, so I prefer to pack them into one int8.
For users it's also much easier to write "where mt_flags&134=0" instead of "where f_2=false and f_4=false and f_128=false".
In Teradata that worked just fine, but it costs millions vs. zero cost for Postgres, so I am not really complaining out loud :)
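
To illustrate why the two predicates above are interchangeable, here is a minimal Python sketch. The flag names F_2, F_4 and F_128 and both function names are hypothetical; only the mask value 134 comes from the query in this message.

```python
# Hypothetical flag constants; only the mask value 134 is from the message.
F_2, F_4, F_128 = 2, 4, 128
MASK = F_2 | F_4 | F_128  # 2 + 4 + 128 = 134


def packed_test(mt_flags):
    """Mirror of: where mt_flags & 134 = 0"""
    return (mt_flags & MASK) == 0


def boolean_test(mt_flags):
    """Mirror of: where f_2=false and f_4=false and f_128=false,
    with each boolean derived from the corresponding bit."""
    f_2 = bool(mt_flags & F_2)
    f_4 = bool(mt_flags & F_4)
    f_128 = bool(mt_flags & F_128)
    return (not f_2) and (not f_4) and (not f_128)
```

Both functions agree for every flag value, so the packed form is purely a notational and storage convenience; the difference discussed in this thread is only in how the planner estimates it.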

Hopefully Tom or other bright folks at PG could take a look at this for the next patch/release.
Btw, can you send me the link to the "PG's selectivity estimator" discussion? I'd like to provide feedback if I can.
Thanks,
-Slava.


-----Original Message-----
From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com] 
Sent: Thursday, August 20, 2009 10:55 AM
To: Scott Marlowe
Cc: Slava Moudry; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

On Tue, Aug 18, 2009 at 6:34 PM, Scott Marlowe<scott(dot)marlowe(at)gmail(dot)com> wrote:
> 2009/8/18 Slava Moudry <smoudry(at)4info(dot)net>:
>>> increase default stats target, analyze, try again.
>> This field has only 5 values. I had put values/frequencies in my first post.
>
> Sorry, kinda missed that.  Anyway, there's no way for pg to know which
> operation is gonna match.  Without an index on it.  So my guess is
> that it just guesses some fixed value.  With an index it might be able
> to get it right, but you'll need an index for each type of match
> you're looking for.  I think.  Maybe someone else on the list has a
> better idea.

The best way to handle this is probably to not cram multiple values
into a single field.  Just use one boolean for each flag.  It won't
even cost you any space, because right now you are using 8 bytes to
store 5 booleans, and 5 booleans will (I believe) only require 5
bytes.  Even if you were using enough of the bits for the space usage
to be higher with individual booleans, the overall performance is
likely to be better that way.
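
As a rough check of the space arithmetic above, the following Python sketch compares the two layouts. It assumes the documented PostgreSQL sizes of 8 bytes for int8 (bigint) and 1 byte for boolean, and it ignores alignment padding and the per-row header, both of which affect real tables. The function name is hypothetical.

```python
INT8_BYTES = 8   # PostgreSQL bigint (int8) column size
BOOL_BYTES = 1   # PostgreSQL boolean column size


def bytes_per_row(n_flags):
    """Column bytes per row for each layout (padding and headers ignored)."""
    return {
        "packed_int8": INT8_BYTES,           # fixed, regardless of flag count
        "booleans": n_flags * BOOL_BYTES,    # grows by one byte per flag
    }
```

Under these assumptions 5 flags take 5 bytes as booleans versus 8 bytes packed; the packed column only becomes smaller once more than 8 flags are in use.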

This is sort of stating the obvious, but it doesn't make it any less
true.  Unfortunately, PG's selectivity estimator can't handle cases
like this.  Tom Lane recently made some noises about trying to improve
it, but it's not clear whether that will go anywhere, and in any event
it won't happen before 8.5.0 comes out next spring/summer.
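
For a column with only a handful of distinct values, the fraction of rows matching a bit-AND test can in principle be computed exactly from the value counts. The Python sketch below shows that calculation by hand; it is not how the PostgreSQL planner works, and the counts used in the usage note are made up for illustration, not the ones from the original post.

```python
def bitand_zero_selectivity(value_counts, mask):
    """Fraction of rows where (value & mask) == 0.

    value_counts maps each distinct column value to its row count.
    """
    total = sum(value_counts.values())
    if total == 0:
        return 0.0
    matching = sum(
        count for value, count in value_counts.items()
        if (value & mask) == 0
    )
    return matching / total


# Hypothetical distribution of five distinct flag values.
example_counts = {0: 500, 2: 200, 4: 150, 128: 100, 130: 50}
```

With these made-up counts, `bitand_zero_selectivity(example_counts, 134)` counts only the rows whose value is 0, since every other value shares at least one bit with the mask.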

...Robert
