Re: number of rows estimation for bit-AND operation

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Slava Moudry <smoudry(at)4info(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: number of rows estimation for bit-AND operation
Date: 2009-08-18 07:08:46
Message-ID: dcc563d10908180008g6e53e53do7aa16c928fa996b3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Aug 17, 2009 at 2:07 PM, Slava Moudry<smoudry(at)4info(dot)net> wrote:
> Hi,
>
> I am using int8 field to pack a number of error flags. This is very common
> technique for large tables to pack multiple flags in one integer field.
>
> For most records – the mt_flags field is 0. Here is the statistics (taken
> from pgAdmin Statistics tab for mt_flags column):
>
> Most common Values: {0,128,2,4,8)
>
> Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)
>
> What I notice that when bit-AND function is used – Postgres significantly
> underestimates the amount of rows:
>
> explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;
>
>                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual
> time=2883.154..2883.154 rows=1 loops=1)
>
>    ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0)
> (actual time=0.008..2100.390 rows=2439435 loops=1)
>
>          Filter: ((mt_flags & 8) = 0)
>
>  Total runtime: 2883.191 ms
>
> (4 rows)
>
> This is not an issue for the particular query above, but I noticed that due
> to that miscalculation in many cases Postgres chooses plan with Nested Loops
> for other queries. I can fix it by setting enable_nest_loops to off, but
> it's not something I should set for all queries.
>
> Is there any way to help Postgres make a better estimation for number of
> rows returned by bit function?

You can index on the function. For instance:

create table t (mt_flags int);
create index t_mtflags_bit on t ((mt_flags&8));
insert into t select case when random() > 0.95 then case when random()
>0.5 then 8 else 12 end else 0 end from generate_series(1,10000);
analyze t;
explain select * from t where mt_flags&8=8;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using t_mtflags_bit on t (cost=0.00..52.17 rows=467 width=4)
Index Cond: ((mt_flags & 8) = 8)
(2 rows)

Hope that helps a little.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre Frédéric Caillaud 2009-08-18 09:38:35 Re: Getting time of a postgresql-request
Previous Message Greg Smith 2009-08-18 04:33:01 Re: Memory reporting on CentOS Linux