Skip site navigation (1) Skip section navigation (2)

Re: number of rows estimation for bit-AND operation

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Slava Moudry <smoudry(at)4info(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: number of rows estimation for bit-AND operation
Date: 2009-08-18 07:08:46
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
On Mon, Aug 17, 2009 at 2:07 PM, Slava Moudry<smoudry(at)4info(dot)net> wrote:
> Hi,
> I am using int8 field to pack a number of error flags. This is very common
> technique for large tables to pack multiple flags in one integer field.
> For most records – the mt_flags field is 0. Here is the statistics (taken
> from pgAdmin Statistics tab for mt_flags column):
> Most common Values: {0,128,2,4,8)
> Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)
> What I notice that when bit-AND function is used – Postgres significantly
> underestimates the amount of rows:
> explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;
>                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual
> time=2883.154..2883.154 rows=1 loops=1)
>    ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0)
> (actual time=0.008..2100.390 rows=2439435 loops=1)
>          Filter: ((mt_flags & 8) = 0)
>  Total runtime: 2883.191 ms
> (4 rows)
> This is not an issue for the particular query above, but I noticed that due
> to that miscalculation in many cases Postgres chooses plan with Nested Loops
> for other queries. I can fix it by setting enable_nest_loops to off, but
> it's not something I should set for all queries.
> Is there any way to help Postgres make a better estimation for number of
> rows returned by bit function?

You can index on the function.  For instance:

create table t (mt_flags int);
create index t_mtflags_bit on t ((mt_flags&8));
insert into t select case when random() > 0.95 then case when random()
>0.5 then 8 else 12 end else 0 end from generate_series(1,10000);
analyze t;
explain select * from t where mt_flags&8=8;
                                QUERY PLAN
 Index Scan using t_mtflags_bit on t  (cost=0.00..52.17 rows=467 width=4)
   Index Cond: ((mt_flags & 8) = 8)
(2 rows)

Hope that helps a little.

In response to


pgsql-performance by date

Next:From: Pierre Frédéric CaillaudDate: 2009-08-18 09:38:35
Subject: Re: Getting time of a postgresql-request
Previous:From: Greg SmithDate: 2009-08-18 04:33:01
Subject: Re: Memory reporting on CentOS Linux

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group