Skip site navigation (1) Skip section navigation (2)

number of rows estimation for bit-AND operation

From: Slava Moudry <smoudry(at)4info(dot)net>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: number of rows estimation for bit-AND operation
Date: 2009-08-17 20:07:18
Message-ID: (view raw or flat)
Lists: pgsql-performance
I am using int8 field to pack a number of error flags. This is very common technique for large tables to pack multiple flags in one integer field.

For most records - the mt_flags field is 0. Here is the statistics (taken from pgAdmin Statistics tab for mt_flags column):
Most common Values: {0,128,2,4,8)
Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)

What I notice that when bit-AND function is used - Postgres significantly underestimates the amount of rows:

explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;
                                                         QUERY PLAN
 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2883.154..2883.154 rows=1 loops=1)
   ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.008..2100.390 rows=2439435 loops=1)
         Filter: ((mt_flags & 8) = 0)
 Total runtime: 2883.191 ms
(4 rows)

This is not an issue for the particular query above, but I noticed that due to that miscalculation in many cases Postgres chooses plan with Nested Loops for other queries. I can fix it by setting enable_nest_loops to off, but it's not something I should set for all queries.
Is there any way to help Postgres make a better estimation for number of rows returned by bit function?
-Slava Moudry, Senior DW Engineer. 4Info Inc.

P.S. table definition:

\d mt__20090801
                      Table "dw.mt__20090801"
          Column          |            Type             | Modifiers
 mt_id                    | bigint                      | not null
 mt_ts                    | timestamp without time zone |
 ad_cost                  | numeric(10,5)               |
 short_code               | integer                     |
 message_id               | bigint                      | not null
 mp_code                  | character(1)                | not null
 al_id                    | integer                     | not null
 cust_id                  | integer                     |
 device_id                | integer                     | not null
 broker_id                | smallint                    |
 partner_id               | integer                     |
 ad_id                    | integer                     |
 keyword_id               | integer                     |
 sc_id                    | integer                     |
 cp_id                    | integer                     |
 src_alertlog_id          | bigint                      |
 src_query_id             | bigint                      |
 src_response_message_num | smallint                    |
 src_gateway_message_id   | bigint                      |
 mt_flags                 | integer                     |
 message_length           | integer                     | not null
 created_etl              | timestamp without time zone |
    "mt_device_id__20090801" btree (device_id) WITH (fillfactor=100), tablespace "index2"
    "mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER, tablespace "index2"
Check constraints:
    "mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01 00:00:00'::timestamp without time zone AND mt_ts < '2009-08-02 00:00:00'::timestamp without time
Inherits: mt
Tablespace: "dw_tables3"


pgsql-performance by date

Next:From: Scott CareyDate: 2009-08-17 23:43:16
Subject: Re: Memory reporting on CentOS Linux
Previous:From: Jeremy CarrollDate: 2009-08-17 17:24:36
Subject: Re: Memory reporting on CentOS Linux

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group