Re: number of rows estimation for bit-AND operation

From: Slava Moudry <smoudry(at)4info(dot)net>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: number of rows estimation for bit-AND operation
Date: 2009-08-18 22:11:20
Message-ID: 622F69662CFE9F4182958973F99F3F15151033CBD0@EXVMBX017-12.exch017.msoutlookonline.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> increase default stats target, analyze, try again.
This field has only 5 values. I had put values/frequencies in my first post.
Based on the values (see below) - there is no reason for planner to think that mt_flags&134=0 should return 12200 rows.
select mt_flags, count(*) from staging.tmp_t group by 1;
mt_flags | count
----------+---------
128 | 57362
4 | 1371
8 | 627
2 | 19072
0 | 2361630
(5 rows)

In fact, if I rewrite the query using value matching - the estimations are right on:
explain analyze select count(*) from staging.tmp_t where mt_flags not in (128,2,4);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=85878.63..85878.64 rows=1 width=0) (actual time=2904.005..2904.005 rows=1 loops=1)
-> Seq Scan on tmp_t (cost=0.00..79973.85 rows=**2361910** width=0) (actual time=0.008..2263.983 rows=2362257 loops=1)
Filter: (mt_flags <> ALL ('{128,2,4}'::integer[]))
Total runtime: 2904.038 ms
(4 rows)

Anyways, I've been using statistics target of 100 in 8.3 and in 8.4 100 is default. I am currently using default_statistics_target=1000.

Do you think that bit-and function might be skewing the statistics for execution plan somehow?
Thanks,
-Slava.

-----Original Message-----
From: Scott Marlowe [mailto:scott(dot)marlowe(at)gmail(dot)com]
Sent: Tuesday, August 18, 2009 2:58 PM
To: Slava Moudry
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

2009/8/18 Slava Moudry <smoudry(at)4info(dot)net>:
> Hi Scott,
> Thank you for reply.
> I am using Postgres 8.4.0 (btw - great release --very happy about it) and I got a different plan after following your advice:

Yeah, you're returning most of the rows, so a seq scan makes sense.
Try indexing / matching on something more uncommon and you should get
an index scan.

> The seq scan is OK, since I don't expect Postgres to use index scan for such low-selective condition.
> It would be tough for me to support indexes for each bit flag value and their combinations. E.g. in the query below it is again 200x off on number of rows.

increase default stats target, analyze, try again.

> explain analyze select count(*) from staging.tmp_t where  mt_flags&134=0;
>                                                      QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
>   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
>         Filter: ((mt_flags & 134) = 0)
>  Total runtime: 2965.009 ms
> (4 rows)
>
> I still wonder if it's something I could/should report as a bug? I've been struggling with this issue in 8.2, 8.3.x  (now using 8.4.0).
> We can more or less work around this by disabling nestloop in our analytics queries but I have problems enforcing this in reporting applications.

Looks more like a low stats target. Try increasing that first.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2009-08-18 22:34:20 Re: number of rows estimation for bit-AND operation
Previous Message Kevin Grittner 2009-08-18 21:59:22 Re: SQL Query Performance - what gives?