Re: Bitmap scan is undercosted?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted?
Date: 2017-12-03 23:08:46
Message-ID: 11630.1512342526@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I wrote:
> I tried creating multiple-column statistics using the v10 facility for
> that:
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again. But I've not
> looked closer yet.

After looking, I found that indeed dependency_is_compatible_clause()
rejects expressions like "flag" or "NOT flag", which it needn't since
those are fully equivalent to "flag = true" or "flag = false"
respectively. Moreover there's nothing else in
dependencies_clauselist_selectivity that depends on the exact form of
the clause under test, only on the semantics that it's an equality
condition on some Var. Hence I propose the attached patch, which
fixes the rowcount estimate for the example discussed in this thread:

create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
create statistics s1 on num, flag from aaa;
analyze aaa;

explain analyze select count(*) from aaa where num = 1 and flag = true;

Without patch:

Aggregate (cost=43236.73..43236.74 rows=1 width=8) (actual time=349.365..349.3
65 rows=1 loops=1)
-> Bitmap Heap Scan on aaa (cost=20086.40..43212.94 rows=9514 width=0) (act
ual time=101.308..337.985 rows=100000 loops=1)
Recheck Cond: (num = 1)
Filter: flag
Heap Blocks: exact=44248
-> BitmapAnd (cost=20086.40..20086.40 rows=9514 width=0) (actual time
=92.214..92.214 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1776.43 rows=96000 width
=0) (actual time=17.236..17.236 rows=100000 loops=1)
Index Cond: (num = 1)
-> Bitmap Index Scan on i2 (cost=0.00..18304.96 rows=991003 wid
th=0) (actual time=72.168..72.168 rows=1000000 loops=1)
Index Cond: (flag = true)
Planning time: 0.254 ms
Execution time: 350.796 ms

With patch:

Aggregate (cost=43496.19..43496.20 rows=1 width=8) (actual time=359.195..359.1
95 rows=1 loops=1)
-> Bitmap Heap Scan on aaa (cost=20129.64..43256.19 rows=96000 width=0) (ac
tual time=99.750..347.353 rows=100000 loops=1)
Recheck Cond: (num = 1)
Filter: flag
Heap Blocks: exact=44248
-> BitmapAnd (cost=20129.64..20129.64 rows=9514 width=0) (actual time
=90.671..90.671 rows=0 loops=1)
-> Bitmap Index Scan on i1 (cost=0.00..1776.43 rows=96000 width
=0) (actual time=16.946..16.946 rows=100000 loops=1)
Index Cond: (num = 1)
-> Bitmap Index Scan on i2 (cost=0.00..18304.96 rows=991003 wid
th=0) (actual time=70.898..70.898 rows=1000000 loops=1)
Index Cond: (flag = true)
Planning time: 0.218 ms
Execution time: 360.608 ms

That's the right overall rowcount estimate for the scan, given the stats
it's working from. There's apparently still something wrong with bitmap
costing, since it's still estimating this as cheaper than the single-index
case --- noting the bogus rowcount estimate for the BitmapAnd, I suspect
that bitmap costing is taking shortcuts rather than using
clauselist_selectivity to estimate the overall selectivity of the bitmap
conditions. But whatever is causing that, it's independent of this
deficiency.

In addition to the bugfix proper, I improved some comments, got rid of
a NumRelids() test that's redundant with the preceding bms_membership()
test, and fixed dependencies_clauselist_selectivity so that
estimatedclauses actually is a pure output argument as stated by its
API contract.

regards, tom lane

Attachment Content-Type Size
fix-dependencies.c-for-boolean-quals.patch text/x-diff 7.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-12-03 23:15:15 Re: [HACKERS] Constraint exclusion for partitioned tables
Previous Message Michael Paquier 2017-12-03 22:48:29 Re: pg_dumpall -r -c try to drop user postgres

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2017-12-03 23:21:56 Re: Bitmap scan is undercosted? - boolean correlation
Previous Message Vitaliy Garnashevich 2017-12-03 22:11:47 Re: Bitmap scan is undercosted?