Re: Bitmap scan is undercosted?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted?
Date: 2017-12-12 07:29:05
Message-ID: CAMkU=1zVsx-D4NVnpUpWcRehEk8GxA9zdLxe9XWCak_gyvOY4w@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Tue, Dec 5, 2017 at 11:06 PM, Vitaliy Garnashevich <
vgarnashevich(at)gmail(dot)com> wrote:

This is very cool, thanks.

> I've tried to create a better test case:
> - Increase shared_buffers and effective_cache_size to fit whole database,
> including indexes.
> - Use random(), to avoid correlation between the filtered values.
> - Make both columns of integer type, to avoid special cases with boolean
> (like the one happened with CREATE STATISTICS).
> - Flush OS disk cache and then try running the query several times, to get
> both cold-cache results and all-in-ram results.
> - There are several tests, with different frequency of the selected values
> in the two columns: [1/10, 1/10], [1/50, 1/10], [1/100, 1/10].
> - There is a split of cost by contribution of each of its components:
> seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost,
> cpu_operator_cost. The EXPLAIN is run for each component, every time with
> only one of the components set to non-zero.
>

Where you have question marks, does that mean you could not force it into the
plan you wanted with all but one of the cost settings being zero?
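
(I'm assuming the isolation you describe looks something like the following,
shown here for the cpu_operator_cost component alone, using the query and
settings from your script further down; adjust if your actual procedure
differs:)

set seq_page_cost = 0; set random_page_cost = 0; set cpu_tuple_cost = 0;
set cpu_index_tuple_cost = 0; set cpu_operator_cost = 0.0025;
explain (analyze,verbose,costs,buffers) select * from aaa where num = 1 and flag = 1;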

> - The test was run on a Digitalocean VM: Ubuntu 16.04.3 LTS (GNU/Linux
> 4.4.0-101-generic x86_64), 2 GB RAM, 2 core CPU, SSD; PostgreSQL 9.5.10.
>
>
> shared_buffers = 1024MB
> effective_cache_size = 1024MB
>

I would set this even higher.

>
> work_mem = 100MB
>
> create table aaa as select floor(random()*10)::int num, (random()*10 <
> 1)::int flag from generate_series(1, 10000000) id;
> create table aaa as select floor(random()*50)::int num, (random()*10 <
> 1)::int flag from generate_series(1, 10000000) id;
> create table aaa as select floor(random()*100)::int num, (random()*10 <
> 1)::int flag from generate_series(1, 10000000) id;
>
> create index i1 on aaa (num);
> create index i2 on aaa (flag);
>
> set enable_bitmapscan = on; set enable_indexscan = off; set
> enable_seqscan = off;
> set enable_bitmapscan = off; set enable_indexscan = on; set
> enable_seqscan = off;
> set enable_bitmapscan = off; set enable_indexscan = off; set
> enable_seqscan = on;
>
> set seq_page_cost = 1.0; set random_page_cost = 1.0; set cpu_tuple_cost =
> 0.01; set cpu_index_tuple_cost = 0.005; set cpu_operator_cost = 0.0025;
>
> explain (analyze,verbose,costs,buffers) select * from aaa where num = 1
> and flag = 1;
>

One thing to try is to use explain (analyze, timing off), and then get the
total execution time from the summary line at the end of the explain,
rather than from "actual time" fields. Collecting the times of each
individual step in the execution can impose a lot of overhead, and some
plans have more of this artificial overhead than others. It might change
the results, or it might not. I know that sorts and hash joins are very
sensitive to this, but I don't know about bitmap scans.
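
For example, keeping everything else the same and only changing the EXPLAIN
options (the table and query are the ones from your script):

explain (analyze, timing off, costs, buffers) select * from aaa where num = 1 and flag = 1;

Then compare the "Execution time" line at the bottom of each plan, rather than
the per-node "actual time" numbers.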

....

> What seems odd to me is that in different kinds of tests (with different
> frequency of column values):
>
> i1 Rows Removed by Filter = 900156, 179792, 89762 (decreased a lot)
> i1 buffers = 46983, 44373, 39928 (decreased, but not a lot)
>

To filter out 89762 tuples, you first have to look them up in the table,
and since they are randomly scattered that means you hit nearly every page
in the table at least once. In fact, I don't understand how the empirical
number of buffer hits can be only 46983 in the first case, when it has to
visit 1,000,000 rows (and reject 90% of them). I'm guessing that it is
because your newly created index is sorted by ctid order within a given
index value, and that the scan holds a pin on the table page between
visits, and so doesn't count as a hit if it already holds the pin.

You could try to create an empty table, create the indexes, then populate
the table with your random select query, to see if that changes the buffer
hit count. (Note that this wouldn't change the cost estimates much even if it
does change the measured number of buffer hits, because of
effective_cache_size. It knows you will be hitting ~47,000 pages ~25 times
each, and so only charges you for the first time each one is hit.)
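
Roughly like this, for the 1/10 case (the other frequencies are analogous;
column and index names are the ones from your script):

create table aaa (num int, flag int);
create index i1 on aaa (num);
create index i2 on aaa (flag);
insert into aaa select floor(random()*10)::int, (random()*10 < 1)::int
  from generate_series(1, 10000000) id;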

> i1 best case time = 756.045, 127.814, 79.492 (decreased a lot, as well as
> probably average case too)
> i1 cost estimates = 67358.15, 48809.34, 46544.80 (did not decrease a lot)
>

Right. Your best case times are when the data is completely in cache. But
your cost estimates are dominated by *_page_cost, which are irrelevant when
the data is entirely in cache. You are telling it to estimate the
worst-case costs, and it is doing that pretty well (within this one plan).

>
> i2 Rows Removed by Filter = 900835, 980350, 991389
> i2 buffers = 46985, 46985, 46987
> i2 best case time = 377.554, 346.481, 387.874
> i2 cost estimates = 39166.34, 39247.83, 40167.34
>
> It's odd that increase in actual execution time for "i1" was not reflected
> enough in cost estimates.
>

No, that's entirely expected given your settings. As long as you are
charging disk-read costs for reading data from RAM, you will never get
realistic cost estimates. Remember, you aren't trying to tune your
production server here, you are trying to get a test case that can be
dissected.

Perhaps you think that effective_cache_size is supposed to fix this for
you. But it only accounts for blocks hit repeatedly within the same
query. It has no idea that you are running a bunch of other queries on the
same data back to back, and so it will likely find that data already in
memory from one query to the next. That knowledge (currently) has to be
baked into your *_page_cost setting. There is also no way to say that data
for one table is more likely to be found in cache than data for another
table is.
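
If you do want the estimates to track the all-in-RAM case, the usual way is to
bake that into the page costs, e.g. something like (illustrative values only,
not a production recommendation):

set seq_page_cost = 0.1; set random_page_cost = 0.1;

and then re-run the EXPLAINs; with settings like that the cpu_* components
dominate the estimates instead of the page costs.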

> The cost even didn't go below "i2" costs.
>

That is partially because i2 benefits from a spuriously large correlation
due to an artifact of how stats are computed. See another thread that has
spun off of this one.

If you want to see the effect of this on your cost estimates, you can do
something like:

update pg_statistic set stanumbers2='{0}' where starelid='aaa'::regclass
and staattnum=2;
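
Then re-run the affected EXPLAINs and compare the estimates. Keep in mind that
a later ANALYZE of the table will recompute the statistics and overwrite that
manual change, e.g.:

explain (analyze, timing off, costs, buffers) select * from aaa where num = 1 and flag = 1;
-- when you're done, put the computed statistics back:
analyze aaa;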

Cheers,

Jeff
