Re: BRIN index which is much faster never chosen by planner

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Jeremy Finzel <finzelj(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Michael Lewis <mlewis(at)entrata(dot)com>
Subject: Re: BRIN index which is much faster never chosen by planner
Date: 2019-10-14 20:48:14
Message-ID: CAKJS1f8XM4WckdOQ_RDp8MSt2O1BkktW1EGe7MZZ31svr90Obg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 15 Oct 2019 at 08:43, Jeremy Finzel <finzelj(at)gmail(dot)com> wrote:
> I wanted to follow up on this specific issue. Isn't this the heart of the matter and a fundamental problem? If I want to rely on BRIN indexes as in a straightforward case as explained in OP, but I don't know if the planner will be nearly reliable enough, how can I depend on them in production? Is this not considered a planner bug or should this kind of case be documented as problematic for BRIN? As another way to look at it: is there a configuration parameter that could be added specific to BRIN or bitmapscan to provide help to cases like this?
>
> On freshly analyzed tables, I tried my original query again and found that even with now() - 3 days it does not choose the BRIN index. In fact, it chose another btree on the table like (id1, id2, rec_insert_time). With warm cache, the pg-chosen plan takes 40 seconds to execute, whereas when I force a BRIN scan it takes only 4 seconds.

Another thing which you might want to look at is the correlation
column in the pg_stats view for the rec_insert_time column. Previous
to 7e534adcd, BRIN index were costed based on the selectivity
estimate. There was no accountability towards the fact that the pages
for those records might have been spread out over the entire table.
Post 7e534adcd, we use the correlation estimate to attempt to estimate
how many pages (more specifically "ranges") we're likely to hit based
on that and the selectivity estimate. This commit intended to fix the
issue we had with BRIN indexes being selected far too often. Of
course, the correlation is based on the entire table, if there are
subsets of the table that are perhaps perfectly correlated, then the
planner is not going to know about that. It's possible that some of
your older rec_insert_times are spread out far more than the newer
ones. As a test, you could try creating a new table and copying the
records over to it in rec_insert_time order and seeing if the BRIN
index is selected for that table (after having performed an ANALYZE).

It would be interesting if you could show the pg_stats row for the
column so that we can see if the correlation is low.

You can see from the code below that the final selectivity strongly
influenced by the correlation value (REF: brincostestimate)

qualSelectivity = clauselist_selectivity(root, indexQuals,
baserel->relid,
JOIN_INNER, NULL);

/* work out the actual number of ranges in the index */
indexRanges = Max(ceil((double) baserel->pages / statsData.pagesPerRange),
1.0);

/*
* Now calculate the minimum possible ranges we could match with if all of
* the rows were in the perfect order in the table's heap.
*/
minimalRanges = ceil(indexRanges * qualSelectivity);

/*
* Now estimate the number of ranges that we'll touch by using the
* indexCorrelation from the stats. Careful not to divide by zero (note
* we're using the absolute value of the correlation).
*/
if (*indexCorrelation < 1.0e-10)
estimatedRanges = indexRanges;
else
estimatedRanges = Min(minimalRanges / *indexCorrelation, indexRanges);

/* we expect to visit this portion of the table */
selec = estimatedRanges / indexRanges;

CLAMP_PROBABILITY(selec);

My overall view on this is that the BRIN index is not that great since
it's not eliminating that many rows by using it.

From above we see:

> Bitmap Heap Scan on foo.log_table l (cost=2391.71..24360848.29 rows=735 width=99) (actual time=824.133..21329.054 rows=466 loops=1)
Output: <hidden>
Recheck Cond:
(l.rec_insert_time >= (now() - '10 days'::interval))
Rows Removed by Index Recheck: 8187584
Filter: ((l.field1 IS NOT NULL)
AND (l.category = 'music'::name))
Rows Removed by Filter: 19857107
Heap Blocks: lossy=1509000

So you have just 466 rows matching these quals, but the executor had
to scan 1.5 million pages to get those and filter out 8.1 million rows
on the recheck then 19.8 million on the filter. You've mentioned that
the table's heap is 139 GB, which is about 18 million pages. It seems
your query would perform much better if you had a btree index such as
(category, rec_insert_time) where field1 is not null;,

Of course, you've mentioned that you are finding when the plan uses
the BRIN index that it executes more quickly, but I think you're going
to find BRIN unreliable for tables anything other than INSERT-only
tables which the records are always inserted with an ever-increasing
or decreasing value in the BRIN indexed column. If you start
performing UPDATEs then that's going to create holes that new record
will fill and cause the correlation to start dropping resulting in the
BRIN indexes scan cost going up.

On the other hand, if you think you can do better than what was done
in 7e534adcd, then it would be good to see someone working on it. I'm
sure something better can be done. It's just not that easy to do with
the scant correlation data we have on the column.

As for is this a bug or something that's missing from the documents.
The documents do mention:

"BRIN stands for Block Range Index. BRIN is designed for handling very
large tables in which certain columns have some natural correlation
with their physical location within the table."

https://www.postgresql.org/docs/current/brin-intro.html

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message legrand legrand 2019-10-14 22:20:17 Re: Columns correlation and adaptive query optimization
Previous Message Jeremy Finzel 2019-10-14 19:42:51 Re: BRIN index which is much faster never chosen by planner