From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
To: pgsql-hackers@postgresql.org
Subject: WIP: BRIN multi-range indexes
Date: 2017-10-25 16:22:54
Message-ID: c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Lists: pgsql-hackers

Hi all,

A couple of days ago I shared a WIP patch [1] implementing BRIN
indexes based on bloom filters. One inherent limitation of that approach
is that it can only support equality conditions - which is perfectly
fine in many cases (e.g. UUIDs are rarely queried with range
conditions).

[1]
https://www.postgresql.org/message-id/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com

But in other cases that restriction is pretty unacceptable, e.g. with
timestamps that are queried mostly using range conditions. A common
issue is that while the data is initially well correlated (giving us
nice narrow min/max ranges in the BRIN index), this degrades over time
(typically due to DELETEs/UPDATEs, with new rows then routed to free
space anywhere in the table). There are not many options to prevent
this, and fixing it pretty much requires a CLUSTER on the table.
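
(Just to illustrate what that means - the table and index names below
are made up, and note that CLUSTER needs a btree index, it can't use a
BRIN one:)

-- rewrite a hypothetical table t in 'val' order to restore correlation
create index t_val_btree on t (val);
cluster t using t_val_btree;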

This patch addresses that by building BRIN indexes with a more complex
"summary". Instead of keeping just a single minmax interval per page
range, we maintain a list of minmax intervals, which allows us to track
"gaps" in the data.

To illustrate the improvement, consider this table:

create table a (val float8) with (fillfactor = 90);
insert into a select i::float from generate_series(1,10000000) s(i);
update a set val = 1 where random() < 0.01;
update a set val = 10000000 where random() < 0.01;
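
(A quick aside, feel free to skip: pg_stats can confirm the correlation
claim below - even after the updates the column stays almost perfectly
correlated with the physical order, it's only the min/max summaries
that get ruined:)

analyze a;

select correlation from pg_stats
 where tablename = 'a' and attname = 'val';
-- returns something very close to 1.0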

The column 'val' is thus almost perfectly correlated with the physical
position in the table (which would be great for BRIN minmax indexes),
but then roughly 1% of the values are set to 1 and another 1% to
10,000,000. That means pretty much every page range ends up as
[1,10000000], which makes this BRIN index mostly useless, as
illustrated by these explain plans:

create index on a using brin (val) with (pages_per_range = 16);

explain analyze select * from a where val = 100;
                             QUERY PLAN
--------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=54.01..10691.02 rows=8 width=8)
                        (actual time=5.901..785.520 rows=1 loops=1)
   Recheck Cond: (val = '100'::double precision)
   Rows Removed by Index Recheck: 9999999
   Heap Blocks: lossy=49020
   ->  Bitmap Index Scan on a_val_idx
           (cost=0.00..54.00 rows=3400 width=0)
           (actual time=5.792..5.792 rows=490240 loops=1)
         Index Cond: (val = '100'::double precision)
 Planning time: 0.119 ms
 Execution time: 785.583 ms
(8 rows)

explain analyze select * from a where val between 100 and 10000;
                             QUERY PLAN
------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=55.94..25132.00 rows=7728 width=8)
                        (actual time=5.939..858.125 rows=9695 loops=1)
   Recheck Cond: ((val >= '100'::double precision) AND
                  (val <= '10000'::double precision))
   Rows Removed by Index Recheck: 9990305
   Heap Blocks: lossy=49020
   ->  Bitmap Index Scan on a_val_idx
           (cost=0.00..54.01 rows=10200 width=0)
           (actual time=5.831..5.831 rows=490240 loops=1)
         Index Cond: ((val >= '100'::double precision) AND
                      (val <= '10000'::double precision))
 Planning time: 0.139 ms
 Execution time: 871.132 ms
(8 rows)

Obviously, the queries end up scanning the whole table and then
eliminating most of the rows during the "Index Recheck". Decreasing
pages_per_range does not make a measurable difference in this case - it
eliminates maybe 10% of the rechecks, but most page ranges still end up
with a very wide minmax interval.
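
For the curious, the wide ranges are easy to see directly using the
pageinspect extension (just an illustration - the exact block number
will vary):

create extension if not exists pageinspect;

-- pick a regular BRIN page: page 0 is the metapage, the revmap starts
-- at page 1 and may span several pages for an index of this size, so
-- adjust the block number as needed
select itemoffset, blknum, value
from brin_page_items(get_raw_page('a_val_idx', 5), 'a_val_idx')
limit 3;
-- shows value '{1 .. 10000000}' for pretty much every page range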

With the patch, it looks about like this:

create index on a using brin (val float8_minmax_multi_ops)
with (pages_per_range = 16);

explain analyze select * from a where val = 100;
                             QUERY PLAN
-------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=830.01..11467.02 rows=8 width=8)
                        (actual time=7.772..8.533 rows=1 loops=1)
   Recheck Cond: (val = '100'::double precision)
   Rows Removed by Index Recheck: 3263
   Heap Blocks: lossy=16
   ->  Bitmap Index Scan on a_val_idx
           (cost=0.00..830.00 rows=3400 width=0)
           (actual time=7.729..7.729 rows=160 loops=1)
         Index Cond: (val = '100'::double precision)
 Planning time: 0.124 ms
 Execution time: 8.580 ms
(8 rows)

explain analyze select * from a where val between 100 and 10000;
                             QUERY PLAN
------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=831.94..25908.00 rows=7728 width=8)
                        (actual time=9.318..23.715 rows=9695 loops=1)
   Recheck Cond: ((val >= '100'::double precision) AND
                  (val <= '10000'::double precision))
   Rows Removed by Index Recheck: 3361
   Heap Blocks: lossy=64
   ->  Bitmap Index Scan on a_val_idx
           (cost=0.00..830.01 rows=10200 width=0)
           (actual time=9.274..9.274 rows=640 loops=1)
         Index Cond: ((val >= '100'::double precision) AND
                      (val <= '10000'::double precision))
 Planning time: 0.138 ms
 Execution time: 36.100 ms
(8 rows)

That is, the timings drop from 785ms/871ms to 9ms/36ms. The index is a
bit larger (1700kB instead of 150kB), but it's still orders of
magnitude smaller than a btree index (which is ~214MB in this case).
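
(To compare the index sizes, pg_relation_size does the job, e.g.:)

select pg_size_pretty(pg_relation_size('a_val_idx'));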

The index build is slower than for regular BRIN indexes (roughly
comparable to a btree build), but I'm sure it can be significantly
improved. Also, I'm sure it's not bug-free.

Two additional notes:

1) The patch does break the current BRIN indexes, because it requires
passing all ScanKeys to the "consistent" BRIN support function at once
(otherwise we couldn't eliminate individual intervals in the summary),
while BRIN currently deals with one ScanKey at a time. And I haven't
modified the existing brin_minmax_consistent() function accordingly
(yeah, I'm lazy, but I believe this should be enough for interested
people to try it out).

2) It only works with float8 (and timestamp data types) for now, but it
should be straightforward to add support for additional data types.
Most of that will be about adding catalog definitions anyway.
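
(If you want to see what those catalog definitions amount to, the
support procedures registered for the existing BRIN opclasses on float8
can be listed from the catalogs:)

select opc.opcname, amp.amprocnum, amp.amproc
from pg_opclass opc
join pg_am am on am.oid = opc.opcmethod and am.amname = 'brin'
join pg_amproc amp on amp.amprocfamily = opc.opcfamily
where opc.opcintype = 'float8'::regtype
order by opc.opcname, amp.amprocnum;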

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
brin-multi-range-v1.patch text/x-patch 19.7 KB
