Re: Minmax indexes

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-08-05 23:41:31
Message-ID: 20140805234131.GD9388@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for all the feedback on version 9. Here's version 13. (The
intermediate versions are just tags in my private tree which I created
each time I rebased. Please bear with me here.)

I have chosen to keep the name "minmax", even if the opclasses now let
one implement completely different things on top of it such as geometry
bounding boxes and bloom filters (aka bitmap indexes). I don't see a
need for a rename: essentially, in PR we can just say "we have these
neat minmax indexes that other databases also have, but instead of just
being used for integer data, they can also be used for geometry, GIS and
bitmap indexes, so as always we're more powerful than everyone else when
implementing new database features".

This new version includes some changes per feedback. Most notoriously,
the opclass definition is different now: instead of relying on the
"sortable" opclass implementation extracting the oprcode for each
operator strategy (i.e. the functions that underlie < <= >= >), I chose
to have catalog entries in pg_amproc for the underlying support
functions. The new definition makes a lot of sense to me now, after
thinking long about this stuff and carefully reading the
"Catalog Entries for Indexes" chapter in docs.

The way it works now is that there are five pg_amop entries in an
opclass, just like previously (corresponding to the underlying < <= = >= >
operators). This lets the optimizer choose the index when a query uses
those operators. There are also seven pg_amproc entries. The first
three are identical to all minmax opclasses: "opcinfo" (version 9 called
it "getopers"), "consistent" (v9 name "compare") and "add_value" (v9
name "maybeUpdateValues", not a loved name evidently). A minmax opclass
on top of a sortable datatype has four additional support functions: one
for each function underlying the < <= >= > operators. Other opclasses
would define their own support functions here, which would correspond to
functions used to implement the "consistent" and "compare" functions
internally.

I don't claim this is 100% correct, but in particular I think it's now
possible to implement cross-datatype comparisons, so that a minmax index
defined on an int8 column works when the query uses an int4 operator,
for example. (The current patch doesn't actually add such catalog
entries, though. I think some minor code changes are required for this
to actually work. However with the previous opclass definition it would
have been outright impossible.)

I fixed the bug reported by Masao-kun that collatable datatypes weren't
cleanly supported. Collation OIDs are passed down now, although I don't
claim that it is bulletproof. This could use some more testing.

I haven't yet updated the revmap definition per Heikki's review. I am
not sure I want to do that right away. I think we could live with what
we have now, and see about changing this later on in the 9.5 cycle if we
think a different definition is better. I think what we have is pretty
solid even if there are some theoretical holes.

As a very quick test, I created a 10 million tuples table with an int4
column on my laptop. The table is ~346 MB. Creating a btree index on
it takes 8 seconds. A minmax index takes 1.6 seconds. The btree index
is 214 MB. The minmax index, with pages_per_range=1 is 1 MB. With
pages_per_range=16 (default) it is 48kB.

Very unscientific results follow. This is the btree doing an index-only
scan:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Only Scan using bti2 on t (cost=0.43..1692.75 rows=54416 width=4) (actual time=0.106..23.329 rows=54518 loops=1)
Index Cond: ((a > 991243) AND (a < 1045762))
Heap Fetches: 0
Buffers: shared hit=1 read=152
Planning time: 0.695 ms
Execution time: 31.565 ms
(6 filas)

Duración: 33,662 ms

Turn off index-only scan, do a regular index scan:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Scan using bti2 on t (cost=0.43..1932.75 rows=54416 width=4) (actual time=0.066..31.027 rows=54518 loops=1)
Index Cond: ((a > 991243) AND (a < 1045762))
Buffers: shared hit=394
Planning time: 0.250 ms
Execution time: 39.218 ms
(5 filas)

Duración: 40,385 ms

Use the 16-pages-per-range minmax index:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=16.60..47402.01 rows=54416 width=4) (actual time=4.266..43.948 rows=54518 loops=1)
Recheck Cond: ((a > 991243) AND (a < 1045762))
Rows Removed by Index Recheck: 32266
Heap Blocks: lossy=384
Buffers: shared hit=244 read=142
-> Bitmap Index Scan on ti2 (cost=0.00..3.00 rows=54416 width=0) (actual time=1.061..1.061 rows=3840 loops=1)
Index Cond: ((a > 991243) AND (a < 1045762))
Buffers: shared hit=2
Planning time: 0.215 ms
Execution time: 51.820 ms
(10 filas)

This is the 1-page-per-range minmax index:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=157.60..47543.01 rows=54416 width=4) (actual time=82.479..98.642 rows=54518 loops=1)
Recheck Cond: ((a > 991243) AND (a < 1045762))
Rows Removed by Index Recheck: 174
Heap Blocks: lossy=242
Buffers: shared hit=385
-> Bitmap Index Scan on ti (cost=0.00..144.00 rows=54416 width=0) (actual time=82.448..82.448 rows=2420 loops=1)
Index Cond: ((a > 991243) AND (a < 1045762))
Buffers: shared hit=143
Planning time: 0.280 ms
Execution time: 103.542 ms
(10 filas)

Duración: 104,952 ms

This is a seqscan. Notice the high number of buffer accesses:

alvherre=# explain (analyze, buffers) select * from t where a > 991243 and a < 1045762;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..194248.00 rows=54416 width=4) (actual time=161.338..1201.535 rows=54518 loops=1)
Filter: ((a > 991243) AND (a < 1045762))
Rows Removed by Filter: 9945482
Buffers: shared hit=10672 read=33576
Planning time: 0.189 ms
Execution time: 1204.501 ms
(6 filas)

Duración: 1205,304 ms

Of course, this isn't nearly a worst-case scenario for minmax, as the
data is perfectly correlated. The pages_per_range=16 index benefits
particularly from that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
minmax-13.patch text/x-diff 176.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2014-08-05 23:55:40 Re: Minmax indexes
Previous Message David G Johnston 2014-08-05 23:34:36 Re: Append to a GUC parameter ?