CLUSTER on brin indexes

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: CLUSTER on brin indexes
Date: 2015-05-14 06:39:51
Message-ID: CAEepm=2LUCLZ2J4cwPv5DisHqD9BE_AXnqHGqf0Tj-cvtiqVcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

Here's an idea for a small 9.6 feature that I don't see on the todo
list already. I noticed that you can't currently CLUSTER on a brin
index. As I understand it, brin indexes are most effective when
blocks have non-overlapping value ranges, and less effective the more
the ranges overlap, so it seems reasonable (essential?) to want to do
this, and the machinery to do it is nearly all there already. Or am I
missing some fundamental reason why this can't work?

Poking around in the code, I see that you can only cluster on index
types that have amclusterable set, which currently means btree and
gist. I wonder if this could be split up into 3 separate concepts:

1. This index type supports CLUSTER (without saying how). True for
btree and gist, ought to be true for brin. That's amclusterable.

2. This index type can do ordered index scans (and therefore scanning
is an option when you run CLUSTER). True for btree and gist, false
for brin. I think this probably amounts to amcanorder ||
amcanorderbyop.

3. This index type can describe the best clustering order in terms
that tuplesort can handle. True for btree and false for gist
(effectively, currently this is hardcoded), and ought to be true for
brin. Maybe there could be a new attribute for this,
amhasclusterorderby (insert better name here), and a new operation
amclusterorderby (insert better name here) which could give
tuplesort_begin_cluster the information that it currently extracts
from btree indexes via a non-generic interface.

Another approach could be to allow user-defined orderings, something
like CLUSTER my_table ORDER BY ..., but I don't know if you'd ever
want to do clustering that wasn't based on information that can be
pulled out of an index. I thought about that first when trying to
figure out how to use brin indexes effectively, before I realised that
I really just wanted CLUSTER to understand brin indexes.

Also, it occurred to me that if you have CLUSTERed a table by a brin
index and then nobody has touched any tuples in the blocks you are
interested in since then and somehow you know these facts, then you
know something potentially useful about the physical order of the
tuples within those blocks. (It may be extremely difficult to make
use of that information, I have no idea, so maybe this is crazy talk.)

--
Thomas Munro
http://www.enterprisedb.com

Browse pgsql-hackers by date

  From Date Subject
Next Message Seçkin Alan 2015-05-14 06:44:32 Re: pgAdmin4 Bug fix or my Fault ?
Previous Message Andres Freund 2015-05-14 06:38:07 Re: Final Patch for GROUPING SETS