Re: On partitioning

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-12-15 11:59:08
Message-ID: CAGTBQpZ4oXjDc1wEy2kXu8otBUyi7qvWCGPQUC=3783wtiN_kg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 15, 2014 at 8:09 AM, José Luis Tallón
<jltallon(at)adv-solutions(dot)net> wrote:
> On 12/15/2014 07:42 AM, Claudio Freire wrote:
>>
>> [snip]
>
>
>> If you do that, you start with empty partitions, and each insert updates
>> the BRIN tuple. Avoiding concurrency loss in this case would be tricky, but
>> in theory this could allow very general partition exclusion. In fact it
>> could even work with constraint exclusion right now: you'd have a
>> single-tuple BRIN index for each partition and benefit from it. But you
>> don't need to pay the price of updating BRIN indexes, as min-max tuples for
>> each partition can be produced while creating the partitions if the syntax
>> already provides the information. Then, it's just a matter of querying this
>> meta-data which just happens to have the form of a BRIN tuple for each
>> partition.
>
>
> Yup. Indeed this is the way I outlined in my previous e-mail.
>
> The only point being: Why bother with BRIN when we already have the range
> machinery, and it's trivial to add pointers to partitions from each range?

The part of BRIN I find useful is not its on-disk structure, but all
the execution machinery that checks quals against BRIN tuples. It's
not a trivial piece of code, and it is especially useful because it's
generalizable. New BRIN operator classes can be created, and that's an
interesting power to have in partitioning as well.

Casting from ranges into min-max BRIN tuples seems quite doable, so
both range and list notation should work fine. But BRIN also works for
the generic "routing expression" some people seem to really want, and
dynamically updated BRIN meta-indexes seem to be the only efficient
solution for that.
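
To make the "casting" idea concrete, here is a rough sketch in Python
(not PostgreSQL code) of how range and list partition bounds could be
reduced to one min-max summary per partition. Everything in it is an
assumption made for illustration: the MinMaxSummary type, the function
names, integer keys, and half-open [start, end) ranges. Real BRIN
tuples and operator classes are more general than this.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MinMaxSummary:
    """Hypothetical stand-in for a single min-max tuple describing one partition."""
    lo: int  # smallest key the partition may hold (inclusive)
    hi: int  # largest key the partition may hold (inclusive)


def summary_from_range(start: int, end: int) -> MinMaxSummary:
    """Turn a range partition [start, end) over integer keys into a summary."""
    if start >= end:
        raise ValueError("empty range")
    return MinMaxSummary(start, end - 1)


def summary_from_list(values: Iterable[int]) -> MinMaxSummary:
    """Turn a list partition into a summary.

    This is lossy: gaps between the listed values are forgotten, so the
    summary can only say "maybe here", never "definitely here".
    """
    vals = list(values)
    if not vals:
        raise ValueError("empty list")
    return MinMaxSummary(min(vals), max(vals))


def may_contain_eq(summary: MinMaxSummary, value: int) -> bool:
    """Check a 'key = value' qual against a summary (False means: skip partition)."""
    return summary.lo <= value <= summary.hi
```

Note that for list partitions the summary can produce false positives
(a value that falls between two listed values), which is harmless for
exclusion but means the summary alone cannot route an insert.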

BRIN lacks some features, as you noted, so it does need some love
before it's usable for this. But they're features BRIN itself would
find useful, so you take out two ducks in one shot.

> I suggested that BRIN would solve a situation when the amount of partitions
> is huge (say, thousands) and we might need to be able to efficiently locate
> the appropriate partition. In this situation, a linear search might become
> prohibitive, or the data structure (a simple B-Tree, maybe) become too big
> to be worth keeping in memory. This is where being able to store the
> "partition index" on disk would be interesting.

BRIN also does a linear search, so it doesn't solve that. BRIN's only
power is that it can answer very fast whether some quals rule out a
partition.
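
As a rough illustration of what I mean (a Python sketch, not anything
resembling the actual executor code), exclusion over per-partition
min-max summaries is still one pass over all partitions; each check is
just very cheap. The function name, the dict of (min, max) pairs and
the integer date keys are all made up for the example.

```python
from typing import Dict, List, Tuple


def prune_partitions(
    summaries: Dict[str, Tuple[int, int]], qlo: int, qhi: int
) -> List[str]:
    """Return the partitions that a 'key BETWEEN qlo AND qhi' qual cannot rule out.

    summaries maps a partition name to its inclusive (min, max) key bounds.
    The loop visits every partition once: a linear search, O(number of partitions).
    """
    survivors = []
    for name, (lo, hi) in summaries.items():
        if hi < qlo or lo > qhi:
            # The partition's whole key interval lies outside the qual.
            continue
        survivors.append(name)
    return survivors


# Hypothetical yearly partitions keyed by an integer date (YYYYMMDD).
example = {
    "p2013": (20130101, 20131231),
    "p2014": (20140101, 20141231),
    "p2015": (20150101, 20151231),
}
march_2014 = prune_partitions(example, 20140301, 20140331)
```

With thousands of partitions this loop is exactly the cost being
discussed: fast per partition, but not sublinear.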
