Re: Auto Partitioning

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Auto Partitioning
Date: 2007-04-06 07:11:18
Message-ID: 1175843479.3623.463.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Fri, 2007-04-06 at 01:56 -0400, Tom Lane wrote:
> Markus Schiltknecht <markus(at)bluegap(dot)ch> writes:
> > Uh.. can you elaborate on that? AFAICS, you would simply have to query
> > multiple btree indexes and make sure non of them is violated.
>
> That only works for the partition-key indexes, ie, ones where you can be
> sure a-priori that there cannot be duplicate keys in two different indexes.
> I believe the complaint here is that people would like to be able to
> enforce uniqueness across the whole partitioned table on columns that
> are not part of the partition key.

I see that as a logical modelling problem, not a physical one.

If you partition on invoice_date, but have PK=invoice_id then it seems
straightforward to change the model so that the PK is a compound key
(invoice_id, invoice_period). This works whether or nor invoice_id is
unique on its own. And this is typically the way things are modelled in
the real world anyway, since such things existed from the time of paper
filing systems where partitioning like that was required to quickly
locate a file in a paper archive/library.

If we partition on invoice_date only, there is an implication that
people will search for invoices on date range only too, otherwise why
not just partition on invoice_id. This still works with the compound key
approach.

> (But that sounds rather like pie in the sky, actually. Which other
> databases can do that, and how do they do it?)

Oracle does it, by building a big index. Few people use it.

There are significant problems with this idea that I have already
raised:
- how big would the index be?
- how would you add and remove partitions with any kind of performance?
If we partitioned on date range, that will surely increase over time.
- the index could almost certainly never be REINDEXed because of space
requirements and time considerations.
- if the indexed values were monotonically increasing the RHS of the
index would become a significant hotspot in load performance, assuming
high volume inserts into a large table

My argument is that there are significant real-world disadvantages to
having this feature, yet there exists a reasonable workaround to avoid
ever needing it. Why would we spend time building and supporting it?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Kirkwood 2007-04-06 07:13:34 Re: Postgres optimizer
Previous Message Marko Kreen 2007-04-06 06:58:43 Re: What X86/X64 OS's do we need coverage for?

Browse pgsql-patches by date

  From Date Subject
Next Message Gregory Stark 2007-04-06 07:40:48 Re: Auto Partitioning
Previous Message Greg Smith 2007-04-06 06:53:17 Re: Load distributed checkpoint V3