Re: Named vs Unnamed Partitions

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "Andrew Sullivan" <ajs(at)crankycanuck(dot)ca>
Subject: Re: Named vs Unnamed Partitions
Date: 2008-01-09 02:25:44
Message-ID: 87odbvwx93.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Markus Schiltknecht" <markus(at)bluegap(dot)ch> writes:

> There are two very distinct ways to handle partitioning. For now, I'm calling
> them named and unnamed partitioning.

I had most of a draft email written which I seem to have lost in a reboot. To
a large degree I was on the same line of thought as you.

The whole point of partitioning is to give the DBA a short-hand to allow him
or her to describe certain properties of the data to the database.

The "named" approach is to let the DBA create objects which can then have
various properties attached to them. So you can create a bucket for each month
or for each financial account or whatever. Then you can attach properties to
the buckets such as what tablespace to store them in, or whether to treat them
as read-only or offline.

The naming is precisely the useful part in that it is how the DBA associates
the properties with chunks of data.

Without naming the DBA would have to specify the same ranges every time he
wants to change the properties. He might do a "SET read_only WHERE created_on
< '2000-01-01'" one day then another "SET tablespace tsslow WHERE created_on <
'2000-01-01'" and then later again do "SET offline WHERE created_on <
'2000-01-01'"

I have to admit I always found it kludgy to have objects named
invoices_2000_JAN and invoices_2000_FEB and so on. It's kind of an meta
denormalization. But so is specifying where clauses repeatedly.

If you don't have a first-class object which you can refer to to attach
properties to, and instead are forced to redefine it repeatedly for each use
then there's nothing stopping you from creating overlapping or even
conflicting sets of properties.

What's the database to do if you tell it something like:

ALTER TABLE foo SET tablespace tsslow WHERE created_on < '2000-01-01'
ALTER TABLE foo SET tablespace tsfast WHERE updated_on > '2006-01-01'

Maybe you know that no record older than 2000 will be updated now but the
database doesn't.

As Markus describes too the behaviour *before* you've attached any particular
properties to a partition is interesting too. A big benefit of partitioning is
being able to load whole partitions or drop whole partitions of data which
were not in any way special prior to needing to be archived. Effectively the
named objects are the DBA's way of telling the database "this chunk of data
here, keep it all in one place because I'll be doing something en masse to it
(such as dropping it) at some later date".

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-01-09 02:33:03 Re: Index trouble with 8.3b4
Previous Message Bruce Momjian 2008-01-09 02:21:00 Re: Problem with CVS HEAD's handling of mergejoins