Re: Partitions implementation with views

From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partitions implementation with views
Date: 2003-12-01 16:15:47
Message-ID: 878ylwecp8.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jonathan Gardner <jgardner(at)jonathangardner(dot)net> writes:

> Usually, it is data that is related to each other somehow. I guess an
> example would be for an internet host who wants to provide a common
> shopping cart functionality for all of its customers, but put their
> specific data on a specific partition for ease of management and query
> speed. They can't put the data into seperate databases because they also
> need to examine the complete data set occasionally.

The most commonly cited example is date-based partitions. Separating financial
records by fiscal year for example. So for example the current year may be on
the super-fast raid 0+1 15k rpm SCSI disks, but the previous 7 years may be on
your slow but big raid5 farm of IDE drives. And at year-end you want to create
a new year, drop all the hundreds of gigabytes of data from the 7th year from
the archive quickly, and move the current year to the archive.

> The common response is "Use partial indexes". But I imagine that they want
> the ability to move partitions onto seperate OS partitions (hence the name,
> "partition").

Separate OS partitions is a reasonable use of partitioned tables, but the
biggest advantage is being able to drop and load partitions very quickly, and
without impacting performance at all. loading or dropping millions of records
becomes a simple DDL operation instead of a huge DML operation.

...

> Now that view is the partitioned table for all intents and purposes. The
> partition tables are the partitions themselves.
>
> Is this what they are looking for, or is it something completely different?

That's the idea, though the purpose of having it as a built-in feature is to
hide all the details you're describing from the user. The user can already do
all the above if they wanted to.

And there's more work to do:

1) The optimizer needs to know about the partitions to be able to check the
query to see if it needs all the partitions or only a small subset. Often
partitioned tables are used when most queries only need a single partition
which drastically affects the costs for plans.

2) You want to be able to manage indexes across all the partitions in a single
operation. Any index on the table where the leading columns of the index
are the partition key can automatically create a separate index on every
table.

3) You want DDL commands to move around the partitions in various ways.
Creating new partitions, moving tables into the partitioned table and
moving partitions out into separate tables of their own. splitting a
partition into two partitions, merging two into one, etc.

4) I always managed to avoid them, and they don't seem very useful to me, but
Oracle also supports "global indexes" which are indexes that span all the
partitions without having the partition key as the leading columns.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2003-12-01 16:16:56 Re: initdb should create a warning message [was Re:
Previous Message Stephan Szabo 2003-12-01 16:09:03 Re: Max number of rows in a table