Re: What needs to be done for real Partitioning?

From: "Stacy White" <harsh(at)computer(dot)org>
To: "Alvaro Herrera" <alvherre(at)dcc(dot)uchile(dot)cl>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PFC" <lists(at)boutiquenumerique(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: What needs to be done for real Partitioning?
Date: 2005-03-20 19:59:29
Message-ID: 002a01c52d87$543d1010$0200a8c0@grownups
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > We probably also need multi-table indexes.
> As Josh says, that seems antithetical to the main point of partitioning,
> which is to be able to rapidly remove (and add) partitions of a table.
> If you have to do index cleaning before you can drop a partition, what's
> the point of partitioning?

Global indexes (as opposed to partition local indexes) are useful in cases
where you have a large number of partitions, index columns different than
the partition key, and index values that limit the query to just a subset of
the partitions.

The two domains that I'm most familiar with are warehouse management, and
the film industry. In both these cases it's logical to partition on
day/week/month, it's frequently important to keep a lot of history, and it's
common to have products that only show activity for a few months. In one of
our production systems we have 800 partitions (by week, with a lot of
history), but a popular product might have only 20 weeks worth of activity.
Selecting records for the product requires at least 800 random-access reads
if you have local indexes on 'product_no', 780 of which just tell the
executor that the partition doesn't include any information on the product.

This is definitely a phase II item, but as I said before it's worth
considering since good DBAs can do a lot with global indexes.

FWIW, we see large benefits from partitioning other than the ability to
easily drop data, for example:

- We can vacuum only the active portions of a table
- Postgres automatically keeps related records clustered together on disk,
which makes it more likely that the blocks used by common queries can be
found in cache
- The query engine uses full table scans on the relevant sections of data,
and quickly skips over the irrelevant sections
- 'CLUSTER'ing a single partition is likely to be significantly more
performant than clustering a large table

In fact, we have yet to drop a partition on any of our Oracle or Postgres
production systems.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2005-03-20 20:03:41 Re: What needs to be done for real Partitioning?
Previous Message Tom Lane 2005-03-20 17:58:30 Re: What needs to be done for real Partitioning?