Re: Partitioned table performance

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Greg Stark <gsstark(at)mit(dot)edu>, "Stacy White" <harsh(at)computer(dot)org>
Subject: Re: Partitioned table performance
Date: 2004-12-15 22:53:19
Message-ID: 87vfb3p40w.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Josh Berkus <josh(at)agliodbs(dot)com> writes:

> > But I'm a bit puzzled. Why would Append have any significant cost? It's
> > just taking the tuples from one plan node and returning them until they run
> > out, then taking the tuples from another plan node. It should have no i/o
> > cost and hardly any cpu cost. Where is the time going?
>
> Beats me. Tom?
>
> > In my experience "global indexes" defeat the whole purpose of having the
> > partitions. They make dropping and adding partitions expensive which was
> > always the reason we wanted to partition something anyways.
>
> Hmmm. Possibly, I was just thinking about the cost to partitioned tables
> when you do a selection *not* on the partitioned axis. Also that currently
> we can't enforce UNIQUE constraints across partitions.

Like I said though, we found "global indexes" defeated the whole purpose. That
meant no global UNIQUE constraints for us when we went to partitioned tables.
It gave the DBAs the willies but it really wasn't a big deal.

You can still do unique local indexes on a specific partition. So as long as
your partition key is in the primary key you can have a trustworthy primary
key.

And even if not, you usually find you're only loading data into only one
partition. In most applications it's pretty hard to get a record from two
different partitions with conflicting IDs and not hard to check for. You could
easily put a constraint saying that all PO numbers in the new fiscal year have
to be greater than the last PO number from last year, for example.

> But maybe reducing the cost of Append is the answer to this.

The problem with global indexes is that adding or removing an entire partition
becomes a large job. [Actually with Postgres MVCC I suppose removing might
not. But cleaning up would eventually be a large job, and the point remains
for adding a partition.]

Ideally adding and removing a partition should be a O(1) operation. No data
modification at all, purely catalog changes.

> > It is handy having a higher level interface to deal with partitioned
> > tables. You can create a single "local" or "segmented" index and not have
> > to manually deal with all the partitions as separate tables. But that's
> > just syntactic sugar.
>
> Right, and the easy part.

I think the hard part lies in the optimizer actually. The semantics of the
operations to manipulate partitions might be tricky to get right but the
coding should be straightforward. Having the optimizer be able to recognize
when it can prune partitions will be a lot of work.

--
greg

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Theo Galanakis 2004-12-15 23:48:06 indentifying the database in a Postgres log file.
Previous Message Josh Berkus 2004-12-15 19:56:40 Re: Partitioned table performance