Re: Query performance problems with partitioned tables

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Guillaume Cottenceau" <gc(at)mnc(dot)ch>
Cc: "Andreas Haumer" <andreas(at)xss(dot)co(dot)at>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query performance problems with partitioned tables
Date: 2007-04-30 15:35:55
Message-ID: 87slah98o4.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Guillaume Cottenceau" <gc(at)mnc(dot)ch> writes:

> I think this is the last claimed point which is incorrect. Pg has
> no general guarantee the partitions actually create a disjoint
> set, even with the CHECK constraints. Pg can only optimize by
> avoiding scanning the partitions inside which no satisfactory
> data could be found by the CHECK constraint, but I think it's not
> possible (too complicated) to infer that any found row in your
> other partitions would not be in the final resultset because of
> 1. the query's resultset order 2. the limit 3. the actual
> conditions in the CHECK constraints (there is no direct way to
> see that timestamps in your 200704 partition are greater than
> timsteamp in your 200601 partition).

I think the answer is that yes there are a number of query transformations
that could be done for partitioned tables that we're not doing currently.

Generally speaking we need to look at each type of plan node and figure out
whether it can usefully be pushed down below the Append node and how we can
determine when that can be done.

So if each arm of the Append was already in order we could easily push the
LIMIT inside the Append (and duplicating the work above the Append). But that
wouldn't save us any work because we only generate rows from the partitions as
they're needed anyways.

In your example we could save work because the Sort needs all the rows before
it starts. So we could sort each arm and apply the limit before passing it up
to the outer Sort. That might save work in this case.

But figuring out when that's less work than just sorting all of them is
tricky. If the LIMIT is 1,000 and you have ten arms of 1,000 tuples each then
you aren't going to save any work doing this. But if the LIMIT is 1 and you
have few arms with many tuples then you could save a lot of work.

Actually I happen to have been reading up on algorithms related to this this
weekend. It's possible to implement a LimitUnsorted in linear-time which would
fetch the first n records according to some sort key without actually sorting
the records. That might make it more worthwhile.

In short. Yes, there are a lot of optimizations possible around partitioned
tables that we don't do either because it's not clear how to tell when they're
worthwhile or because the code just isn't there yet.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig A. James 2007-04-30 16:18:51 Re: Feature Request --- was: PostgreSQL Performance Tuning
Previous Message Tom Lane 2007-04-30 15:06:47 Re: Query performance problems with partitioned tables