Re: Constraint Exclusion (Partitioning) - Initial Review

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Constraint Exclusion (Partitioning) - Initial Review
Date: 2005-07-03 22:51:26
Message-ID: 1120431087.3940.103.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Sat, 2005-07-02 at 15:56 -0400, Bruce Momjian wrote:
> Seems you have managed to combine inheritance, check constraints, and
> partial index into table partitioning. It is nice it requires no new
> syntax. Here is an example from your tests:
>
> DROP TABLE pm cascade;
> CREATE TABLE pm
> ( dkey INT NOT NULL
> );
>
> CREATE TABLE p1 ( CHECK (dkey BETWEEN 10000 AND 19999)) INHERITS (pm);
> CREATE TABLE p2 ( CHECK (dkey BETWEEN 20000 AND 29999)) INHERITS (pm);
> CREATE TABLE p3 ( CHECK (dkey BETWEEN 30000 AND 39999)) INHERITS (pm);
>
> So, in this case, a SELECT from pm would pull from the three base
> tables, and those tables can be located on different tablespaces, and
> the backend will only look in child tables that might contain rows basd
> on the check constraints. It is that last phrase that is the new
> functionality here.

Yes, dead on. Thank you for this elegant summary. The main idea was
originally Hannu Krosing's, I believe, with suggestion from Tom to
enhance the partial index machinery to this end.

So a query such as

select * from pm where dkey = 25000

will have an EXPLAIN that completely ignores p1 and p3, since these can
be provably excluded from the plan without effecting the result.

I see the "no syntax" version as the first step towards additional
functionality that would require additional syntax.

> Oh, why would someone want to set enable_constraint_exclusion to false?

The included functionality performs the exclusion at plan time. If a
query was prepared for later execution, it *could* return the wrong
answer when the plan was executed at a later time since plans are not
invalidated when constraints change. So, in general, this should be set
to false except for circumstances where the user can guarantee no such
mistake would be made.

> You had a few questions:
>
> > Main questions:
> > 1. How should we handle the case where *all* inherited relations are
> > excluded? (This is not currently covered in the code).
>
> I assume this means we don't return any rows. Why it is an issue?

A code question only. No issue, just how should the code look?

> > 2. Should this feature be available for all queries or just inherited
> > relations?
>
> I don't see why other queries should not use this. Our TODO already
> has:
>
> * Use CHECK constraints to influence optimizer decisions
>
> CHECK constraints contain information about the distribution of values
> within the table. This is also useful for implementing subtables where
> a tables content is distributed across several subtables.
>
> and this looks like what you are doing. However, again, I see the
> constraint as just informing whether there might be any rows in the
> table. Am I missing something? Are you thinking views with UNION could
> benefit from this?

In general, it seems you might want this. In normal use check
constraints tend to be on minor columns, not key columns. Queries that
would be provably able to exclude tables based upon this would be
strange queries.

i.e.
select count(distinct item_pk) from warehouse where quantity < 0

is not a very common query. So we would pay the overhead of checking for
exclusion for all queries when only a few wierd ones would ever take
advantage of it. Sounds like a poor trade-off to me.

IMHO, the only time you might expect to see benefit is when you have
many similar tables that are partitioned by design into pieces that lend
themselves to exclusion. If you specifically designed a set of tables
and used UNION to bring them together, then I can see that you would
want it then also.... but is there any benefit in supporting two
different ways of achieving the same basic design: partitioned table.

> > 3. And should we extend RelOptInfo to include constraint information?
>
> Is the problem that you have to do multiple lookups without it?

> > 4. Do we want to integrate the test suite also?
>
> No, once this thing works, I don't see it getting broken frequently. We
> usually don't test the optimizer, but we could add a single test for one
> row in each table.
>
> > 5. Presumably a section under Performance tips would be appropriate to
> > document this feature? (As well as section in run-time parameters).
>
> Yep.
>
> I am surprised no one else has commented on it, which I think means your
> code is ready for the queue. Do you want to adjust it based on this
> feedback or should I apply and you can adjust it later.

I think that it means I did not explain myself well enough. :-)

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-07-04 01:38:04 Re: bug in ALTER TABLE / TYPE
Previous Message Simon Riggs 2005-07-03 19:06:55 Re: Constraint Exclusion (Partitioning) - Initial Review

Browse pgsql-patches by date

  From Date Subject
Next Message Petr Jelinek 2005-07-03 23:08:05 Re: per user/database connections limit again
Previous Message Simon Riggs 2005-07-03 19:06:55 Re: Constraint Exclusion (Partitioning) - Initial Review