Quick Links

Table Partitioning, Part 1

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	bizgres-general <bizgres-general(at)pgfoundry(dot)org>
Subject:	Table Partitioning, Part 1
Date:	2005-05-09 22:30:58
Message-ID:	1115677858.3830.131.camel@localhost.localdomain
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

A more in-depth consideration of the major design options and trade-offs
available to us... this is an internals related discussion.

Comments?

1. Embellish inheritance or separate infrastructure?

Inheritance is a somewhat strange PostgreSQL feature, with a few
gotchas. The "big one" is the lack of a multi-relation index that can be
used against all of the inherited relations also.
The full complexity of multiple inheritance, possible multi-level
inheritance doesn't seem to solve any problems for us. The main argument
in favour of improving inheritance as a route to partitioning is that
many people have already developed applications using it to provide some
of the desirable aspects of partitioning. The amount of code used to
provide some of these features is actually quite low, and the code
itself is already specialised.

It seems prudent to avoid building on that foundation, even though we
may decide to use some similar approaches.

2. Individual Relations explicitly in the plan or MultiRelation plan
nodes? (i.e. is a SeqScan of a Partitioned Table one Node or many
nodes?)

Currently, Inheritance produces a plan for a SeqScan that looks like
this...

Append
SeqScan
SeqScan
SeqScan

ISTM fairly straightforward to produce a similar "static" plan along the
same lines, using Result nodes to implement Partition Elimination.

Append
Result
SeqScan
Result
SeqScan
Result
SeqScan

Some may think that looks a little clunky, especially since the plan
length would increase as the number of Partitions increased. An
alternative, would be to have a single Scan node that performs a Multi-
Partition Scan, which looks like this...

PartitionSeqScan

There'd always one node, no matter how many partitions. Again, easy to
implement, since we just add a new type of Scan access method, or alter
the existing ones iff we access partitioned tables. We'd be able to use
physical tlists, and all the complexity is hidden.

Both solve the basic case of Direct restriction of Partitioning Key
(PPUC1)

The multi-scan node stops us from doing a mixed Scan plan:

Append
Result
IndexScan
Result
SeqScan
Result
SeqScan

Why? The node acts at execution time, though the planner is the usual
place to decide between scan types.

But why do we need that? Why not just make all partitions the same size,
give them all the same Statistics etc.. Well, there are Use Cases that
strongly suggest that partition size could vary considerably (PPUC1) and
also that there is some incentive to have tables that have indexes on
the most recent partitions yet none on the older partitions (PPUC5.2)

My current thinking is to implement explicitly partition plans.

But what about join plans? There are Use Cases that depend almost
entirely upon joins for Partition Elimination rather than direct
restriction of the PPK. (PPUC2)

How could a join plan be performed with the explicit plan shown above?
The only option would seem to be a Sort/Merge, which could be a disaster
if we only wanted a few rows returned from a very large table.

If all partitions in the query had identical indexes on them, then we
have another option. In that case, each index could be thought to form
part of a larger index ordered initially on the Partitioning Key (PPK).
If the first column was actually the PPK, then the set of indexes would
be exactly equivalent to a Global Index. We can call this a Pseudo
Global Index.

The Pseudo Global Index could also be used to enforce uniqueness. If all
of the composite indexes were defined unique and the index contained the
PPK as one of its columns, this would work. The index enforces
uniqueness within each partition and the PPK enforces uniqueness across
partitions because the same PPK value cannot be in two partitions.

We can also use the Pseudo Global Index as the object of a multi-index
scan node, since the PGI would be ordered on the PPK. The plan node
would use the partition boundary definitions to locate the starting
partition, then seek within the partition to the correct part of the
index then scan sideways, skipping to the next partition as required.

This sounds a great idea, but it is exactly the technique I argued
against only 4 paragraphs ago for use with SeqScans.

My current thinking is that we need a mix of the two approaches, though
I raise all of this here to gain everybody's input...

3. Partitions are relations? Or sub-relations?

If we ever want to implement explicit plans, as discussed above, ISTM
that we should consider partitions to be full relations.

If partitions become sub-relations then we complicate many other parts
of the machinery in the planner. This would also complicate many other
things, since each rel currently has a single relfilenode for almost
every aspect.

We might think about doing this as a way of making "Global Indexes". If
its all one relation, we already have them! But that would only work for
indexes defined with the PPK as the leading column because otherwise we
would have no way of adding or dropping new partitions without ripping
the guts out of each index (and failing PPUC4.2 as a result). It turns
out that we can have roughly the same kind of Pseudo Global Index
whether we consider partitions to be relations or sub-relations.

It seems much simpler to have partitions as relations...

4. Should all partitions have the same indexes?

It also seems that we have somewhat opposing Use Cases: PPUC2 requires
joins to achieve Partition Elimination, PPUC 5.2 requires us to have no
indexes on older partitions (since they reside on tape).

Some people will say that having all indexes defined the same eases
maintenance. Others will say it gives us flexibility.

I would argue in favour of flexibility now and automation later.

5. Constraints or specific Partitioning syntax?

Partition Elimination relies upon being able to prove at execution time
that two clauses in a query restriction are always false when taken
together.

Taken generically, extending the concept of Partitioning to the max, we
might imagine that we could take any number of constraints, then AND
them together with an arbitrarily complex restriction clause, solve it
and then eliminate partitions. That seems too much work to solve that
generically and provably. I aim for an easier subset, with a more
bounded solution algorithm.

If we know which constraints are to be used for Partition Elimination,
and we make some restrictions on the types of restricion clauses that
can take part in Partition Elimination, then the task gets easier.

So, I think this requires explicit syntax, even though I am very
attracted to minimal syntax implementations.

Anyway, that was a bit of a ramble, so thanks for reading this far.

Best regards, Simon Riggs

Responses

Re: Table Partitioning, Part 1 at 2005-05-09 22:53:52 from Tom Lane
Re: Table Partitioning, Part 1 at 2005-05-10 13:31:55 from Hannu Krosing

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2005-05-09 22:53:52	Re: Table Partitioning, Part 1
Previous Message	Tom Lane	2005-05-09 21:44:08	Re: Oracle Style packages on postgres