Re: Declarative partitioning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Amit Langote <amitlangote09(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning
Date: 2016-04-15 04:35:42
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Ashutosh,

On 2016/04/14 21:34, Ashutosh Bapat wrote:
> Hi Amit,
> Here are some random comments. You said that you were about to post a new
> patch, so you might have already taken care of those comments. But anyway
> here they are.

Thanks a lot for the comments. The patch set changed quite a bit since the
last version. Once the CF entry was marked returned with feedback on
March 22, I held off sending the new version at all. Perhaps, it would
have been OK. Anyway here it is, if you are interested. I will create an
entry in CF 2016-09 for the same.

Also, see below replies to you individual comments.

> 1. Following patches do not apply cleanly for me.
> 0001
> 0003 - conflicts at #include for partition.h in rel.h.
> 0004 - conflicts in src/backend/catalog/Makefile
> 0005 - conflicts in src/backend/parser/gram.y

These should be fixed although the attached one is a significantly
different patch set.

> 2. The patches are using now used OIDs 3318-3323. Corresponding objects
> need new unused oids.

Right, fixed.

> 3. In patch 0001-*, you are using indkey instead of partkey in one of the
> comments and also in documentation.


> 4. After all patches are applied I am getting several errors like "error:
> #error "lock.h may not be included from frontend code", while building
> rmgrdesc.c. This
> seems to be because rel.h includes partition.h, which leads to inclusion of
> lock.h in rmgrdesc.c. Are you getting the same error message? It looks like
> we need separate header file for declaring function which can be used at
> the time of execution, which is anyway better irrespective of the compiler
> error.

Yeah, I too am beginning to feel that dividing partition.h into separate
headers would be a good idea in long term. For time being, I have managed
to get rid of partition.h #included in rel.h which solves the issue.

> 5. In expand_partitioned_rtentry(), instead of finding all the leaf level
> relations, it might be better to build the RTEs, RelOptInfo and paths for
> intermediate relations as well. This helps in partition pruning. In your
> introductory write-up you have mentioned this. It might be better if v1
> includes this change, so that the partition hierarchy is visible in EXPLAIN
> output as well.
> 6. Explain output of scan on a partitioned table shows Append node with
> individual table scans as sub-plans. May be we should annotate Append node
> with
> the name of the partitioned table to make EXPLAIN output more readable.

I got rid of all the optimizer changes in the new version (except a line
or two). I did that by switching to storing parent-child relationships in
pg_inherits so that all the existing optimizer code for inheritance sets
works unchanged. Also, the implementation detail that required to put
only leaf partitions in the append list is also gone. Previously no
storage was allocated for partitioned tables (either root or any of the
internal partitions), so it was being done that way.

Regarding 6, it seems to me that because Append does not have a associated
relid (like scan nodes have with scanrelid). Maybe we need to either fix
Append or create some enhanced version of Append which would also support
dynamic pruning.

> 7. \d+ output of partitioned table does not show partitioning information,
> something necessary for V1.

This has been fixed in the attached.

> 8. Instead of storing partition key column numbers and expressions
> separately, can we store everything as expression; columns being a single
> Var node expression? That will avoid constructing Var nodes to lookup in
> equivalence classes for partition pruning or join optimizations.

Hmm, let me consider that. FWIW, if you look at my proposed patch (or
description thereof) back in CF 2015-11 [1], you will see that I had
modeled matching clauses to partition key on lines of how similar
processing is done for indexes (as in how indexes are matched with clauses
- down all the way to match_clause_to_indexcol()).

None of that exists in the current patch set but when we get to that
(optimizer patch that is), I perhaps wouldn't do it radically differently.
I admit however that I hadn't thought really hard about join optimization
stuff so I may be missing something.

> 10. The code distinguishes between the top level table and its partitions
> which in turn are partitioned. We should try to minimize this distinction
> as much as possible so as to use recursive functions for operating on
> partitions. E.g. each of the partitioned partitions may be labelled
> RELKIND_PARTITIONED_REL? A 0/NULL parent would distinguish between root
> partition and child partitions.

Exactly how it's done in the attached. Any table that is partitioned is



Attachment Content-Type Size
0001-Add-syntax-to-specify-partition-key-v1.patch text/x-diff 41.5 KB
0002-Infrastructure-for-creation-of-partitioned-tables-v1.patch text/x-diff 85.9 KB
0003-Add-syntax-to-create-partitions-v1.patch text/x-diff 78.0 KB
0004-Infrastructure-for-partition-metadata-storage-and-ma-v1.patch text/x-diff 94.5 KB
0005-Introduce-tuple-routing-for-partitioned-tables-v1.patch text/x-diff 27.8 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-04-15 04:45:00 Re: Support for N synchronous standby servers - take 2
Previous Message Petr Jelinek 2016-04-15 04:28:14 Re: Pglogical questions and problems