Re: On partitioning

From: "Amit Langote" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Claudio Freire'" <klaussfreire(at)gmail(dot)com>
Cc: "'Josh Berkus'" <josh(at)agliodbs(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-12-15 02:12:52
Message-ID: 001601d0180c$a212f550$e638dff0$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Alvaro wrote:
> Claudio Freire wrote:
>
> > Fair enough, but that's not the same as not requiring easy proofs. The
> > planner might not the one doing the proofs, but you still need proofs.
> >
> > Even if the proving method is hardcoded into the partitioning method,
> > as in the case of list or range partitioning, it's still a proof. With
> > arbitrary functions (which is what prompted me to mention proofs) you
> > can't do that. A function works very well for inserting, but not for
> > selecting.
> >
> > I could be wrong though. Maybe there's a way to turn SQL functions
> > into analyzable things? But it would still be very easy to shoot
> > yourself in the foot by writing one that is too complex.
>
> Arbitrary SQL expressions (including functions) are not the thing to use
> for partitioning -- at least that's how I understand this whole
> discussion. I don't think you want to do "proofs" as such -- they are
> expensive.
>

This means if a user puts arbitrary expressions in a partition definition, say,

... FOR VALUES extract(month from current_date) TO extract(month from current_date + interval '3 months'),

we make sure that those expressions are pre-computed to literal values. The exact time when that happens is open for discussion I guess. It could be either DDL time or, if feasible, during relation cache building when we compute the value from pg_node_tree of this expression which we may choose to store in the partition definition catalog. The former entails an obvious challenge of figuring out how we store the computed value into catalog (pg_node_tree of a Const?).

> To make this discussion a bit clearer, there are two things to
> distinguish: one is routing tuples, when an INSERT or COPY command
> references the partitioned table, into the individual partitions
> (ingress); the other is deciding which partitions to read when a SELECT
> query wants to read tuples from the partitioned table (egress).
>
> On ingress, what you want is something like being able to do something
> on the tuple that tells you which partition it belongs into. Ideally
> this is something much lighter than running an expression; if you can
> just apply an operator to the partitioning column values, that should be
> plenty fast. This requires no proof.
>

And I am thinking this's all executor stuff.

> On egress you need some direct way to compare the scan quals with the
> partitioning values. I would imagine this to be similar to how scan
> quals are compared to the values stored in a BRIN index: each scan qual
> has a corresponding operator strategy and a scan key, and you can say
> "aye" or "nay" based on a small set of operations that can be run
> cheaply, again without any proof or running arbitrary expressions.
>

My knowledge of this is far from being perfect, though to clear any confusions -

As far as planning is concerned, I could not imagine how index access method way of pruning partitions could be made to work. Of course, I may be missing something.

When you say "scan qual has a corresponding operator strategy", I'd think that is a part of scan key in executor, no?

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-12-15 02:20:40 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Michael Paquier 2014-12-15 02:08:29 Re: moving from contrib to bin