Re: [patch] plproxy v2

From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Postgres Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] plproxy v2
Date: 2008-07-08 15:29:10
Message-ID: e51f66da0807080829t310c1ad5p3784e9543a20b6ff@mail.gmail.com
Lists: pgsql-hackers

On 7/8/08, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
> > I mentioned that I planned to remove SELECT/CONNECT too.
> > Now I've thought about it more and it seems to me that it's better
> > to keep them, as they give additional flexibility.
>
> I very much like PL/Proxy and support your vision. Including the
> features of PL/Proxy in core seems like a great idea to me.
>
> If we have just a couple of commands, would it be easier to include
> those features by some additional attributes on pg_proc? That way we
> could include the features in a more native way, similar to the way we
> have integrated text search, without needing a plugin language at all.
>
> CREATE CLUSTER foo ...
>
> CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...
>
> If we did that, we might also include a similar proxy feature for
> tables, making the feature exciting for more users than just those who
> can specify implementing all logic through functions. It would also
> remove the need for a specific SELECT command in PL/Proxy.
>
> CREATE TABLE bar CLUSTER foo RUN ON ANY ...
>
> If we're running a SELECT and all tables accessed run on the same
> cluster we ship the whole SQL statement according to the RUN ON clause.
> It would effectively bring some parts of dblink into core.
>
> If the tables are not all on the same cluster, we throw an error in this
> release, but in later releases we might introduce distributed join
> features and full distributed DML support.
>
> Having the PL/Proxy features available via the catalog will allow a
> clear picture of what runs where without parsing the function text. It
> will also allow things like a pg_dump of all objects relating to a
> cluster.
>
> Adding this feature for tables would be interesting with Hot Standby,
> since it would allow you to offload SELECT statements onto the standby
> automatically.
>
> This would be considerably easier to integrate than text search was.

Interesting proposal.

First I want to say - we can forget the SELECT/CONNECT statements
when discussing this approach. They are in because they were easy
to add and gave some additional flexibility. But they are not important.
If they don't fit some new approach, there is no problem dropping them.

So that leaves functions of the form:

CLUSTER <expr>;
RUN ON <expr>;

and potentially SPREAD BY as discussed in:

http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html

which sends different arguments to different partitions. I'm not yet
sure it's a worthwhile addition, but I work mostly on OLTP databases
and that feature would target OLAP ones, so I'll let others decide.
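
For readers unfamiliar with the form above, a minimal sketch of such a
function follows. The function name, argument, and cluster name are
hypothetical and chosen only for illustration; hashtext() is the
built-in PostgreSQL hash function.

```sql
-- Hypothetical proxy function; all names are illustrative only.
-- CLUSTER picks which cluster to use, RUN ON picks the partition
-- within it based on a hash of the argument.
CREATE FUNCTION get_user_email(i_username text)
RETURNS text AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
```

The call is forwarded to a function with the same name and signature
on the selected partition, so the proxy side carries no logic of its
own beyond cluster and partition selection.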

Now a few technical points about your proposal:

- One feature that the current function-based configuration approach
gives is that we can manage cluster configuration centrally and
replicate it to the actual proxy databases. This is something I would
like to keep.

This can be solved by also using plain tables or functions behind
the scenes.

- How about CREATE REMOTE FUNCTION / TABLE .. ; for syntax?

- Currently both hash and cluster selection expressions can be
quite free-form, so parsing them out into some pg_proc field
would not actually help much.
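
To illustrate the first point, here is a rough sketch of function-based
configuration backed by a plain table. The table cluster_partitions and
its columns are hypothetical, and the sketch assumes the plproxy schema
already exists; plproxy.get_cluster_partitions() is one of the
configuration functions PL/Proxy calls, the others
(get_cluster_version, get_cluster_config) are omitted here.

```sql
-- Hypothetical table holding one connect string per partition.
-- It could be maintained centrally and replicated to each proxy database.
CREATE TABLE cluster_partitions (
    cluster_name text    NOT NULL,
    part_nr      integer NOT NULL,
    connstr      text    NOT NULL,
    PRIMARY KEY (cluster_name, part_nr)
);

-- Returns the connect strings for a cluster in partition order.
CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
RETURNS SETOF text AS $$
    SELECT p.connstr
      FROM cluster_partitions p
     WHERE p.cluster_name = $1
     ORDER BY p.part_nr;
$$ LANGUAGE sql;
```

Because the proxy only sees the function, the table behind it can be
swapped or replicated without touching any proxy function definitions.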

And some philosophical points:

- PL/Proxy's main use case is complex read-write transactions
in an OLTP setting, whereas remote tables/views target simple
read-only transactions with free-form queries.

- PL/Proxy has a concrete argument list and free-form cluster
and partition selection. Remote tables have free-form
arguments, so maybe they want more rigid cluster/partition
selection?

If the syntax and backend implementation can be merged, that's good,
but it should not be forced. So before we start adding syntax
to core, maybe it would be good to have a concrete idea of what the
remote tables will look like and what representation they want for a
cluster?

Especially if you want to do stuff like distributed joins.

OTOH, if you say that the current PL/Proxy approach fits remote tables
as well, I'm not against doing it at the SQL level.

--
marko
