Re: [rfc,patch] PL/Proxy in core

From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Steve Singer" <ssinger_pg(at)sympatico(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [rfc,patch] PL/Proxy in core
Date: 2008-05-18 05:44:29
Message-ID: e51f66da0805172244v3f4df204te5a26852323a2d79@mail.gmail.com
Lists: pgsql-hackers

On 5/18/08, Steve Singer <ssinger_pg(at)sympatico(dot)ca> wrote:
> On Sat, 17 May 2008, Marko Kreen wrote:
> > On 5/17/08, Steve Singer <ssinger_pg(at)sympatico(dot)ca> wrote:
> > > Somewhat unrelated, I can see use-cases for replacing the call to
> > > random() with something that allows user defined policies for
> > > RUN ON ANY.
> >
> > Well, that's why RUN ON userfunc(..); exists. Also notice the function
> > can tag more than one partition for execution.
> >
> > Or did you mean something else than partition selection by "user
> > defined policy"?
>
> I see RUN ON userfunc() as being for partitioning where the correctness
> requires that the query be run on the result of userfunc. I see RUN ON ANY
> as being for load-balancing.

That reading is wrong. You should see RUN ON ANY simply as a shortcut
for RUN ON random(); the actual random() would not work because it
returns a float, so think of an equivalent random() that returns an
integer.

So if you want a smarter ANY, just implement your own function. I don't
see any need for a tunable ANY.
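
A minimal sketch of what "implement your own function" could look like.
All names here (pick_any_partition, pick_round_robin, rr_partition_seq,
get_report, 'reportcluster') are made up for illustration and are not
part of PL/Proxy; the sketch assumes the cluster is already configured
and that get_report() exists on every partition. The only requirement
taken from the text above is that the selector returns an integer,
which PL/Proxy then maps onto a partition.

```sql
-- Hypothetical selector 1: integer-returning stand-in for random().
-- random() yields a float in [0, 1), so scale it and cast to integer.
CREATE FUNCTION pick_any_partition() RETURNS integer AS $$
    SELECT floor(random() * 2147483647)::integer;
$$ LANGUAGE sql VOLATILE;

-- Hypothetical selector 2: round-robin driven by a shared sequence.
-- The sequence is visible to all backends, so the rotation is global.
CREATE SEQUENCE rr_partition_seq;

CREATE FUNCTION pick_round_robin() RETURNS integer AS $$
    SELECT (nextval('rr_partition_seq') % 2147483647)::integer;
$$ LANGUAGE sql VOLATILE;

-- Proxy function that uses a selector instead of RUN ON ANY.
-- Swap in pick_any_partition() to get the random behaviour.
CREATE FUNCTION get_report(i_id integer) RETURNS SETOF text AS $$
    CLUSTER 'reportcluster';
    RUN ON pick_round_robin();
$$ LANGUAGE plproxy;
```

The policy lives entirely in an ordinary SQL function, so changing it
means replacing that function and nothing in PL/Proxy itself.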

> You might want to RUN ON ANY with a round
> robin balancing, or maybe consider the load of servers for doing the
> balancing.
>
> In the case of RUN ON ANY it seems that the database the query gets sent to
> doesn't matter. It might make sense for plproxy to try the next database if
> it can't connect to the first one it picks. You wouldn't want this for
> partitioning queries. If plproxy knows if you mean 'the query has to be run
> on these partitions' versus 'run the query on any partition, using method x
> to choose' then that type of thing would be possible.

OK, these are two feature requests that we have thought about ourselves too:

RUN ON LEAST LOADED;

Sorry, this is unimplementable with the current PL/Proxy design, as the
per-backend PLs do not coordinate their usage. And this is deliberate.

If you want to implement this, the design would have to look exactly
like PL/Proxy 1: each PL makes a special connection to a special pooler
that is responsible for partition selection and thus has information
about partition usage. And there the complexity went through the roof...

You may achieve the same effect with a smart TCP proxy or, failing that,
you can write a load-balancing feature with a load check for PgBouncer.

RUN ON ANY PICK NEXT ON ERROR;

This is implementable. But we have not found an actual need for it
ourselves. So I have not bothered to implement it, as otherwise plproxy
would have another "implementable" and "maybe nice to have" feature
without an actual reason, like CONNECT, SELECT and get_cluster_config()
turned out to be.

OTOH, we don't use read-only load balancing much here. And such a feature
does not make sense when partitioning is used. But it does make sense
for load-balancing. So I'm not against adding it.

--
marko
