Re: Federated Postgresql architecture ?

From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Chris Browne" <cbbrowne(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Federated Postgresql architecture ?
Date: 2008-06-30 13:16:26
Message-ID: e51f66da0806300616q47a3433bt52f78716427c5665@mail.gmail.com
Lists: pgsql-performance

On 6/27/08, Chris Browne <cbbrowne(at)acm(dot)org> wrote:
> josh(at)agliodbs(dot)com (Josh Berkus) writes:
> > Jonah,
> >
> >> Hmm, I didn't think the Skype tools could really provide federated
> >> database functionality without a good amount of custom work. Or, am I
> >> mistaken?
> >
> > Sure, what do you think pl/proxy is for?
>
>
> Ah, but the thing is, it changes the model from a relational one,
> where you can have fairly arbitrary "where clauses," to one where
> parameterization of queries must be predetermined.
>
> The "hard part" of federated database functionality at this point is
> the [parenthesized portion] of...
>
> select * from table@node [where criterion = x];
>
> What we'd like to be able to do is to ascertain that [where criterion
> = x] portion, and run it on the remote DBMS, so that only the relevant
> tuples would come back.
>
> Consider...
>
> What if table@node is a remote table with 200 million tuples, and
> [where criterion = x] restricts the result set to 200 of those.
>
> If you *cannot* push the "where clause" down to the remote node, then
> you're stuck with pulling all 200 million tuples, and filtering out,
> on the "local" node, the 200 tuples that need to be kept.
>
> To do better, with pl/proxy, requires having a predetermined function
> that would do that filtering, and if it's missing, you're stuck
> pulling 200M tuples, and throwing out nearly all of them.
>
> In contrast, with the work David Fetter's looking at, the [where
> criterion = x] clause would get pushed to the node which the data is
> being drawn from, and so the query, when running on "table@node,"
> could use indices, and return only the 200 tuples that are of
> interest.
>
> It's a really big win, if it works.

I agree that for doing free-form queries on a remote database,
PL/Proxy is not the right answer. (Although the recent patch
to support dynamic records with an AS clause at least makes them work.)

But I want to clarify its goal: it is not to run "pre-determined
queries." It is to run "pre-determined complex transactions."

And making those work in a "federated database" takes a huge amount
of complexity that PL/Proxy simply sidesteps, at the price of
requiring a function-based API. But as a function-based API has
other advantages even without PL/Proxy, it seems a fine tradeoff.
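
A minimal sketch of the tuple-count argument from the quoted text, scaled
down to 200 matches out of 20,000 rows. It is not PL/Proxy code: an
in-memory SQLite database stands in for the remote node, and the table
name t, the column names, and both helper functions are hypothetical.
It only shows that a filter applied on the node (whether by clause
pushdown or by a predetermined function) returns the same rows while
moving far fewer tuples than filtering locally.

```python
import sqlite3


def make_node(total_rows=20_000, distinct=100):
    """Build an in-memory stand-in for the remote node (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, criterion INTEGER)")
    conn.executemany(
        "INSERT INTO t (id, criterion) VALUES (?, ?)",
        ((i, i % distinct) for i in range(total_rows)),
    )
    # The index is only useful when the filter runs on the node itself.
    conn.execute("CREATE INDEX t_criterion_idx ON t (criterion)")
    return conn


def filter_locally(node, x):
    """No pushdown: pull every tuple, then filter on the 'local' side."""
    pulled = node.execute("SELECT id, criterion FROM t").fetchall()
    kept = [row for row in pulled if row[1] == x]
    return len(pulled), kept


def filter_on_node(node, x):
    """Filter applied on the node: only matching tuples are transferred."""
    kept = node.execute(
        "SELECT id, criterion FROM t WHERE criterion = ?", (x,)
    ).fetchall()
    return len(kept), kept


node = make_node()
pulled_local, kept_local = filter_locally(node, 7)
pulled_remote, kept_remote = filter_on_node(node, 7)
print("filtered locally: pulled", pulled_local, "kept", len(kept_local))
print("filtered on node: pulled", pulled_remote, "kept", len(kept_remote))
```

The catch with the function-based approach is that filter_on_node only
covers the one predicate it was written for; an arbitrary where clause
would need either another function or real clause pushdown.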

--
marko
