Re: Federated Postgresql architecture ?

From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Chris Browne" <cbbrowne(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Federated Postgresql architecture ?
Date: 2008-06-30 13:16:26
Message-ID: e51f66da0806300616q47a3433bt52f78716427c5665@mail.gmail.com
Lists: pgsql-performance
On 6/27/08, Chris Browne <cbbrowne(at)acm(dot)org> wrote:
> josh(at)agliodbs(dot)com (Josh Berkus) writes:
>  > Jonah,
>  >
>  >> Hmm, I didn't think the Skype tools could really provide federated
>  >> database functionality without a good amount of custom work.  Or, am I
>  >> mistaken?
>  >
>  > Sure, what do you think pl/proxy is for?
>
>
> Ah, but the thing is, it changes the model from a relational one,
>  where you can have fairly arbitrary "where clauses," to one where
>  parameterization of queries must be predetermined.
>
>  The "hard part" of federated database functionality at this point is
>  the [parenthesized portion] of...
>
>   select * from table@node [where criterion = x];
>
>  What we'd like to be able to do is to ascertain that [where criterion
>  = x] portion, and run it on the remote DBMS, so that only the relevant
>  tuples would come back.
>
>  Consider...
>
>  What if table@node is a remote table with 200 million tuples, and
>  [where criterion = x] restricts the result set to 200 of those.
>
>  If you *cannot* push the "where clause" down to the remote node, then
>  you're stuck with pulling all 200 million tuples, and filtering out,
>  on the "local" node, the 200 tuples that need to be kept.
>
>  To do better, with pl/proxy, requires having a predetermined function
>  that would do that filtering, and if it's missing, you're stuck
>  pulling 200M tuples, and throwing out nearly all of them.
>
>  In contrast, with the work David Fetter's looking at, the [where
>  criterion = x] clause would get pushed to the node which the data is
>  being drawn from, and so the query, when running on "table@node,"
>  could use indices, and return only the 200 tuples that are of
>  interest.
>
>  It's a really big win, if it works.

I agree that for doing free-form queries on a remote database,
PL/Proxy is not the right answer.  (Although the recent patch
to support dynamic records with an AS clause at least makes them work.)
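
To make the quoted 200M-versus-200 point concrete, here is a rough
sketch in Python, with an in-memory SQLite table standing in for the
remote node.  The table, its columns and the helper functions are all
made up for illustration, and the row counts are scaled down; nothing
here is PL/Proxy or PostgreSQL code.

```python
import sqlite3


def make_remote(n_rows):
    """Build an in-memory table that stands in for table@node (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, criterion INTEGER)")
    conn.executemany(
        "INSERT INTO t (id, criterion) VALUES (?, ?)",
        ((i, i % 100) for i in range(n_rows)),
    )
    # The remote side can only use this index if it sees the predicate.
    conn.execute("CREATE INDEX t_criterion ON t (criterion)")
    return conn


def fetch_without_pushdown(remote, x):
    """Pull every tuple across, then filter on the 'local' side."""
    shipped = remote.execute("SELECT id, criterion FROM t").fetchall()
    kept = [row for row in shipped if row[1] == x]
    return len(shipped), kept


def fetch_with_pushdown(remote, x):
    """Send the predicate to the remote side, so only matching tuples come back."""
    shipped = remote.execute(
        "SELECT id, criterion FROM t WHERE criterion = ?", (x,)
    ).fetchall()
    return len(shipped), shipped


remote = make_remote(10_000)
shipped_all, kept = fetch_without_pushdown(remote, 7)
shipped_few, pushed = fetch_with_pushdown(remote, 7)
```

Both calls end up with the same rows; the difference is how many
tuples had to cross the "wire" to get there (shipped_all versus
shipped_few).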

But I want to clarify its goal - it is not to run "pre-determined
queries."  It is to run "pre-determined complex transactions."

And making those work in a "federated database" takes a huge amount
of complexity, which PL/Proxy simply sidesteps, at the price of
requiring a function-based API.  But as a function-based API has
other advantages even without PL/Proxy, it seems a fine tradeoff.
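
As a rough illustration of what a "pre-determined complex transaction"
behind a function-based API looks like, here is a small Python sketch.
It is not PL/Proxy itself: the four in-memory SQLite databases stand in
for partitions, crc32 stands in for whatever hash the real cluster would
use, and the accounts table and function names are hypothetical.

```python
import sqlite3
import zlib

# Four in-memory databases stand in for the partitions (count is a power of two).
PARTITIONS = [sqlite3.connect(":memory:") for _ in range(4)]
for db in PARTITIONS:
    db.execute(
        "CREATE TABLE accounts (username TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )


def pick_partition(username):
    """Route by hash of the key; the mask works because the count is a power of two."""
    h = zlib.crc32(username.encode("utf-8"))
    return PARTITIONS[h & (len(PARTITIONS) - 1)]


def create_account(username, balance):
    db = pick_partition(username)
    with db:  # commits on success, rolls back on exception
        db.execute("INSERT INTO accounts VALUES (?, ?)", (username, balance))


def withdraw(username, amount):
    """Check and update in one transaction, entirely on a single partition."""
    db = pick_partition(username)
    with db:
        row = db.execute(
            "SELECT balance FROM accounts WHERE username = ?", (username,)
        ).fetchone()
        if row is None:
            raise KeyError(username)
        if row[0] < amount:
            raise ValueError("insufficient funds")
        db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE username = ?",
            (amount, username),
        )
        return row[0] - amount
```

The caller only sees create_account() and withdraw(); which partition
the work lands on, and the fact that the check and the update happen
together in one place, stay hidden behind the function.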

-- 
marko
