Re: Join push-down support for foreign tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join push-down support for foreign tables
Date: 2014-09-04 20:50:34
Message-ID: CA+TgmobrSUB6BNjJ+J-BSUtS-89NxxHZxM2T=v4zof9BVaN2vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 4, 2014 at 11:56 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> I am thinking eventually we will need to cache the foreign server
>> statistics on the local server.
>>
>> Wouldn't that lead to issues where the statistics get outdated and we have to
>> anyways query the foreign server before planning any joins? Or are you thinking
>> of dropping the foreign table statistics once the foreign join is complete?
>
> I am thinking we would eventually have to cache the statistics, then get
> some kind of invalidation message from the foreign server. I am also
> thinking that cache would have to be global across all backends, I guess
> similar to our invalidation cache.

Maybe ... but I think this isn't really related to the ostensible
topic of this thread. We can do join pushdown just fine without the
ability to do anything like this.

I'm in full agreement that we should probably have a way to cache some
kind of statistics locally, but making that work figures to be tricky,
because (as I'm pretty sure Tom has pointed out before) there's no
guarantee that the remote side's statistics look anything like
PostgreSQL statistics, and so we might not be able to easily store
them or make sense of them. But it would be nice to at least have the
option to store such statistics if they do happen to be something we
can store and interpret.

It's also coming to seem to me more and more that we need a way to
designate several PostgreSQL machines as a cooperating cluster. This
would mean they'd keep connections to each other open and notify each
other about significant events, which could include "hey, I updated
the statistics on this table, you might want to get the new ones" or
"hey, i've replicated your definition for function X so it's safe to
push it down now" as well as "hey, I have just been promoted to be the
new master" or even automatic negotiation of which of a group of
machines should become the master after a server failure. So far,
we've taken the approach that postgres_fdw is just another FDW which
enjoys no special privileges, and I think that's a good approach on
the whole, but think if we want to create a relatively seamless
multi-node experience as some of the NoSQL databases do, we're going
to need something more than that.

But all of that is a bit pie in the sky, and the join pushdown
improvements we're talking about here don't necessitate any of it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-09-04 20:53:23 Re: Commitfest status
Previous Message Pavel Stehule 2014-09-04 20:23:48 Re: vacuumdb --all --analyze-in-stages - wrong order?