Re: Horizontal scalability/sharding

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-02 20:11:07
Message-ID: 55E757DB.1030506@2ndquadrant.com
Lists: pgsql-hackers

On 09/02/2015 08:27 PM, Robert Haas wrote:
> On Wed, Sep 2, 2015 at 1:59 PM, Merlin Moncure <mmoncure(at)gmail(dot)com>
> wrote:
>>
>> This strikes me as a bit of a conflict of interest with FDW which
>> seems to want to hide the fact that it's foreign; the FDW
>> implementation makes its own optimization decisions which might
>> make sense for single table queries but break down in the face of
>> joins.

+1 to these concerns

> Well, I don't think that ALL of the logic should go into the FDW.

Then maybe we shouldn't call this "FDW-based sharding" (or "FDW
approach" or whatever was used in this thread so far) because that kinda
implies that the proposal is to build on FDW.

In my mind, FDW is a wonderful tool to integrate PostgreSQL with
external data sources, and it's nicely shaped for this purpose, which
is reflected in the abstractions and assumptions in the code.

The truth however is that many current uses of the FDW API are actually
using it for different purposes because there's no other way to do that,
not because FDWs are the "right way". And this includes the attempts to
build sharding on FDW, I think.

Situations like this result in "improvements" of the API that seem to
improve the API for the second group, but make life harder for the
original FDW API audience by making the API needlessly complex. And I
say "seem to improve" because the second group eventually runs into the
fundamental abstractions and assumptions the API is based on anyway.

And based on the discussions at PGCon, I think this is the main reason
why people cringe when they hear "FDW" and "sharding" in the same sentence.

I'm not opposed to reusing the FDW infrastructure, of course.

> In particular, in this example, parallel aggregate needs the same
> query rewrite, so the logic for that should live in core so that
> both parallel and distributed queries get the benefit.

I'm not sure parallel query is a great example here - maybe I'm
wrong, but I think it's a fairly isolated piece of code, and we have a
pretty clear idea of the two use cases.

I'm sure it's non-trivial to design it well for both cases, but I think
the questions for FDW/sharding will be much more about abstract concepts
than particular technical solutions.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
