Re: The plan for FDW-based sharding

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The plan for FDW-based sharding
Date: 2016-02-24 09:35:15
Message-ID: CAF4Au4zx5zC1Zt12mM8KWiECAiDm=mv4P+RhM4jqofeZFLdm3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> Hi, Bruce!
>
> The important point for me is to distinguish different kind of plans:
> implementation plan and research plan.
> If we're talking about implementation plan then it should be proven that
> proposed approach works in this case. I.e research should be already done.
> If we're talking about research plan then we should realize that result is
> unpredictable. And we would probably need to dramatically change our way.
>
> This two things would work with FDW:
> 1) Pull data from data nodes to coordinator.
> 2) Pushdown computations from coordinator to data nodes: joins, aggregates
> etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This is
> good too.
>
> However, the model of FDW assumes that communication happen only between
> coordinator and data node. But full-weight distributed optimized can't be
> done under this restriction, because it requires every node to communicate
> every other node if it makes distributed query faster. And as I get, FDW
> approach currently have no research and no particular plan for that.
>
> As I get from Robert Haas's talk (
> https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
> )
>
>> Before we consider repartitioning joins, we should probably get
>> everything previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>
>
> So, as I get we didn't ever think about possibility of data redistribution
> using FDW. Probably, something changed since that time. But I haven't heard
> about it.
>
> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>> Second, as part of this staged implementation, there are several use
>> cases that will be shardable at first, and then only later, more complex
>> ones. For example, here are some use cases and the technology they
>> require:
>>
>> 1. Cross-node read-only queries on read-only shards using aggregate
>> queries, e.g. data warehouse:
>>
>> This is the simplest to implement as it doesn't require a global
>> transaction manager, global snapshot manager, and the number of rows
>> returned from the shards is minimal because of the aggregates.
>>
>> 2. Cross-node read-only queries on read-only shards using non-aggregate
>> queries:
>>
>> This will stress the coordinator to collect and process many returned
>> rows, and will show how well the FDW transfer mechanism scales.
>>
>
> FDW would work for queries which fits pull-pushdown model. I see no plan
> to make other queries work.
>
>
>> 3. Cross-node read-only queries on read/write shards:
>>
>> This will require a global snapshot manager to make sure the shards
>> return consistent data.
>>
>> 4. Cross-node read-write queries:
>>
>> This will require a global snapshot manager and global snapshot manager.
>>
>
> At this point, it unclear why don't you refer work done in the direction
> of distributed transaction manager (which is also distributed snapshot
> manager in your terminology)
> http://www.postgresql.org/message-id/56BB7880.4020604@postgrespro.ru
>
>
>> In 9.6, we will have FDW join and sort pushdown
>> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
>> calability.html
>> <http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html>).
>> Unfortunately I don't think we will have aggregate
>> pushdown, so we can't test #1, but we might be able to test #2, even in
>> 9.5. Also, we might have better partitioning syntax in 9.6.
>>
>> We need things like parallel partition access and replicated lookup
>> tables for more join pushdown.
>>
>> In a way, because these enhancements are useful independent of sharding,
>> we have not tested to see how well an FDW sharding setup will work and
>> for which workloads.
>>
>
> This is the point I agree. I'm not objecting against any single FDW
> advance, because it's useful by itself.
>
> We know Postgres XC/XL works, and scales, but we also know they require
>> too many code changes to be merged into Postgres (at least based on
>> previous discussions). The FDW sharding approach is to enhance the
>> existing features of Postgres to allow as much sharding as possible.
>>
>
> This comparison doesn't seems correct to me. Postgres XC/XL supports data
> redistribution between nodes. And I haven't heard any single idea of
> supporting this in FDW. You are comparing not equal things.
>
>
>> Once that is done, we can see what workloads it covers and
>> decide if we are willing to copy the volume of code necessary
>> to implement all supported Postgres XC or XL workloads.
>> (The Postgres XL license now matches the Postgres license,
>> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>> Postgres XC has always used the Postgres license.)
>>
>> If we are not willing to add code for the missing Postgres XC/XL
>> features, Postgres XC/XL will probably remain a separate fork of
>> Postgres. I don't think anyone knows the answer to this question, and I
>> don't know how to find the answer except to keep going with our current
>> FDW sharding approach.
>>
>
> I have nothing against particular FDW advances. However, it's unclear for
> me that FDW should be the only sharding approach.
> It's unproven that FDW can do work that Postgres XC/XL does. With FDW we
> can have some low-hanging fruits. That's good.
> But it's unclear we can have high-hanging fruits (like data
> redistribution) with FDW approach. And if we can it's unclear that it would
> be easier than with other approaches.
> Just let's don't call this community chosen plan for implementing sharding.
> Until we have full picture we can't select one way and reject others.
>

I already several times pointed, that we need XTM to be able to continue
development in different directions, since there is no clear winner.
Moreover, I think there is no fits-all solution and while I agree we need
one built-in in the core, other approaches should have ability to exists
without patching.

>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Artur Zakirov 2016-02-24 09:48:43 Re: plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types
Previous Message Konstantin Knizhnik 2016-02-24 09:22:20 Re: The plan for FDW-based sharding