Re: The plan for FDW-based sharding

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: The plan for FDW-based sharding
Date: 2016-02-26 20:19:11
Message-ID: 56D0B33F.3030102@postgrespro.ru
Lists: pgsql-hackers

On 02/26/2016 09:30 PM, Alvaro Herrera wrote:
> Konstantin Knizhnik wrote:
>
>> Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
>> But it cause big problems both for developers, which have to permanently
>> synchronize their branch with master,
>> and, what is more important, for customers, which can not use standard
>> version of PostgreSQL.
>> It may cause problems with system certification, with running Postgres in
>> cloud,...
>> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
>> wrong direction.
> That's not the point, though. I don't think a Postgres clone with a GTM
> solves any particular problem that's not already solved by the existing
> forks. However, if you have a clone at home and you make a GTM work on
> it, then you take the GTM as a patch and post it for discussion.
> There's no need for hooks for that. Just make sure your GTM solves the
> problem that it is supposed to solve.
>
> Excuse me if I've missed the discussion elsewhere -- why does
> PostgresPro have *two* GTMs instead of a single one?
>
Different kinds of clusters require different approaches to managing distributed transactions.
Some clusters do not need distributed transactions at all: if you are executing OLAP queries on a read-only database, a GTM will just add extra overhead.

pg_dtm uses a centralized arbiter. It is similar to the Postgres-XL DTM. The presence of a single arbiter significantly simplifies all distributed algorithms: failure detection, global deadlock elimination, ... But at the same time the arbiter is a SPOF and the main factor limiting cluster scalability.

pg_tsdtm is based on another approach: it uses system time as the CSN (commit sequence number) and doesn't require an arbiter. In theory there is no limit to scalability. But differences in system time between nodes and the need for more rounds of communication have a negative impact on performance.
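
To make the difference concrete, here is a toy sketch in C of the two ways of obtaining a CSN. It is not the pg_dtm or pg_tsdtm code: all names (Arbiter, Node, skew_us, node_wait_for, ...) are made up for illustration, and the "true time" is passed in as a parameter so that the clock skew is explicit. I assume here that a node which sees a CSN ahead of its own clock has to wait for its clock to catch up; that is only one possible way to handle skew.

```c
#include <stdint.h>

typedef uint64_t csn_t;

/* Centralized approach: one arbiter owns a counter and every commit asks it. */
typedef struct {
    csn_t last;
} Arbiter;

static csn_t arbiter_next_csn(Arbiter *a)
{
    /* Total order comes for free, but every caller depends on this one place. */
    return ++a->last;
}

/* Decentralized approach: each node derives CSNs from its own clock. */
typedef struct {
    int64_t skew_us; /* hypothetical offset of this node's clock from true time */
    csn_t   last;    /* last CSN issued locally, keeps the sequence monotonic */
} Node;

/* Local clock reading of a node at the given true time (microseconds). */
static csn_t node_clock(const Node *n, csn_t true_time_us)
{
    return (csn_t) ((int64_t) true_time_us + n->skew_us);
}

/* No arbiter involved: the CSN is the local clock, bumped if it did not advance. */
static csn_t node_next_csn(Node *n, csn_t true_time_us)
{
    csn_t now = node_clock(n, true_time_us);

    n->last = (now > n->last) ? now : n->last + 1;
    return n->last;
}

/*
 * Assumption of this sketch: a node cannot serve a CSN that lies in its own
 * future, so it has to wait this many microseconds before proceeding.
 */
static csn_t node_wait_for(const Node *n, csn_t true_time_us, csn_t remote_csn)
{
    csn_t now = node_clock(n, true_time_us);

    return (remote_csn > now) ? remote_csn - now : 0;
}
```

With one node running 500 us ahead and another 300 us behind, a CSN produced on the first node is 800 us in the future of the second one. This is the price of clock skew mentioned above, while the arbiter variant pays with an extra hop to a single process instead.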

So there is no ideal solution which works well for every cluster. This is why it is not possible to develop just one GTM, propose it as a patch for review and then (hopefully) commit it to Postgres core. IMHO it will never happen. And I do not think that it is actually needed. What we need is a way to create our own transaction managers as Postgres extensions, without affecting the core.

All the arguments against XTM can be applied to any other extension API in Postgres, for example FDW.
Is it general enough? There are many useful operations which are currently not handled by this API, for example performing aggregation and grouping on the foreign server side. But it is still a very useful and flexible mechanism, which allows implementing many wonderful things.

From my point of view a good system should be as open and customizable as possible, as long as this doesn't affect performance.
In almost all cases, replacing direct function calls with indirect function calls can not hurt performance, and neither can adding hooks.
So we get better flexibility at no extra cost. What's wrong with it?
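
As an illustration of what "indirect function calls" means here, below is a minimal sketch in C of a transaction manager reached through a table of function pointers. This is not the actual XTM patch: TxManager, tm_install, dist_tm, global_horizon and the other names are hypothetical, and the "distributed" visibility rule is just a stand-in for whatever an extension would really do.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;

/* Hypothetical table of transaction-manager entry points. */
typedef struct TxManager {
    const char *name;
    xid_t (*assign_xid)(void);
    bool  (*is_visible)(xid_t xid, xid_t snapshot_xmax);
} TxManager;

/* Built-in behaviour: purely local decisions. */
static xid_t local_next_xid = 100;

static xid_t local_assign_xid(void)
{
    return local_next_xid++;
}

static bool local_is_visible(xid_t xid, xid_t snapshot_xmax)
{
    return xid < snapshot_xmax;
}

static const TxManager local_tm = {
    "local", local_assign_xid, local_is_visible
};

/* The core keeps one pointer; this is the only thing an extension changes. */
static const TxManager *current_tm = &local_tm;

static const TxManager *tm_install(const TxManager *tm)
{
    const TxManager *prev = current_tm;

    current_tm = tm;
    return prev; /* returned so that the caller can restore or chain to it */
}

/* Core code paths: one indirect call where a direct call used to be. */
static xid_t core_begin_transaction(void)
{
    return current_tm->assign_xid();
}

static bool core_tuple_visible(xid_t xid, xid_t snapshot_xmax)
{
    return current_tm->is_visible(xid, snapshot_xmax);
}

/*
 * What an extension might provide. global_horizon is a stand-in for state
 * that a real distributed manager would obtain from the other nodes.
 */
static xid_t global_horizon = 0;

static bool dist_is_visible(xid_t xid, xid_t snapshot_xmax)
{
    return local_is_visible(xid, snapshot_xmax) && xid <= global_horizon;
}

static const TxManager dist_tm = {
    "distributed-sketch", local_assign_xid, dist_is_visible
};
```

An extension would call tm_install(&dist_tm) when it is loaded. A cluster that does not need distributed transactions never loads it and keeps the local implementation, paying only for one pointer dereference per call.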

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
