Re: The plan for FDW-based sharding

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: The plan for FDW-based sharding
Date: 2016-03-01 18:56:58
Message-ID: 56D5E5FA.3070809@2ndquadrant.com
Lists: pgsql-hackers

On 27/02/16 04:54, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> We do not have formal proof that the proposed XTM is "general enough" to handle
>> all possible transaction manager implementations.
>> But there are two general ways of dealing with isolation: snapshot based and
>> CSN based.
>
> I don't believe that for a minute. For example, consider this article:
>
> https://en.wikipedia.org/wiki/Global_serializability
>
> I think the neutrality of that article is *very* debatable, but it
> certainly contradicts the idea that snapshots and CSNs are the only
> methods of achieving global serializability.
>
> Or consider this lecture:
>
> http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf
>
> That's a great introduction to the problem we're trying to solve here,
> but again, snapshots are not mentioned, and CSNs certainly aren't
> mentioned.
>
> This write-up goes further, explaining three different methods for
> ensuring global serializability, none of which mention snapshots or
> CSNs:
>
> http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html
>
> Actually, I think the second approach is basically a snapshot/CSN-type
> approach, but it doesn't use that terminology and the connection to
> what you are proposing is very unclear.
>
> I think you're approaching this problem from a viewpoint that is
> entirely too focused on the code that exists in PostgreSQL today.
> Lots of people have done lots of academic research on how to solve
> this problem, and you can't possibly say that CSNs and snapshots are
> the only solution to this problem unless you haven't read any of those
> papers. The articles above aren't exceptional in mentioning neither
> of the approaches that you are advocating - they are typical of the
> literature in this area. How can it be that the only solutions to
> this problem are ones that are totally different from the approaches
> that university professors who spend time doing research on
> concurrency have spent time exploring?
>
> I think we need to back up here and examine our underlying design
> assumptions. The goal here shouldn't necessarily be to replace
> PostgreSQL's current transaction management with a distributed version
> of the same thing. We might want to do that, but I think the goal is
> or should be to provide ACID semantics in a multi-node environment,
> and specifically the I in ACID: transaction isolation. Making the
> existing transaction manager into something that can be spread across
> multiple nodes is one way of accomplishing that. Maybe the best one.
> Certainly one that's been experimented with in Postgres-XC. But it is
> often the case that an algorithm that works tolerably well on a single
> machine starts performing extremely badly in a distributed
> environment, because the latency of communicating between multiple
> systems is vastly higher than the latency of communicating between
> CPUs or cores on the same system. So I don't think we should be
> assuming that's the way forward.
>

I have a similar problem with the FDW approach, though. It seems to me that
because we have something that solves access to external tables, somebody
decided that it should be used as the base for the whole sharding solution,
but there is no real concept of what it will look like as a whole, no idea
what it will be usable for, and not even a simple prototype that would
prove that the idea is sound (although again, I am not clear on what the
actual idea is beyond "we will use FDWs").

Don't get me wrong, I agree that the current FDW enhancements are
useful; I am just worried about them being presented as the future of
sharding in Postgres when nobody has sketched what that future might look
like. Once we get to the more interesting parts like consistency,
distributed query planning, p2p connections, etc. (and I am really concerned
about these, as FDWs abstract away some knowledge that the coordinator and/or
data nodes might need to do them well), we might very well find
ourselves painted into a corner and have to start from the beginning. If
instead we had some idea of what the whole thing might look like, we could
identify this early and not postpone built-in sharding by several years
just because somebody said we will use FDWs and that's what we worked on
in those years.

Note that I am not saying that the other approaches discussed are any
better; I am saying that we should know approximately what we actually
want, and not just beat FDWs with a hammer, hope sharding will
eventually emerge, and call that the plan.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
