Re: I'd like to discuss scaleout at PGCon

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: I'd like to discuss scaleout at PGCon
Date: 2018-06-22 17:34:17
Message-ID: 20180622173417.GC27035@momjian.us
Lists: pgsql-hackers

On Fri, Jun 1, 2018 at 11:29:43AM -0500, Merlin Moncure wrote:
> FWIW, Distributed analytical queries is the right market to be in.
> This is the field in which I work, and this is where the action is at.
> I am very, very, sure about this. My view is that many of the
> existing solutions to this problem (in particular hadoop class
> solutions) have major architectural downsides that make them
> inappropriate in use cases that postgres really shines at; direct
> hookups to low latency applications for example. postgres is
> fundamentally a more capable 'node' with its multiple man-millennia of
> engineering behind it. Unlimited vertical scaling (RAC etc) is
> interesting too, but this is not the way the market is moving as
> hardware advancements have reduced or eliminated the need for that in
> many spheres.
>
> The direction of the project is sound and we are on the cusp of the
> point where multiple independent coalescing features (FDW, logical
> replication, parallel query, executor enhancements) will open new
> scaling avenues that will not require trading off the many other
> benefits of SQL that competing contemporary solutions might. The
> broader development market is starting to realize this and that is a
> major driver of the recent upswing in popularity. This is benefiting
> me tremendously personally due to having gone 'all-in' with postgres
> almost 20 years ago :-D. (Time sure flies) These are truly
> wonderful times for the community.

I am coming in late, but I am glad we are having this conversation. We
have made great strides toward sharding while adding minimal
sharding-specific code. We can now see a time when we will complete the
minimal sharding-specific code tasks. Once we reach that point, we
will need to decide what sharding-specific code to add, and to do that,
we need to understand which direction to go in, and to do that, we need
to know the trade-offs.

While I am glad people know a lot about how other projects handle
sharding, these can only be guides to how Postgres will handle such
workloads. I think we need to get to a point where we have all of the
minimal sharding-specific code features done, at least as
proof-of-concept, and then test Postgres with various workloads like
OLTP/OLAP and read-write/read-only. This will tell us where
sharding-specific code will have the greatest impact.

What we don't want to do is to add a bunch of sharding-specific code
without knowing which workloads it benefits, and how many of our users
will actually use sharding. Some projects have done that, and it
didn't end well since they then had a lot of product complexity with
little user value.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
