Re: I'd like to discuss scaleout at PGCon

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: I'd like to discuss scaleout at PGCon
Date: 2018-06-06 16:37:56
Message-ID: 74F3D91943804B0B8F8CE5FB8A7AB966@tunaPC
Lists: pgsql-hackers

From: Simon Riggs
On 5 June 2018 at 17:14, MauMau <maumau307(at)gmail(dot)com> wrote:

>> Furthermore, an extra hop and double parsing/planning could matter for
>> analytic queries, too. For example, SAP HANA boasts of scanning 1
>> billion rows in one second. In HANA's scaleout architecture, an
>> application can connect to any worker node and the node communicates
>> with other nodes only when necessary (there's one special node called
>> "master", but it manages the catalog and transactions; it's not an
>> extra hop like the coordinator in XL). Vertica is an MPP analytics
>> database, but it doesn't have a node like the coordinator, either. To
>> achieve maximum performance for real-time queries, the scaleout
>> architecture should avoid an extra hop when possible.

> Agreed. When possible.
>
> When is it possible?
>
> Two ways I know of are:
>
> 1. have pre-knowledge in the client of where data is located
> (difficult to scale)
> 2. you must put data in all places the client can connect to (i.e.
> multimaster)

Regarding 1, I understand "difficult to scale" to mean that whenever
the user adds a node to or removes a node from the cluster and data
placement changes, he has to change his application's connection
string to point to the node where the necessary data resides.
Oracle Sharding provides a solution to that problem. See "6.1 Direct
Routing to a Shard" in the following manual page:

https://docs.oracle.com/en/database/oracle/oracle-database/18/shard/sharding-data-routing.html#GUID-64CAD794-FAAA-406B-9E20-0C35E96D3FA8

[Excerpt]
"Oracle clients and connections pools are able to recognize sharding
keys specified in the connection string for high performance data
dependent routing. A shard routing cache in the connection layer is
used to route database requests directly to the shard where the data
resides."
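
To illustrate the idea, here is a rough sketch of what a routing cache
in the connection layer might look like. This is only my assumption of
the general technique, not Oracle's actual implementation: the class
name ShardRoutingCache, its methods, the hash-range layout and the node
addresses are all hypothetical. The point is that the application
passes only a sharding key, and the key-to-node mapping can be
reloaded when nodes are added or removed, without touching the
connection string.

```python
import bisect
import hashlib


class ShardRoutingCache:
    """Hypothetical client-side cache mapping hash ranges to shard nodes."""

    def __init__(self):
        self._upper_bounds = []  # sorted, exclusive upper bound of each hash range
        self._nodes = []         # node address that owns the matching range

    def load(self, ranges):
        """Replace the cache with (upper_bound, node) pairs.

        In a real system these would come from the cluster's catalog;
        here the caller supplies them directly (an assumption).
        """
        ordered = sorted(ranges)
        self._upper_bounds = [upper for upper, _ in ordered]
        self._nodes = [node for _, node in ordered]

    @staticmethod
    def hash_key(sharding_key, space=2**32):
        """Map a sharding key to a stable integer in [0, space)."""
        digest = hashlib.sha256(str(sharding_key).encode()).digest()
        return int.from_bytes(digest[:4], "big") % space

    def route(self, sharding_key):
        """Return the node that holds the data for sharding_key."""
        h = self.hash_key(sharding_key)
        # First range whose exclusive upper bound is greater than h.
        idx = bisect.bisect_right(self._upper_bounds, h)
        if idx == len(self._nodes):
            raise LookupError("no shard covers this key; refresh the cache")
        return self._nodes[idx]


cache = ShardRoutingCache()
cache.load([(2**31, "node1:5432"), (2**32, "node2:5432")])
target = cache.route("customer-42")  # connect directly here, no extra hop
```

When the data placement changes, only load() needs to be called again
with the new ranges; the application code that calls route() stays the
same.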

> Perhaps there are more?

Please see 6.1.1 below. The application can connect to multiple nodes
and distribute processing without an extra hop. This is also an
interesting idea, isn't it?

https://docs.voltdb.com/UsingVoltDB/ChapAppDesign.php#DesignAppConnectMulti

Regards
MauMau
