Re: I'd like to discuss scaleout at PGCon

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Michael Paquier" <michael(at)paquier(dot)xyz>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: I'd like to discuss scaleout at PGCon
Date: 2018-06-06 15:53:20
Message-ID: BDE951DEFC144489AFD1D7131C547B8B@tunaPC
Lists: pgsql-hackers

From: Michael Paquier
> Greenplum's orca planner (and Citus?) have such facilities if I recall
> correctly, just mentioning that pushing down directly to remote nodes
> compiled plans ready for execution exists here and there (that's not the
> case of XC/XL). For queries whose planning time is way shorter than its
> actual execution, like analytical work that would not matter much. But
> not for OLTP and short transaction workloads.

It seems that Greenplum does:

https://greenplum.org/docs/580/admin_guide/query/topics/parallel-proc.html#topic1

"The master receives, parses, and optimizes the query. The resulting
query plan is either parallel or targeted. The master dispatches
parallel query plans to all segments,..."

while Citus doesn't:

https://docs.citusdata.com/en/v7.4/develop/reference_processing.html#citus-query-processing

"Next, the planner breaks the query into two parts - the coordinator
query which runs on the coordinator and the worker query fragments
which run on individual shards on the workers. The planner then
assigns these query fragments to the workers such that all their
resources are used efficiently. After this step, the distributed query
plan is passed on to the distributed executor for execution.
...
Once the distributed executor sends the query fragments to the
workers, they are processed like regular PostgreSQL queries. The
PostgreSQL planner on that worker chooses the most optimal plan for
executing that query locally on the corresponding shard table. The
PostgreSQL executor then runs that query and returns the query results
back to the distributed executor."
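To make the two-part planning concrete, here is a minimal sketch (hypothetical code, not actual Citus internals; the table name, shard naming scheme, and `plan_distributed_count` helper are my own invention) of how a coordinator might split a simple count into per-shard worker fragments plus a combine step:

```python
# Hypothetical sketch of two-part distributed planning: the coordinator
# emits one query fragment per shard, and keeps a combine step for
# merging the partial results the workers send back.

def plan_distributed_count(table, shards):
    """Return per-shard worker fragment SQL and a coordinator combine fn."""
    fragments = [
        f"SELECT count(*) FROM {table}_{shard_id}" for shard_id in shards
    ]
    combine = sum  # the coordinator sums the per-shard counts
    return fragments, combine

fragments, combine = plan_distributed_count("events", shards=[0, 1, 2])
# Each fragment runs on its worker as a regular PostgreSQL query, where
# the local planner picks the best plan for that shard table.
partial_counts = [10, 7, 3]  # pretend results returned by the workers
total = combine(partial_counts)  # 20
```

The point of the sketch is the division of labor described above: the fragments are ordinary SQL, so each worker's stock PostgreSQL planner and executor handle them locally.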

BTW, the above page states that worker nodes directly exchange data
during query execution. Greenplum also does so among segment nodes to
join tables that are distributed on different key columns. XL seems
to do so, too. If this type of interaction is necessary, how would
the FDW approach handle it? The remote servers would need to
communicate with each other directly.

"The task tracker executor is designed to efficiently handle complex
queries which require repartitioning and shuffling intermediate data
among workers."
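For illustration, the repartitioning step that quote describes can be sketched roughly like this (hypothetical code, not Citus or Greenplum internals; the `hash_partition` helper and the sample rows are mine). Each worker hashes its intermediate rows on the join key so that matching keys end up on the same worker:

```python
# Hypothetical sketch of repartitioning ("shuffling") intermediate data:
# rows are assigned to worker buckets by hashing the join key, so rows
# with the same key co-locate and can be joined locally.

def hash_partition(rows, key_index, num_workers):
    """Assign each intermediate row to a worker bucket by join-key hash."""
    buckets = [[] for _ in range(num_workers)]
    for row in rows:
        worker = hash(row[key_index]) % num_workers
        buckets[worker].append(row)
    return buckets

# Two tables distributed on different columns: to join on customer_id,
# each worker reshuffles its local rows so matching keys co-locate.
orders = [(1, "o100"), (2, "o101"), (1, "o102")]  # (customer_id, order_id)
buckets = hash_partition(orders, key_index=0, num_workers=2)
# Both rows with customer_id 1 land in the same bucket.
```

This is exactly the worker-to-worker traffic that seems hard to express through the FDW interface, where each connection runs between the coordinator and one remote server.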

> Greenplum uses also a single-coordinator, multi-datanode instance. That
> looks similar, right?

Greenplum uses a single master and multiple workers, which is similar
to Citus. But Greenplum is not similar to VoltDB or Vertica, because
those allow applications to connect to any node.

>> Our proprietary RDBMS named Symfoware, which is not based on
>> PostgreSQL, also doesn't have an extra hop, and can handle distributed
>> transactions and deadlock detection/resolution without any special
>> node like GTM.
>
> Interesting to know that. This is an area with difficult problems. At
> the closer to merge with Postgres head, the more fun (?) you get into
> trying to support new SQL features, and sometimes you finish with hard
> ERRORs or extra GUC switches to prevent any kind of inconsistent
> operations.

Yes, I hope our deadlock detection/resolution can be ported to
PostgreSQL. But I share your concern, because Symfoware is
locking-based, not MVCC-based.
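For readers unfamiliar with the general idea: one common way to detect a distributed deadlock without a central node is to merge the wait-for edges that each node reports and look for a cycle. The following is only a generic sketch of that technique (hypothetical code; it is not Symfoware's algorithm, and the transaction names are made up):

```python
# Hypothetical sketch: detect a distributed deadlock by finding a cycle
# in a merged wait-for graph, e.g. after nodes exchange the wait edges
# of their local transactions.

def has_cycle(wait_for):
    """Depth-first search for a cycle in {txn: [txns it waits on]}."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on DFS stack / done
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, []):
            if color.get(u, WHITE) == GRAY:
                return True  # back edge: a cycle, i.e. a deadlock
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# T1 on node A waits for T2 on node B, which waits back on T1.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True: deadlock
print(has_cycle({"T1": ["T2"], "T2": []}))      # False: no cycle
```

The hard part in practice is of course not the cycle search but collecting a consistent wait-for graph across nodes without a GTM-like coordinator.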

Regards
MauMau
