Re: I'd like to discuss scaleout at PGCon

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: I'd like to discuss scaleout at PGCon
Date: 2018-06-01 15:02:35
Message-ID: CANP8+jJ_e6xxhvx1i0jWZJmmMBbx2SzW5nG-eC2W2A77ywZkwA@mail.gmail.com
Lists: pgsql-hackers

On 1 June 2018 at 15:44, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Thu, May 31, 2018 at 11:00 PM, MauMau <maumau307(at)gmail(dot)com> wrote:
>> 2018-05-31 22:44 GMT+09:00, Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> On Thu, May 31, 2018 at 8:12 AM, MauMau <maumau307(at)gmail(dot)com> wrote:
>>>> Oh, I didn't know you support the FDW approach mainly for analytics. I
>>>> guessed the first target was OLTP read-write scalability.
>>>
>>> That seems like a harder target to me, because you will have an extra
>>> hop involved -- SQL from the client to the first server, then via SQL
>>> to a second server. The work of parsing and planning also has to be
>>> done twice, once for the foreign table and again for the table. For
>>> longer-running queries this overhead doesn't matter as much, but for
>>> short-running queries it is significant.
>>
>> Yes, that extra hop and the double parsing/planning were the killers
>> for our performance goal when we tried to meet our customer's
>> scaleout needs with XL. The application executes 82 DML statements in
>> one transaction. Those DMLs consist of INSERT, UPDATE and SELECT
>> statements that each access only one row via its primary key. Only a
>> few tables are involved, so the application PREPAREs a few statements
>> and EXECUTEs them repeatedly. We placed the coordinator node of XL on
>> the same host as the application, and the data nodes and GTM on other
>> individual nodes.
>>
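For illustration, the kind of statement being described looks roughly like
this (table, column, and parameter names are hypothetical); each prepared
statement touches a single row through its primary key, so any extra
per-statement parse/plan work or network hop dominates the transaction time:

    PREPARE upd_item (int, numeric) AS
        UPDATE item SET amount = $2 WHERE item_id = $1;  -- item_id is the PK
    PREPARE sel_item (int) AS
        SELECT amount FROM item WHERE item_id = $1;

    EXECUTE upd_item(42, 100.0);
    EXECUTE sel_item(42);
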
>
> I agree that there's double parsing happening, but I am hesitant to
> agree with the double planning claim. We do plan, let's say, a join
> between two foreign tables on the local server, but that's only to
> decide whether it's efficient to join locally or on the foreign
> server. That means we create foreign paths for the scans on the
> foreign tables, maybe as many parameterized paths as there are join
> conditions, and one path for the join pushdown, and that's it. We
> then create local join paths, but we need those only to decide
> whether it's efficient to join locally and, if so, which way. We do
> not create paths describing how the foreign server would plan that
> join. That's not double planning, since we do not create the same
> paths both locally and on the foreign server.
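
Which path wins can be seen directly in EXPLAIN output on the local server;
a rough sketch with hypothetical postgres_fdw foreign tables:

    -- ft1 and ft2 are foreign tables pointing at the same remote server.
    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT * FROM ft1 JOIN ft2 USING (id);

    -- If the pushed-down path is chosen, the plan is a single Foreign Scan
    -- whose Remote SQL contains the whole join, roughly:
    --   Foreign Scan
    --     Relations: (public.ft1) INNER JOIN (public.ft2)
    --     Remote SQL: SELECT ... FROM t1 r1 INNER JOIN t2 r2 ON (...)
    -- Otherwise the join appears as a local join node (Hash/Merge/Nested
    -- Loop) over two separate Foreign Scans.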
>
> In order to avoid double parsing, we might want to find a way to pass
> a "normalized" parse tree down to the foreign server. We need to
> normalize the OIDs in the parse tree since those may be different
> across the nodes.
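
The OID problem is easy to see: the same table generally has a different OID
on each node, so OIDs baked into a parse tree cannot be shipped as-is. A
trivial check (table name hypothetical):

    -- Run on two different nodes; the results will usually differ.
    SELECT 'public.orders'::regclass::oid;

So a normalized tree would presumably carry schema-qualified names (or some
cluster-wide identifier) instead of local OIDs, to be resolved back to local
OIDs on the receiving node.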

Passing detailed info between servers is exactly what XL does.

It requires us to define a cluster, exactly as XL does.

And yes, it's a good idea to replicate some tables to all nodes, as XL does.

So it seems we have at last some agreement that some of the things XL
does are the correct approaches.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
