Re: I'd like to discuss scaleout at PGCon

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: I'd like to discuss scaleout at PGCon
Date: 2018-06-07 05:09:09
Message-ID: CAFjFpRepQesCjbQqQBfZKYb88a3M7Uy6VB+NmoUP7cb7QCqu4g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 6, 2018 at 11:46 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> On 2018-Jun-06, Ashutosh Bapat wrote:
>
>> On Tue, Jun 5, 2018 at 10:04 PM, MauMau <maumau307(at)gmail(dot)com> wrote:
>> > From: Ashutosh Bapat
>> >> In order to normalize parse trees, we need to at least replace
>> >> the various OIDs in the parse tree with something the foreign
>> >> server will understand correctly, such as the name of the table on
>> >> the foreign server pointed to by the local foreign table, or
>> >> (schema-qualified) function names, and so on.
>> >
>> > Yes, that's the drawback of each node in the cluster having
>> > different OIDs for the same object. That applies to XL, too.
>>
>> Keeping OIDs the same across the nodes would require extra
>> communication between nodes to keep track of the next OID, dropped
>> OIDs, etc. We need to weigh the time spent in that communication
>> against the time saved during parsing.
>
> We already have the ability to give objects some predetermined OID, for
> pg_upgrade.

True, but that's only for a database that is not running. We are
talking about a live database. Assigning a predetermined OID is just
one part, and possibly the smallest part, of the bigger picture.
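As a rough illustration of the per-query work that shared OIDs would
avoid, here is a minimal Python sketch of the parse-tree normalization
discussed in the quoted text: the sending node replaces OIDs with
schema-qualified names, and the receiving node maps those names back to
its own OIDs. The dict-based tree, the single OID-to-name catalog per
node, and all OID values are hypothetical simplifications; the
`relid`/`funcid` fields are only loosely modeled on PostgreSQL's
RangeTblEntry.relid and FuncExpr.funcid, and real catalogs (pg_class,
pg_proc) are far richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedName:
    schema: str
    name: str


# Hypothetical per-node catalogs (OID -> schema-qualified name). The same
# logical objects have different OIDs on each node.
COORDINATOR_CATALOG = {
    16384: QualifiedName("public", "orders"),    # a table
    16390: QualifiedName("public", "tax_rate"),  # a function
}
DATANODE_CATALOG = {
    24576: QualifiedName("public", "orders"),
    24581: QualifiedName("public", "tax_rate"),
}

# Fields that hold OIDs, loosely modeled on RangeTblEntry.relid and
# FuncExpr.funcid.
OID_FIELDS = ("relid", "funcid")


def rewrite_oid_fields(node, mapping):
    """Recursively copy a dict/list tree, passing OID_FIELDS values through mapping."""
    if isinstance(node, dict):
        return {key: mapping[value] if key in OID_FIELDS
                else rewrite_oid_fields(value, mapping)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite_oid_fields(item, mapping) for item in node]
    return node


def normalize(tree, catalog):
    """Sending node: replace local OIDs with names the peer can understand."""
    return rewrite_oid_fields(tree, catalog)


def resolve(tree, catalog):
    """Receiving node: map names back to this node's own OIDs."""
    oid_by_name = {qname: oid for oid, qname in catalog.items()}
    return rewrite_oid_fields(tree, oid_by_name)


# Usage: roughly "SELECT tax_rate() FROM orders" as a toy parse tree.
query = {
    "type": "Query",
    "rtable": [{"type": "RangeTblEntry", "relid": 16384}],
    "targetList": [{"type": "FuncExpr", "funcid": 16390, "args": []}],
}
shipped = normalize(query, COORDINATOR_CATALOG)
local = resolve(shipped, DATANODE_CATALOG)
print(local["rtable"][0]["relid"], local["targetList"][0]["funcid"])  # prints: 24576 24581
```

In this toy model the tree walk and the catalog lookups run for every
shipped query, which is the kind of recurring cost that identical OIDs
on all nodes would trade for extra coordination at DDL time.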

>
> Maybe an easy (hah) thing to do is use 2PC for DDL, agree on an OID
> that's free on every node, then create the object in all servers at the
> same time. We currently use the system-wide OID generator to assign the
> OID, but that seems an easy thing to change (much harder is to prevent
> concurrent creation of objects using the arranged OID; maybe can reuse
> speculative tokens in btrees for this). Doing this imposes a cost at
> DDL-execution-time only, which seems much better than imposing the cost
> of translating name to OID on every server for every query.

This works if all the nodes are always up. If a few nodes are down,
the remaining nodes need to determine the OID and communicate it to
the failed nodes when they come back up. That's easier said than done.
The moment we design something like that, we have to deal with the
split-brain problem: two sets of nodes, each of which thinks the other
set is down, will keep assigning OIDs they believe are free and will
only see the conflicts later, when they exchange the assigned OIDs.
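A toy, in-memory Python model can make the split-brain failure
concrete. It mimics the quoted proposal (agree, in two phases, on an OID
that is free on every reachable node, then create the object
everywhere) and then partitions the cluster. Everything here (the
`Node` class, `create_object`, `conflicts`, the object names) is
hypothetical and is not PostgreSQL code; the two phases only imitate
2PC and ignore concurrent DDL, crashes between phases, and catch-up of
nodes that were down. The starting value 16384 is the first OID
PostgreSQL assigns to user objects.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.objects = {}      # OID -> object name, as known by this node
        self.reserved = None   # OID reserved during the prepare phase

    def prepare(self, oid):
        # Phase 1: vote yes only if the OID is free locally, and reserve it.
        if oid in self.objects:
            return False
        self.reserved = oid
        return True

    def commit(self, obj):
        # Phase 2: create the object under the reserved OID.
        self.objects[self.reserved] = obj
        self.reserved = None

    def abort(self):
        self.reserved = None


def create_object(reachable, obj, start=16384):
    """Find the lowest OID >= start free on every reachable node; create obj there."""
    oid = start
    while True:
        votes = [n.prepare(oid) for n in reachable]  # ask every node, no short-circuit
        if all(votes):
            for n in reachable:
                n.commit(obj)
            return oid
        for n in reachable:
            n.abort()
        oid += 1


def conflicts(nodes):
    """Return the OIDs that different nodes assigned to different objects."""
    seen, clashes = {}, set()
    for n in nodes:
        for oid, obj in n.objects.items():
            if seen.setdefault(oid, obj) != obj:
                clashes.add(oid)
    return clashes


a, b, c, d = (Node(x) for x in "abcd")
cluster = [a, b, c, d]
create_object(cluster, "orders")  # all nodes up: every node uses OID 16384
# Network partition: {a, b} and {c, d} each believe the other half is down.
oid1 = create_object([a, b], "customers")
oid2 = create_object([c, d], "invoices")
# Once the partition heals, the same OID names two different objects.
print(oid1, oid2, conflicts(cluster))  # prints: 16385 16385 {16385}
```

Each half behaves correctly in isolation; the conflict only becomes
visible when the halves compare notes, which is why the scheme needs
some quorum or fencing rule on top of the OID agreement itself.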

Not that we cannot implement something like this, but it is a lot of
work. We will need to carefully identify the cases where the scheme
can fail and plug all the holes.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
