Re: I'd like to discuss scaleout at PGCon

From: Sumanta Mukherjee <sumanta(dot)mukherjee(at)enterprisedb(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "maumau307(at)gmail(dot)com" <maumau307(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: I'd like to discuss scaleout at PGCon
Date: 2020-06-22 04:53:33
Message-ID: CAMSJAirGtacYGkdUV=0nEYt11LAtb8_v99cOVR3mBKB8LB2N0A@mail.gmail.com
Lists: pgsql-hackers

Hi,

I read through the Symfoware paper, and it is a nice technique. I am not sure
where Hyder is used commercially, but given that it came out of Microsoft
Research, some Microsoft products might already be using it or some of its
concepts.

With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com

On Wed, Jun 17, 2020 at 9:38 PM tsunakawa(dot)takay(at)fujitsu(dot)com <tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:

> Hello,
>
> It seems you didn't include pgsql-hackers.
>
> From: Sumanta Mukherjee <sumanta(dot)mukherjee(at)enterprisedb(dot)com>
>
> > I saw the presentation, and it is great, except that for both SD and SN
> > it is unclear whether storage and compute are explicitly separated.
> > Separating storage and compute would have some cost advantages, as I
> > understand it. The following two works (refs below) have some information
> > about the usefulness of this technique for scale-out, so it would be an
> > interesting addition to see whether the proposed SN architecture could be
> > modified to accommodate this separation and reap the gain.
>
> Thanks. Separation of compute and storage should certainly be considered.
> Unlike the old days, when shared storage was considered a bottleneck
> because of slow HDDs and FC-SAN, we can now expect high-speed shared
> storage thanks to flash memory, NVMe-oF, and RDMA.
>
> > 1. Philip A. Bernstein, Colin W. Reid, and Sudipto Das. 2011. Hyder - A
> > Transactional Record Manager for Shared Flash. In CIDR 2011.
>
> This is interesting; I'll look into it. Do you know of any product based
> on Hyder? OTOH, Hyder seems to require drastic changes to adopt it in
> Postgres: OCC, a log-structured database, etc. I'd like to hear how
> feasible those are. However, its scale-out capability without the need
> for data or application partitioning looks appealing.
>
> To explore another possibility that would have more affinity with the
> current Postgres, let me introduce our proprietary product called
> Symfoware. It's not based on Postgres.
>
> It has shared-nothing scale-out functionality with full ACID guarantees,
> based on 2PC, conventional 2PL locking, and distributed deadlock
> resolution. Despite being shared-nothing, all the database files and
> transaction logs are stored on shared storage.
>
> The database is divided into "log groups." Each log group has one
> transaction log and multiple tablespaces (called "database spaces" in
> Symfoware instead of tablespaces).
>
> Each DB instance in the cluster owns multiple log groups and handles
> reads/writes to the data in the log groups it owns. When a DB instance
> fails, the surviving DB instances take over its log groups, recover the
> data using each log group's transaction log, and accept reads/writes to
> the data in those log groups. The DBA configures which DB instance
> initially owns which log groups and which DB instances are candidates to
> take over which log groups.
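
A minimal Python sketch of the log-group ownership and takeover scheme described above, under stated assumptions: the class and method names are hypothetical (not Symfoware's), log recovery is a stub, and a failed instance's group goes to the first surviving candidate in the DBA-configured order, which is an assumed policy. Because every instance can own groups, no instance has to sit idle as a pure standby.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LogGroup:
    """One transaction log plus the database spaces whose changes it records."""
    name: str
    database_spaces: List[str]
    owner: str                      # instance currently serving this group's reads/writes
    takeover_candidates: List[str]  # DBA-configured; assumed to be in preference order


class Cluster:
    def __init__(self, instances: List[str], groups: List[LogGroup]) -> None:
        self.alive: Set[str] = set(instances)
        self.groups: Dict[str, LogGroup] = {g.name: g for g in groups}

    def route(self, group_name: str) -> str:
        """Return the instance that must serve reads/writes for this log group."""
        return self.groups[group_name].owner

    def _recover(self, group: LogGroup, new_owner: str) -> None:
        # Hypothetical stub: the new owner would replay this group's transaction
        # log from shared storage before accepting new reads/writes.
        pass

    def fail_instance(self, failed: str) -> Dict[str, str]:
        """Hand each log group owned by `failed` to its first surviving candidate."""
        self.alive.discard(failed)
        moved: Dict[str, str] = {}
        for group in self.groups.values():
            if group.owner != failed:
                continue
            survivor = next((c for c in group.takeover_candidates if c in self.alive), None)
            if survivor is None:
                raise RuntimeError(f"no surviving takeover candidate for {group.name}")
            self._recover(group, survivor)
            group.owner = survivor
            moved[group.name] = survivor
        return moved


if __name__ == "__main__":
    cluster = Cluster(["db1", "db2", "db3"], [
        LogGroup("lg1", ["ds1", "ds2"], owner="db1", takeover_candidates=["db2", "db3"]),
        LogGroup("lg2", ["ds3"], owner="db2", takeover_candidates=["db3", "db1"]),
        LogGroup("lg3", ["ds4"], owner="db3", takeover_candidates=["db1", "db2"]),
    ])
    print(cluster.fail_instance("db1"))  # {'lg1': 'db2'}
    print(cluster.route("lg1"))          # db2
```

Only the failed instance's groups move; groups owned by healthy instances keep serving without interruption, which is what lets every node carry read-write load.
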
>
> This way, no server sits idle as a standby. All DB instances work hard to
> process read-write transactions. This "no idle server for HA" property is
> one of the things Oracle RAC users want in terms of cost.
>
> However, unlike Hyder, it still requires data and application
> partitioning. Can anyone think of a way to eliminate partitioning? Data
> and application partitioning is what Oracle RAC users want to avoid or
> cannot tolerate.
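
To illustrate why this design still needs data and application partitioning, here is a minimal Python sketch. All table, log-group, and instance names are hypothetical. It assumes each table lives in exactly one log group, maps a transaction's writes to the instances owning those groups, and uses 2PC only when more than one owner is involved. Committing single-owner transactions locally is an assumed optimization, not something stated about Symfoware.

```python
from typing import Callable, Dict, Iterable, List


def participants(tables: Iterable[str],
                 table_to_group: Dict[str, str],
                 group_owner: Dict[str, str]) -> List[str]:
    """Instances owning the log groups that hold the written tables."""
    return sorted({group_owner[table_to_group[t]] for t in tables})


def commit(tables: Iterable[str],
           table_to_group: Dict[str, str],
           group_owner: Dict[str, str],
           prepare: Callable[[str], bool]) -> str:
    nodes = participants(tables, table_to_group, group_owner)
    if not nodes:
        return "nothing to commit"
    if len(nodes) == 1:
        # Assumed optimization: a transaction confined to one owner commits locally.
        return f"local commit on {nodes[0]}"
    # Phase 1: ask every participant to prepare and collect all votes.
    votes = [prepare(n) for n in nodes]
    # Phase 2: commit only if every participant voted yes; otherwise abort.
    if all(votes):
        return "2PC commit on " + ", ".join(nodes)
    return "2PC abort"


def always_yes(node: str) -> bool:
    return True


if __name__ == "__main__":
    table_to_group = {"orders": "lg1", "order_items": "lg1", "customers": "lg2"}
    group_owner = {"lg1": "db1", "lg2": "db2"}
    print(commit(["orders", "order_items"], table_to_group, group_owner, always_yes))
    # local commit on db1
    print(commit(["orders", "customers"], table_to_group, group_owner, always_yes))
    # 2PC commit on db1, db2
```

A schema and application partitioned so that most transactions stay within one instance's log groups avoid the cross-node 2PC round trips; that design effort is the partitioning burden the question above hopes to eliminate.
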
>
> Ref: Introduction of the Symfoware shared-nothing scale-out called "load
> share."
> https://pdfs.semanticscholar.org/8b60/163593931cebc58e9f637cfb501500230adc.pdf
>
> Regards
>
> Takayuki Tsunakawa
>
> --- below is Sumanta's original mail ---
>
> *From:* Sumanta Mukherjee <sumanta(dot)mukherjee(at)enterprisedb(dot)com>
> *Sent:* Wednesday, June 17, 2020 5:34 PM
> *To:* Tsunakawa, Takayuki/綱川 貴之 <tsunakawa(dot)takay(at)fujitsu(dot)com>
> *Cc:* Bruce Momjian <bruce(at)momjian(dot)us>; Merlin Moncure <mmoncure(at)gmail(dot)com>;
> Robert Haas <robertmhaas(at)gmail(dot)com>; maumau307(at)gmail(dot)com
> *Subject:* Re: I'd like to discuss scaleout at PGCon
>
> Hello,
>
> I saw the presentation, and it is great, except that for both SD and SN it
> is unclear whether storage and compute are explicitly separated.
> Separating storage and compute would have some cost advantages, as I
> understand it. The following two works (refs below) have some information
> about the usefulness of this technique for scale-out, so it would be an
> interesting addition to see whether the proposed SN architecture could be
> modified to accommodate this separation and reap the gain.
>
> 1. Philip A. Bernstein, Colin W. Reid, and Sudipto Das. 2011. Hyder - A
> Transactional Record Manager for Shared Flash. In CIDR 2011.
>
> 2. Dhruba Borthakur. 2017. The Birth of RocksDB-Cloud.
> http://rocksdb.blogspot.com/2017/05/the-birth-of-rocksdb-cloud.html.
>
> With Regards,
>
> Sumanta Mukherjee.
>
> EnterpriseDB: http://www.enterprisedb.com
