Re: Pluggable storage

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable storage
Date: 2017-10-11 20:54:50
Message-ID: CAH2-Wzm7vFV_fv0hT6C0h8Yn8_tj8c6CG0xw2tBC0PBT6v3-DQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 11, 2017 at 1:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Oct 9, 2017 at 10:22 AM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> For me, it's a crucial point that pluggable storages should be able to have
>> a different MVCC implementation, and correspondingly have full control over
>> its interactions with indexes.
>> Thus, it would be good if we could reach consensus on that point. I'd like
>> other discussion participants to comment on whether they agree/disagree and
>> why.
>> Any comments?
>
> I think it's good for new storage managers to have full control over
> interactions with indexes. I'm not sure about the MVCC part. I think
> it would be legitimate to want a storage manager to ignore MVCC
> altogether - e.g. to build a non-transactional table.

I agree with Alexander -- if you're going to have a new MVCC
implementation, you have to do significant work within index access
methods. Adding "retail index tuple deletion" is probably just the
beginning. ISTM that you need something like InnoDB's purge thread
when index values change, since two versions of the same index tuple
(each with distinct attribute values) have to physically co-exist for
a time.
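
To make the co-existence point concrete, here is a minimal toy sketch in Python. It is not PostgreSQL or InnoDB code; every name in it (ToyIndex, update_key, purge, the xid arguments) is hypothetical, and visibility is simplified to "transactions commit instantly and are ordered by xid". It only shows that the old and new index entries for one row both stay in the index until no snapshot can still see the old one, at which point a purge pass can delete it retail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    key: str                          # indexed attribute value
    row_id: int                       # logical row the entry points to
    created_by: int                   # xid that created this version
    deleted_by: Optional[int] = None  # xid that superseded it, if any


class ToyIndex:
    """Hypothetical index that keeps superseded entries until purged."""

    def __init__(self):
        self.entries = []

    def insert(self, key, row_id, xid):
        self.entries.append(IndexEntry(key, row_id, xid))

    def update_key(self, row_id, new_key, xid):
        # Mark the live entry as superseded but leave it in place:
        # older snapshots may still need to find the row through it.
        for e in self.entries:
            if e.row_id == row_id and e.deleted_by is None:
                e.deleted_by = xid
        self.entries.append(IndexEntry(new_key, row_id, xid))

    def lookup(self, key, snapshot_xid):
        # Simplified visibility (an assumption of this sketch): a snapshot
        # sees entries created at or before it and not yet superseded.
        return [
            e.row_id
            for e in self.entries
            if e.key == key
            and e.created_by <= snapshot_xid
            and (e.deleted_by is None or e.deleted_by > snapshot_xid)
        ]

    def purge(self, oldest_snapshot_xid):
        # Retail deletion: drop entries no remaining snapshot can see.
        before = len(self.entries)
        self.entries = [
            e
            for e in self.entries
            if e.deleted_by is None or e.deleted_by > oldest_snapshot_xid
        ]
        return before - len(self.entries)


idx = ToyIndex()
idx.insert("alice", row_id=1, xid=10)
idx.update_key(1, "bob", xid=20)
# Both entries for row 1 now co-exist physically in idx.entries.
removed = idx.purge(oldest_snapshot_xid=20)
```

A real implementation would also have to deal with aborted transactions, concurrency, and locating the specific index tuple to delete, none of which this sketch attempts.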

> I don't know
> that it would be a very good idea to have two different full-fledged
> MVCC implementations, though. Like Tom says, that would be
> replicating a lot of the awfulness of the MySQL model.

It's not just the MySQL model, FWIW. SQL-on-Hadoop systems like
Impala, certain NoSQL systems, and AFAIK any database system that
claims to have pluggable storage all do it this way. That is, core
transaction management functions (e.g. MVCC snapshot acquisition) are
outsourced to the storage engine. It *is* very cumbersome, but that's
what they do.

--
Peter Geoghegan
