Re: [HACKERS] Pluggable storage

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Pluggable storage
Date: 2017-12-27 12:33:22
Message-ID: CAPpHfdtF3=aN2Knis+ZnbDYrZYJROesxLhbc5U3c0HCnYp4g1g@mail.gmail.com
Lists: pgsql-hackers

Hi!

On Wed, Dec 27, 2017 at 6:54 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

>
> On Tue, Dec 12, 2017 at 3:06 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> wrote:
>
>>
>> I restructured the patch files to avoid showing unnecessary
>> modifications. This also makes it easier than before to add new APIs
>> based on all the functions that are exposed by the heapam module.
>>
>> Attached is the latest set of patches. I will work on the remaining
>> pending items.
>>
>
> Apart from the rebase onto the latest master code, the additional changes
> are as follows:
>
> 1. Added an API for bulk insert and rewrite functionality (logical rewrite
> is not touched yet).
> 2. Redesigned the tuple lock API interface to remove the traversal logic
> from the executor module.
>

Great, thank you!

> The tuple lock API interface changes are from "Alexander Korotkov" of
> "PostgresPro".
> Thanks, Alexander. We are currently developing jointly so that the open
> items still pending can be closed faster and the "pluggable storage API"
> brought into good shape.
>

Thank you for announcing this. Yes, the pluggable storage API requires a
lot of work to get into committable shape. This is why I've decided to join
the development.

Let me explain the idea behind the new tuple lock API and the further
patches I plan to send. As I noted upthread, I consider the possibility of
alternative MVCC implementations a vital property of the pluggable storage
API. These include the undo log option, where a tuple is updated in place
while the old version of the tuple is moved to some special area. In this
case, the new version of the tuple resides at the same TID as the old
version. This is an important point, because the TID is not really a tuple
identifier anymore. Naturally, the TID becomes a row identifier, while a
tuple may be identified by the pair (tid, snapshot). For the current heap,
the snapshot is redundant and can be used just for assert checking (that
the tuple at the given tid is really visible to the given snapshot). For a
heap with an undo log, the appropriate tuple can be found by snapshot in
the undo chain associated with the given tid.

One consequence of the above is that we can no longer treat an unchanged
tid after an update as a sign that the tuple was deleted. This is why I've
introduced the HTSU_Result value HeapTupleDeleted. Another consequence is
that our tid traversal logic in the executor layer is not valid anymore.
For instance, traversing from an older tuple to a later one doesn't make
any sense for a heap with an undo log, where the later tuple is more easily
accessible than the older one. This is why I decided to hide this logic in
the storage layer and provide the TUPLE_LOCK_FLAG_FIND_LAST_VERSION flag,
which indicates that lock_tuple() has to find the latest updated version
and lock it. I've also changed the follow_updates bool to the more explicit
TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS flag, so as not to confuse it with
the previous flag, which also follows updates in a way. A third consequence
is that we have to pass a snapshot to the tuple_update() and tuple_delete()
methods to let them check whether the row was concurrently updated while
residing at the same TID. I'm going to provide this change as a separate
patch.
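
A rough sketch of how such a flag could move the traversal into the
storage layer is below. Only the two flag names come from the patch; the
flag values, the SketchTuple and SketchLockResult types, and
sketch_lock_tuple() are hypothetical, and the model is a conventional
heap-like chain where each version points to its successor. Actual
locking, visibility, and TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS handling
are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names are from the patch; the bit values are assumed here. */
#define TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS (1 << 0)    /* not modelled */
#define TUPLE_LOCK_FLAG_FIND_LAST_VERSION       (1 << 1)

/* Hypothetical result codes, loosely mirroring the idea of HTSU_Result. */
typedef enum SketchLockResult
{
    SKETCH_LOCK_OK,
    SKETCH_LOCK_UPDATED,        /* a newer version exists */
    SKETCH_LOCK_DELETED         /* the row was deleted */
} SketchLockResult;

/* Hypothetical tuple version in a heap-like update chain. */
typedef struct SketchTuple
{
    struct SketchTuple *newer;  /* successor version, NULL if latest */
    bool        deleted;        /* latest version marks a deleted row */
    bool        locked;
} SketchTuple;

/*
 * Storage-layer lock: the caller no longer walks the update chain itself.
 * With FIND_LAST_VERSION the storage finds and locks the latest version;
 * without it, an already-updated tuple is reported back to the caller.
 */
SketchLockResult
sketch_lock_tuple(SketchTuple *start, int flags, SketchTuple **locked)
{
    SketchTuple *t = start;

    assert(start != NULL && locked != NULL);
    *locked = NULL;

    if (flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION)
    {
        while (t->newer != NULL)
            t = t->newer;
    }
    else if (t->newer != NULL)
        return SKETCH_LOCK_UPDATED;

    /* Deletion is reported explicitly rather than inferred from the tid. */
    if (t->deleted)
        return SKETCH_LOCK_DELETED;

    t->locked = true;
    *locked = t;
    return SKETCH_LOCK_OK;
}
```

An undo-log storage could implement the same interface without any chain
walk, since its latest version is the one stored at the TID. That is the
reason for keeping this decision behind the storage API.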

Also, I appreciate that the tuple_insert() and tuple_update() methods are
now responsible for inserting index tuples. This frees pluggable storages
to implement other ways of interacting with indexes. However, I didn't get
the point of passing InsertIndexTuples IndexFunc to them. Currently, we
always pass ExecInsertIndexTuples() as this argument. As I understand it,
the storage is free to either call ExecInsertIndexTuples() or implement its
own logic for interacting with indexes. But I don't understand why we need
a callback when tuple_insert() and tuple_update() can call
ExecInsertIndexTuples() directly if needed. Another thing is that
tuple_delete() could also interact with indexes (especially once we enhance
the index access method API), so we need to pass meta-information about
indexes to tuple_delete() too.
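
To illustrate the two shapes being compared, here is a small sketch. All
names in it are hypothetical: sketch_insert_index_tuples() merely stands in
for ExecInsertIndexTuples() (whose real signature is different), and
SketchIndexState stands in for the index meta-information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the index meta-information. */
typedef struct SketchIndexState
{
    int         indexed_rows;   /* counts index insertions */
} SketchIndexState;

/* Stand-in for ExecInsertIndexTuples(); the real signature differs. */
void
sketch_insert_index_tuples(SketchIndexState *idx, int row)
{
    (void) row;
    idx->indexed_rows++;
}

/* Hypothetical callback type, playing the role of the IndexFunc argument. */
typedef void (*SketchIndexFunc) (SketchIndexState *idx, int row);

/* Variant A: the caller hands the index-insertion routine to the storage. */
void
sketch_tuple_insert_cb(SketchIndexState *idx, int row,
                       SketchIndexFunc index_func)
{
    assert(idx != NULL);
    /* ... the storage would store the row here ... */
    if (index_func != NULL)
        index_func(idx, row);
}

/* Variant B: the storage calls the routine directly, or uses its own logic. */
void
sketch_tuple_insert_direct(SketchIndexState *idx, int row)
{
    assert(idx != NULL);
    /* ... the storage would store the row here ... */
    sketch_insert_index_tuples(idx, row);
}
```

When the same routine is always passed, as in variant A today, both
variants behave identically, and variant B has one fewer argument.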

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
