Re: Pluggable storage

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable storage
Date: 2016-08-17 17:03:44
Message-ID: 20160817170344.GA901677@alvherre.pgsql
Lists: pgsql-hackers

Anastasia Lubennikova wrote:
> 13.08.2016 02:15, Alvaro Herrera:

> >To support this, we introduce StorageTuple and StorageScanDesc.
> >StorageTuples represent a physical tuple coming from some storage AM.
> >It is necessary to have a pointer to a StorageAmRoutine in order to
> >manipulate the tuple. For heapam.c, a StorageTuple is just a HeapTuple.
>
> The StorageTuples concept looks really cool. I've got some questions about
> the details of the implementation.
>
> Do StorageTuples have fields common to all implementations?
> Or is StorageTuple a totally abstract structure that has nothing to do
> with the data, except pointing to it?
>
> I mean, we already have the HeapTupleData structure, which is a pretty
> good candidate to be replaced with StorageTuple.

I was planning to replace all uses of HeapTuple in the executor with
StorageTuple, actually. But the main reason I would like to avoid
HeapTupleData itself is that it contains an assumption that there is a
single palloc chunk that contains the tuple (t_len and t_data). This
might not be true in representations that split the tuple, for example
in columnar storage where you have one column in page A and another
column in page B, for the same tuple. I suppose there might be some
point to keeping t_tableOid and t_self, though.
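
For reference, this is the struct in question (src/include/access/htup.h);
the t_len/t_data pair is exactly the single-chunk assumption that a split
representation cannot honor, while t_self and t_tableOid are the parts that
may be worth carrying over:

typedef struct HeapTupleData
{
    uint32          t_len;          /* length of *t_data */
    ItemPointerData t_self;         /* SelfItemPointer */
    Oid             t_tableOid;     /* table the tuple came from */
    HeapTupleHeader t_data;         /* -> tuple header and data, one palloc chunk */
} HeapTupleData;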

> And maybe add a "t_handler" field that points to the handler functions.
> I'm not sure whether it should be the name of the StorageAm, its OID, or maybe
> the main function itself. Although, if I'm not mistaken, we always have
> RelationData when we want to operate on the tuple, so having t_handler
> in the StorageTuple would be redundant.

Yeah, I think the RelationData (or more precisely the StorageAmRoutine)
is going to be available always, so I don't think we need a pointer in
the tuple itself.

> This approach allows us to minimize code changes and ensure that we
> won't miss any function that handles tuples.
>
> Do you see any weak points of the suggestion?
> What design do you use in your prototype?

It's currently a "void *" pointer in my prototype.
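
Roughly like this; the typedef is what the prototype has today, while the
struct below is only a sketch of the shape I have in mind (method names are
illustrative and modeled on FdwRoutine, not final):

/* opaque tuple handle -- only the owning storage AM knows the layout */
typedef void *StorageTuple;

typedef struct StorageAmRoutine
{
    NodeTag      type;              /* same convention as FdwRoutine */

    /* illustrative subset of methods, not the final API */
    Oid          (*tuple_insert) (Relation rel, StorageTuple tuple,
                                  CommandId cid, int options);
    StorageTuple (*scan_getnext) (StorageScanDesc scan, ScanDirection dir);
    void         (*slot_clear_tuple) (TupleTableSlot *slot);
} StorageAmRoutine;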

> >RelationData gains ->rd_stamroutine which is a pointer to the
> >StorageAmRoutine for the relation in question. Similarly,
> >TupleTableSlot is augmented with a link to the StorageAmRoutine to
> >handle the StorageTuple it contains (probably in most cases it's set at
> >the same time as the tupdesc). This implies that routines such as
> >ExecAssignScanType need to pass down the StorageAmRoutine from the
> >relation to the slot.
>
> If we already have this pointer in t_handler as described below,
> we don't need to pass it between functions and slots.

I think it's better to have it in slots, so you can install multiple
tuples in the slot without having to change the routine pointers each
time.
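
Concretely, the slot would carry the routine pointer right next to its
tupdesc, so that storing another tuple from the same relation touches
neither; the field name below is only illustrative:

typedef struct TupleTableSlot
{
    /* ... existing fields (tts_isempty, tts_tupleDescriptor, ...) ... */
    StorageTuple            tts_tuple;          /* instead of HeapTuple */
    const StorageAmRoutine *tts_stamroutine;    /* set alongside the tupdesc,
                                                 * e.g. by ExecAssignScanType */
} TupleTableSlot;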

> >The executor is modified so that instead of calling heap_insert etc
> >directly, it uses rel->rd_stamroutine to call these methods. The
> >executor is still in charge of dealing with indexes, constraints, and
> >any other thing that's not the tuple storage itself (this is one major
> >point in which this differs from FDWs). This all looks simple enough,
> >with one exception and a few notes:
>
> That is exactly what I tried to describe in my proposal, in the
> chapter "Relation management". I'm sure you've already noticed
> that it will require a huge amount of source code cleanup. I've carefully read
> the sources and found "violators" of the abstraction in src/backend/commands.
> The list is attached to the wiki page
> https://wiki.postgresql.org/wiki/HeapamRefactoring.
>
> Apart from these, there are some pretty strange and unrelated functions in
> src/backend/catalog.
> I'm willing to fix them, but I'd like to synchronize our efforts.

I very much would like to stay away from touching src/backend/catalog,
which contains the functions that deal with system catalogs. We can simply
say that system catalogs are hardcoded to use heapam.c storage for now.
If we later see a need to enable some particular catalog using a
different storage implementation, we can change the code for that
specific catalog in src/backend/catalog and everywhere else, to use the
abstract API instead of hardcoding heap_insert etc. But that can be
left for a second pass. (This is my point "iv" further below, to which
you said "+1").
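
To make the division of labor concrete, the executor-side change boils down
to substitutions like the following (method name illustrative, call
simplified), while code under src/backend/catalog keeps calling heap_insert
and friends directly for now:

/* nodeModifyTable.c today (simplified): */
newId = heap_insert(resultRelationDesc, tuple,
                    estate->es_output_cid, 0, NULL);

/* with pluggable storage, roughly: */
newId = resultRelationDesc->rd_stamroutine->tuple_insert(resultRelationDesc,
                                                         stuple,
                                                         estate->es_output_cid,
                                                         0);
/* index insertion, constraints, triggers etc. remain the executor's job */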

> Nothing to do, just substitute t_data with a proper HeapTupleHeader
> representation. I think it's a job for the StorageAm. Let's say each StorageAm
> must have a stam_to_heaptuple() function and an opposite
> stam_from_heaptuple() function.

Hmm, yeah, that also works. We'd have to check again whether it's more
convenient to start from a slot rather than from a StorageTuple. AFAICS the
trigger.c code all starts from a slot, so it makes sense to have
the conversion use the slot code -- that way, there's no need for each
storage AM to re-implement the conversion to HeapTuple.
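
In other words, the conversion would live at the slot level, something along
these lines (the helper name is hypothetical):

/*
 * Build a HeapTuple from the slot's contents, whatever the underlying
 * storage AM's native format is.  trigger.c would use this instead of
 * assuming the tuple is already in heap format.
 */
HeapTuple
ExecCopySlotHeapTuple(TupleTableSlot *slot)
{
    slot_getallattrs(slot);     /* AM-specific deforming into datum arrays */
    return heap_form_tuple(slot->tts_tupleDescriptor,
                           slot->tts_values,
                           slot->tts_isnull);
}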

> >note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
> >format. In the long run, it may be possible to replace them with a
> >separate storage module that's specifically designed to handle tuples
> >meant for tuplestores etc. That may simplify TupleTableSlot and
> >execTuples. For the moment we keep the tts_mintuple as it is. Whenever
> >a tuple is not already in heap format, we heapify it in order to put it in
> >the store.
> I wonder, do we really need MinimalTuples to support all formats?

Sure. I wouldn't want to say "you can create a table in columnar storage
format, but if you do, those tables cannot use hash joins".
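
To spell out what "heapify" means in note f: whatever the AM's native format
is, the slot deforms it into datum/isnull arrays and a MinimalTuple is built
from those, so tuplestore, tuplesort and hash join keep working unchanged.
Roughly:

slot_getallattrs(slot);         /* AM-specific deforming */
mtuple = heap_form_minimal_tuple(slot->tts_tupleDescriptor,
                                 slot->tts_values,
                                 slot->tts_isnull);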

> >ii) execTuples has additional accessors for tuples-in-slot, such as
> >ExecFetchSlotTuple and friends. I expect some of them to return
> >abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
> >wrapped in Datum), depending on callers. We might be able to cut down
> >on these later; my first cut will try to avoid API changes to keep
> >fallout to a minimum.
>
> I'd suggest replacing all occurrences of HeapTuple with StorageTuple.
> Do you see any problems with it?

The HeapTuple-in-datum representation, as I recall, is used in the SQL
function manager; maybe other places too. Maybe there's a way to fix
that layer so that it uses StorageTuple instead, but I prefer not to
touch it in the first phase. We can fix it later. This is already a
big enough patch ...

> >iii) All tuples need to be identifiable by ItemPointers. Storages that
> >have different requirements will need careful additional thought across
> >the board.
>
> For a start, we can simply deny secondary indexes for these storages,
> or require a function that converts the storage's internal tuple identifier
> to an ItemPointer suitable for an index.

Umm. I don't think rejecting secondary indexes would work very well. I
think we can lift this limitation later; we just need to change the
IndexTuple abstraction so that it doesn't rely on ItemPointer the way it
currently does.
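
The coupling is right there in the on-disk index tuple layout
(src/include/access/itup.h), which is why lifting it is a project of its own:

typedef struct IndexTupleData
{
    ItemPointerData t_tid;      /* heap TID: a hardwired block/offset pair */
    unsigned short  t_info;     /* size and flag bits */

    /* key values follow */
} IndexTupleData;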

> >v) Currently, one Buffer may be associated with one HeapTuple living in a
> >slot; when the slot is cleared, the buffer pin is released. My current
> >patch moves the buffer pin to inside the heapam-based storage AM and the
> >buffer is released by the ->slot_clear_tuple method. The rationale for
> >doing this is that some storage AMs might want to keep several buffers
> >pinned at once, for example, and must not release those pins
> >individually but in batches as the scan moves forwards (say a batch of
> >tuples in a columnar storage AM has column values spread across many
> >buffers; they must all be kept pinned until the scan has moved past the
> >whole set of tuples). But I'm not really sure that this is a great
> >design.
>
> Frankly, I doubt that it's realistic to implement columnar storage just as
> a variant of pluggable storage. It requires a lot of changes in the executor
> and the optimizer and so on, which are hardly compatible with the existing
> tuple-oriented model. However, I'm not that well versed in this area, so if you
> feel that it's possible, go ahead.

Well, not *just* as a variant of pluggable storage. This thread is just
one sub-project inside the greater project to enable column-oriented
storage; that includes further changes to executor, too, but I haven't
discussed those in this proposal. I mentioned all this at the Brussels
developer meeting earlier this year. (There I mostly talked about
vertical partitioning, which is a different subproject that I've put
aside for the moment, but really it's all part of the same thing.)
https://wiki.postgresql.org/wiki/Future_of_storage
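
(Coming back to point v for a moment: for heapam the ->slot_clear_tuple
callback amounts to the sketch below; a columnar AM's version of the same
callback is where batched pin releases would go. The function body is
simplified, of course.)

static void
heapam_slot_clear_tuple(TupleTableSlot *slot)
{
    /* drop the pin taken when the tuple was stored in the slot */
    if (BufferIsValid(slot->tts_buffer))
        ReleaseBuffer(slot->tts_buffer);
    slot->tts_buffer = InvalidBuffer;
    slot->tts_tuple = NULL;
}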

Thanks for reading!

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
