From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Pluggable storage |
Date: | 2016-08-17 17:03:44 |
Message-ID: | 20160817170344.GA901677@alvherre.pgsql |
Lists: | pgsql-hackers |
Anastasia Lubennikova wrote:
> 13.08.2016 02:15, Alvaro Herrera:
> >To support this, we introduce StorageTuple and StorageScanDesc.
> >StorageTuples represent a physical tuple coming from some storage AM.
> >It is necessary to have a pointer to a StorageAmRoutine in order to
> >manipulate the tuple. For heapam.c, a StorageTuple is just a HeapTuple.
>
> StorageTuples concept looks really cool. I've got some questions on
> details of implementation.
>
> Do StorageTuples have fields common to all implementations?
> Or is StorageTuple a totally abstract structure that has nothing to do
> with the data, except pointing to it?
>
> I mean, we already have the HeapTupleData structure, which is a pretty
> good candidate to be replaced with StorageTuple.
I was planning to replace all uses of HeapTuple in the executor with
StorageTuple, actually. But the main reason I would like to avoid
HeapTupleData itself is that it contains an assumption that there is a
single palloc chunk that contains the tuple (t_len and t_data). This
might not be true in representations that split the tuple, for example
in columnar storage where you have one column in page A and another
column in page B, for the same tuple. I suppose there might be some
point to keeping t_tableOid and t_self, though.
> And maybe add a "t_handler" field that points to the handler functions.
> I'm not sure whether it would be the name of the StorageAm, its OID, or
> maybe the main function itself. Although, if I'm not mistaken, we always
> have RelationData when we want to operate on the tuple, so having
> t_handler in the StorageTuple is excessive.
Yeah, I think the RelationData (or more precisely the StorageAmRoutine)
is going to be available always, so I don't think we need a pointer in
the tuple itself.
> This approach allows us to minimize code changes and ensure that we
> won't miss any function that handles tuples.
>
> Do you see any weak points of the suggestion?
> What design do you use in your prototype?
It's currently a "void *" pointer in my prototype.
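To make the "void *" design concrete, here is a minimal standalone sketch; it is not code from the prototype. It assumes a hypothetical, trimmed-down StorageAmRoutine whose members (tuple_natts, tuple_getattr, tuple_free) are invented for illustration, plus a stand-in RelationData carrying rd_stamroutine. Generic code never looks inside the tuple and always goes through the relation's routine, so no t_handler field is needed; the toy columnar AM shows a tuple that is not one contiguous chunk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Opaque tuple handle: only the owning storage AM knows what it points to. */
typedef void *StorageTuple;

/* Hypothetical, trimmed-down method table; member names are illustrative. */
typedef struct StorageAmRoutine
{
    int         (*tuple_natts) (StorageTuple tuple);
    long        (*tuple_getattr) (StorageTuple tuple, int attnum);  /* 1-based */
    void        (*tuple_free) (StorageTuple tuple);
} StorageAmRoutine;

/* Stand-in for RelationData: the routine hangs off the relation, not the tuple. */
typedef struct RelationData
{
    const StorageAmRoutine *rd_stamroutine;
} RelationData;

/* Toy "heap" AM: the whole tuple lives in one allocated chunk. */
typedef struct ToyHeapTuple
{
    int         natts;
    long        values[];       /* flexible array member */
} ToyHeapTuple;

static int  toyheap_natts(StorageTuple t) { return ((ToyHeapTuple *) t)->natts; }
static long toyheap_getattr(StorageTuple t, int attnum) { return ((ToyHeapTuple *) t)->values[attnum - 1]; }

static const StorageAmRoutine toyheap_routine = {toyheap_natts, toyheap_getattr, free};

static StorageTuple
toyheap_form(int natts, const long *values)
{
    ToyHeapTuple *tup = malloc(sizeof(ToyHeapTuple) + (size_t) natts * sizeof(long));

    assert(tup != NULL);
    tup->natts = natts;
    memcpy(tup->values, values, (size_t) natts * sizeof(long));
    return tup;
}

/* Toy "columnar" AM: a tuple is a row number plus column arrays, so its
 * attribute values are spread over separate arrays, not one chunk. */
typedef struct ToyColumnarTuple
{
    int          natts;
    int          row;
    const long **columns;       /* columns[i][row] holds attribute i + 1 */
} ToyColumnarTuple;

static int  toycol_natts(StorageTuple t) { return ((ToyColumnarTuple *) t)->natts; }

static long
toycol_getattr(StorageTuple t, int attnum)
{
    ToyColumnarTuple *ct = t;

    return ct->columns[attnum - 1][ct->row];
}

static const StorageAmRoutine toycol_routine = {toycol_natts, toycol_getattr, free};

static StorageTuple
toycol_form(int natts, int row, const long **columns)
{
    ToyColumnarTuple *ct = malloc(sizeof(ToyColumnarTuple));

    assert(ct != NULL);
    ct->natts = natts;
    ct->row = row;
    ct->columns = columns;
    return ct;
}

/* AM-agnostic code: uses only the relation's routine, never the tuple layout. */
static long
sum_attrs(const RelationData *rel, StorageTuple tuple)
{
    const StorageAmRoutine *am = rel->rd_stamroutine;
    long        total = 0;

    for (int attnum = 1; attnum <= am->tuple_natts(tuple); attnum++)
        total += am->tuple_getattr(tuple, attnum);
    return total;
}
```

Switching a relation to another AM then only means pointing rd_stamroutine at a different table; sum_attrs itself does not change.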
> >RelationData gains ->rd_stamroutine which is a pointer to the
> >StorageAmRoutine for the relation in question. Similarly,
> >TupleTableSlot is augmented with a link to the StorageAmRoutine to
> >handle the StorageTuple it contains (probably in most cases it's set at
> >the same time as the tupdesc). This implies that routines such as
> >ExecAssignScanType need to pass down the StorageAmRoutine from the
> >relation to the slot.
>
> If we already have this pointer in t_handler as described below,
> we don't need to pass it between functions and slots.
I think it's better to have it in slots, so you can install multiple
tuples in the slot without having to change the routine pointers each
time.
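A minimal sketch of the slot-side arrangement, assuming a hypothetical tts_stamroutine field and a slot_clear_tuple callback (the callback is named in the proposal, but the signature here is invented). The routine is installed once, much as ExecAssignScanType would do alongside the tuple descriptor, and any number of tuples can then be stored without touching it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void *StorageTuple;

/* Hypothetical method table: only the callback the slot needs. */
typedef struct StorageAmRoutine
{
    void        (*slot_clear_tuple) (StorageTuple tuple);
} StorageAmRoutine;

/* Simplified slot; tts_stamroutine is an illustrative field name. */
typedef struct TupleTableSlot
{
    const StorageAmRoutine *tts_stamroutine;
    StorageTuple tts_tuple;
    bool        tts_isempty;
} TupleTableSlot;

/* Done once per slot, e.g. when its tuple descriptor is set. */
static void
slot_set_storageam(TupleTableSlot *slot, const StorageAmRoutine *am)
{
    slot->tts_stamroutine = am;
    slot->tts_tuple = NULL;
    slot->tts_isempty = true;
}

/* Let the owning AM release whatever the tuple holds (memory, pins, ...). */
static void
slot_clear(TupleTableSlot *slot)
{
    if (!slot->tts_isempty)
    {
        slot->tts_stamroutine->slot_clear_tuple(slot->tts_tuple);
        slot->tts_tuple = NULL;
        slot->tts_isempty = true;
    }
}

/* Store a new tuple; the routine pointer is reused, not reassigned. */
static void
slot_store_tuple(TupleTableSlot *slot, StorageTuple tuple)
{
    assert(slot->tts_stamroutine != NULL);
    slot_clear(slot);
    slot->tts_tuple = tuple;
    slot->tts_isempty = false;
}

/* Toy AM that only counts how many tuples were cleared. */
static int  toy_cleared = 0;

static void toy_slot_clear_tuple(StorageTuple tuple) { (void) tuple; toy_cleared++; }

static const StorageAmRoutine toy_routine = {toy_slot_clear_tuple};
```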
> >The executor is modified so that instead of calling heap_insert etc
> >directly, it uses rel->rd_stamroutine to call these methods. The
> >executor is still in charge of dealing with indexes, constraints, and
> >any other thing that's not the tuple storage itself (this is one major
> >point in which this differs from FDWs). This all looks simple enough,
> >with one exception and a few notes:
>
> That is exactly what I tried to describe in my proposal.
> See the chapter "Relation management". I'm sure you've already noticed
> that it will require a huge source code cleanup. I've carefully read
> the sources and found "violators" of abstraction in src/backend/commands.
> The list is attached to the wiki page
> https://wiki.postgresql.org/wiki/HeapamRefactoring.
>
> Besides these, there are some pretty strange and unrelated functions in
> src/backend/catalog.
> I'm willing to fix them, but I'd like to synchronize our efforts.
I very much would like to stay away from touching src/backend/catalog,
which are the functions that deal with system catalogs. We can simply
say that system catalogs are hardcoded to use heapam.c storage for now.
If we later see a need to enable some particular catalog using a
different storage implementation, we can change the code for that
specific catalog in src/backend/catalog and everywhere else, to use the
abstract API instead of hardcoding heap_insert etc. But that can be
left for a second pass. (This is my point "iv" further below, to which
you said "+1").
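The resulting split could look roughly like this standalone sketch. The tuple_insert member, the toy AMs and the counters are hypothetical, and toy_heap_insert merely stands in for heap_insert. Executor paths dispatch through rd_stamroutine and keep index maintenance for themselves, while catalog code keeps calling the heap routine directly.

```c
#include <assert.h>

typedef void *StorageTuple;
typedef struct RelationData RelationData;

/* Hypothetical method table with a single insert callback. */
typedef struct StorageAmRoutine
{
    void        (*tuple_insert) (RelationData *rel, StorageTuple tuple);
} StorageAmRoutine;

/* Stand-in for RelationData, with toy bookkeeping counters. */
struct RelationData
{
    const StorageAmRoutine *rd_stamroutine;
    int         ntuples;
    int         nindexentries;
};

static int  toy_heap_calls = 0;
static int  toy_columnar_calls = 0;

/* Stand-in for heap_insert(). */
static void
toy_heap_insert(RelationData *rel, StorageTuple tuple)
{
    (void) tuple;
    rel->ntuples++;
    toy_heap_calls++;
}

static void
toy_columnar_insert(RelationData *rel, StorageTuple tuple)
{
    (void) tuple;
    rel->ntuples++;
    toy_columnar_calls++;
}

static const StorageAmRoutine toy_heap_routine = {toy_heap_insert};
static const StorageAmRoutine toy_columnar_routine = {toy_columnar_insert};

/* Executor path: storage goes through the AM; indexes stay with the executor. */
static void
exec_insert(RelationData *rel, StorageTuple tuple)
{
    rel->rd_stamroutine->tuple_insert(rel, tuple);
    rel->nindexentries++;       /* stands in for inserting index entries */
}

/* Catalog path: hardcoded to heap storage for the first pass
 * (index maintenance omitted in this toy). */
static void
catalog_insert(RelationData *rel, StorageTuple tuple)
{
    toy_heap_insert(rel, tuple);
}
```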
> Nothing to do, just substitute t_data with a proper HeapTupleHeader
> representation. I think it's a job for StorageAm. Let's say each StorageAm
> must have a stam_to_heaptuple() function and the opposite function,
> stam_from_heaptuple().
Hmm, yeah, that also works. We'd have to check again whether it's more
convenient to start as a slot rather than a StorageTuple. AFAICS the
trigger.c code is all starting from a slot, so it makes sense to have
the conversion use the slot code -- that way, there's no need for each
storageAM to re-implement conversion to HeapTuple.
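A minimal sketch of slot-based conversion, under stated assumptions: each AM supplies only a hypothetical slot_deform callback that fills the slot's values/isnull arrays, and a single generic routine then builds the heap-format tuple from the slot. FlatHeapTuple is a toy stand-in for a real HeapTuple (no HeapTupleHeader, no alignment), and the 8-attribute limit is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ATTS 8

typedef void *StorageTuple;

/* Hypothetical per-AM callback: decode every attribute into the arrays. */
typedef struct StorageAmRoutine
{
    void        (*slot_deform) (StorageTuple tuple, int natts,
                                long *values, bool *isnull);
} StorageAmRoutine;

typedef struct TupleTableSlot
{
    const StorageAmRoutine *tts_stamroutine;
    StorageTuple tts_tuple;
    int         natts;
    long        tts_values[TOY_MAX_ATTS];
    bool        tts_isnull[TOY_MAX_ATTS];
} TupleTableSlot;

/* Toy stand-in for heap format: values plus a null bitmap. */
typedef struct FlatHeapTuple
{
    int         natts;
    uint32_t    nullmask;       /* bit i set => attribute i + 1 is NULL */
    long        values[TOY_MAX_ATTS];
} FlatHeapTuple;

/* Written once, for every AM: convert whatever the slot holds. */
static FlatHeapTuple
slot_to_flat_heaptuple(TupleTableSlot *slot)
{
    FlatHeapTuple htup = {0};

    assert(slot->natts <= TOY_MAX_ATTS);
    slot->tts_stamroutine->slot_deform(slot->tts_tuple, slot->natts,
                                       slot->tts_values, slot->tts_isnull);
    htup.natts = slot->natts;
    for (int i = 0; i < slot->natts; i++)
    {
        if (slot->tts_isnull[i])
            htup.nullmask |= UINT32_C(1) << i;
        else
            htup.values[i] = slot->tts_values[i];
    }
    return htup;
}

/* Toy AM whose native tuple is two parallel arrays. */
typedef struct ToyTuple
{
    const long *vals;
    const bool *nulls;
} ToyTuple;

static void
toy_slot_deform(StorageTuple tuple, int natts, long *values, bool *isnull)
{
    const ToyTuple *t = tuple;

    for (int i = 0; i < natts; i++)
    {
        values[i] = t->vals[i];
        isnull[i] = t->nulls[i];
    }
}

static const StorageAmRoutine toy_routine = {toy_slot_deform};
```

The per-AM burden is only the deform step; the heap-format builder exists once, which is the point of starting from the slot.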
> >note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
> >format. In the long run, it may be possible to replace them with a
> >separate storage module that's specifically designed to handle tuples
> >meant for tuplestores etc. That may simplify TupleTableSlot and
> >execTuples. For the moment we keep the tts_mintuple as it is. Whenever
> >a tuple is not already in heap format, we heapify it in order to put it in
> >the store.
> I wonder, do we really need MinimalTuples to support all formats?
Sure. I wouldn't want to say "you can create table in columnar storage
format, but if you do, these tables cannot use hash join".
> >ii) execTuples has additional accessors for tuples-in-slot, such as
> >ExecFetchSlotTuple and friends. I expect some of them to return
> >abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
> >wrapped in Datum), depending on callers. We might be able to cut down
> >on these later; my first cut will try to avoid API changes to keep
> >fallout to a minimum.
>
> I'd suggest replacing all occurrences of HeapTuple with StorageTuple.
> Do you see any problems with it?
The HeapTuple-in-datum representation, as I recall, is used in the SQL
function manager; maybe other places too. Maybe there's a way to fix
that layer so that it uses StorageTuple instead, but I prefer not to
touch it in the first phase. We can fix it later. This is already a
big enough patch ...
> >iii) All tuples need to be identifiable by ItemPointers. Storages that
> >have different requirements will need careful additional thought across
> >the board.
>
> For a start, we can simply deny secondary indexes for these storages
> or require a function that converts the tuple identifier used inside the
> storage to an ItemPointer suitable for an index.
Umm. I don't think rejecting secondary indexes would work very well. I
think we can lift this limitation later; we just need to change the
IndexTuple abstraction so that it doesn't rely on ItemPointer as
currently.
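For reference, the conversion suggested above is simple arithmetic when an AM has a dense row number. The sketch assumes a hypothetical 64-bit row id and a made-up fan-out of 256 rows per "virtual block"; ToyItemPointer is a simplified (block, offset) pair, not PostgreSQL's ItemPointerData. Offsets are 1-based as in real item pointers, and the mapping only works while the row id fits within a 32-bit block number times the rows per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified (block, offset) pair; not PostgreSQL's ItemPointerData. */
typedef struct ToyItemPointer
{
    uint32_t    block;
    uint16_t    offset;         /* 1-based, as in real item pointers */
} ToyItemPointer;

#define TOY_ROWS_PER_BLOCK 256u        /* hypothetical, not a PostgreSQL constant */
#define TOY_INVALID_BLOCK  UINT32_MAX  /* mirrors InvalidBlockNumber (0xFFFFFFFF) */

/* Map an AM-internal row id onto an index-compatible TID. */
static bool
rowid_to_itemptr(uint64_t rowid, ToyItemPointer *ip)
{
    uint64_t    block = rowid / TOY_ROWS_PER_BLOCK;

    if (block >= TOY_INVALID_BLOCK)
        return false;           /* row id does not fit in the TID space */
    ip->block = (uint32_t) block;
    ip->offset = (uint16_t) (rowid % TOY_ROWS_PER_BLOCK + 1);
    return true;
}

/* Inverse mapping, used when an index scan hands back a TID. */
static uint64_t
itemptr_to_rowid(const ToyItemPointer *ip)
{
    assert(ip->offset >= 1 && ip->offset <= TOY_ROWS_PER_BLOCK);
    return (uint64_t) ip->block * TOY_ROWS_PER_BLOCK + (ip->offset - 1);
}
```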
> >v) Currently, one Buffer may be associated with one HeapTuple living in a
> >slot; when the slot is cleared, the buffer pin is released. My current
> >patch moves the buffer pin to inside the heapam-based storage AM and the
> >buffer is released by the ->slot_clear_tuple method. The rationale for
> >doing this is that some storage AMs might want to keep several buffers
> >pinned at once, for example, and must not release those pins
> >individually but in batches as the scan moves forwards (say a batch of
> >tuples in a columnar storage AM has column values spread across many
> >buffers; they must all be kept pinned until the scan has moved past the
> >whole set of tuples). But I'm not really sure that this is a great
> >design.
>
> Frankly, I doubt that it's realistic to implement columnar storage just
> as a variant of pluggable storage. It requires a lot of changes in the
> executor, optimizer and so on, which are hardly compatible with the
> existing tuple-oriented model. However, I'm not very familiar with this
> area, so if you feel that it's possible, go ahead.
Well, not *just* as a variant of pluggable storage. This thread is just
one sub-project inside the greater project to enable column-oriented
storage; that includes further changes to executor, too, but I haven't
discussed those in this proposal. I mentioned all this at the Brussels
developer meeting earlier this year. (There I mostly talked about
vertical partitioning, which is a different subproject that I've put
aside for the moment, but really it's all part of the same thing.)
https://wiki.postgresql.org/wiki/Future_of_storage
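To illustrate point v), here is a toy sketch of batch-wise pin release; it is not the patch's code. Buffers are plain integers with a pin counter, and the scan-state struct and the toy_slot_clear_tuple signature are hypothetical. Clearing a tuple only decrements a per-batch counter; all pins are dropped together once the scan has moved past the whole batch.

```c
#include <assert.h>

#define TOY_NBUFFERS            16
#define TOY_MAX_BATCH_BUFFERS   4

/* Toy buffer manager: just a pin count per buffer number. */
static int  pin_count[TOY_NBUFFERS];

static void toy_pin(int buf) { pin_count[buf]++; }

static void
toy_unpin(int buf)
{
    assert(pin_count[buf] > 0);
    pin_count[buf]--;
}

/* Hypothetical AM-private scan state for one batch of tuples. */
typedef struct ToyColumnarScan
{
    int         buffers[TOY_MAX_BATCH_BUFFERS];
    int         nbuffers;
    int         tuples_left;    /* tuples of the batch not yet cleared */
} ToyColumnarScan;

/* Pin every buffer the batch's column values live in, up front. */
static void
scan_start_batch(ToyColumnarScan *scan, const int *bufs, int nbufs, int ntuples)
{
    assert(nbufs <= TOY_MAX_BATCH_BUFFERS && ntuples > 0);
    for (int i = 0; i < nbufs; i++)
    {
        scan->buffers[i] = bufs[i];
        toy_pin(bufs[i]);
    }
    scan->nbuffers = nbufs;
    scan->tuples_left = ntuples;
}

/* Called when a slot releases a tuple: pins go only after the last one. */
static void
toy_slot_clear_tuple(ToyColumnarScan *scan)
{
    assert(scan->tuples_left > 0);
    if (--scan->tuples_left == 0)
    {
        for (int i = 0; i < scan->nbuffers; i++)
            toy_unpin(scan->buffers[i]);
        scan->nbuffers = 0;
    }
}
```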
Thanks for reading!
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services