Pluggable Storage - Andres's take

From: Andres Freund <andres(at)anarazel(dot)de>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Pluggable Storage - Andres's take
Date: 2018-07-03 07:06:45
Message-ID: 20180703070645.wchpu5muyto5n647@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.

The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there's FIXME / elog()s). But I
think it's getting close enough to it's eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.

The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
fronts. For one it reduces just about all type safety. There were
numerous bugs in the patch where things were just cast from HeapTuple
to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
a really, really bad idea to introduce a vague type like this for
development purposes alone, it makes it way too hard to refactor -
essentially throwing the biggest benefit of type safe languages out of
the window.

Additionally I think it's also the wrong approach architecturally. We
shouldn't assume that a tuple can efficiently be represented as a
single palloc'ed chunk. In fact, we should move *away* from relying on
that so much.

I've thus removed the TableTuple type entirely.

- Previous verions of the patchset exposed Buffers in the tableam.h API,
performed buffer locking / pinning / ExecStoreTuple() calls outside of
it. That is wrong in my opinion, as various AMs will deal very
differently with buffer pinning & locking. The relevant logic is
largely moved within the AM. Bringing me to the next point:

- tableam exposed various operations based on HeapTuple/TableTuple's
(and their Buffers). This all need be slot based, as we can't
represent the way each AM will deal with this. I've largely converted
the API to be slot based. That has some fallout, but I think largely
works. Lots of outdated comments.

- I think the move of the indexing from outside the table layer into the
storage layer isn't a good idea. It lead to having to pass EState into
the tableam, a callback API to perform index updates, etc. This seems
to have at least partially been triggered by the speculative insertion
codepaths. I've reverted this part of the changes. The speculative
insertion / confirm codepaths are now exposed to tableam.h - I think
that's the right thing because we'll likely want to have that
functionality across more than a single tuple in the future.

- The visibility functions relied on the *caller* performing buffer
locking. That's not a great idea, because generic code shouldn't know
about the locking scheme a particular AM needs. I've changed the
external visibility functions to instead take a slot, and perform the
necessary locking inside.

- There were numerous tableam callback uses inside heapam.c - that makes
no sense, we know what the storage is therein. The relevant

- The integration between index lookups and heap lookups based on the
results on a index lookup was IMO too tight. The index code dealt
with heap tuples, which isn't great. I've introduced a new concept, a
'IndexFetchTableData' scan. It's initialized when building an index
scan, and provides the necessary state (say current heap buffer), to
do table lookups from within a heap.

- The am of relations required for bootstrapping was set to 0 - I don't
think that's a good idea. I changed it so it's set to the heap AM as
well.

- HOT was encoded in the API in a bunch of places. That doesn't look
right to me. I tried to improve a bit on that, but I'm not yet quite
sure I like it. Needs written explanation & arguments...

- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
a higher memory usage, because the resulting tuples weren't freed or
anything. There might be a reason for doing such a change - we've
certainly discussed that before - but I'm *vehemently* against doing
that at the same time we introduce pluggable storage. Analyzing the
performance effects will be hard enough without changes like this.

- I've for now backed out the heap rewrite changes, partially. Mostly
because I didn't like the way the abstraction looks, but haven't quite
figured out how it should look like.

- I did not like that speculative tokens were moved to slots. There's
really no reason for them to live outside parameters to tableam.h
functsions.

- lotsa additional smaller changes.

- lotsa new bugs

My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus onto v11 work. If others want I could
move repo to github and grant others write access.

I think the patchseries should eventually look like:

- move vacuumlazy.c (and other similar files) into access/heap, there's
really nothing generic here. This is a fairly independent task.
- slot'ify FDW RefetchForeignRow_function
- vtable based slot API, based on [1]
- slot'ify trigger API
- redo EPQ based on slots (prototyped in my git tree)
- redo trigger API to be slot based
- tuple traversal API changes
- tableam infrastructure, with error if a non-builtin AM is chosen
- move heap and calling code to be tableam based
- make vacuum callback based (not vacuum.c, just vacuumlazy.c)
- [other patches]
- allow other AMs
- introduce test AM

Tasks / Questions:

- split up patch
- Change heap table AM to not allocate handler function for each table,
instead allocate it statically. Avoids a significant amount of data
duplication, and allows for a few more compiler optimizations.
- Merge tableam.h and tableamapi.h and make most tableam.c functions
small inline functions. Having one-line tableam.c wrappers makes this
more expensive than necessary. We'll have a big enough trouble not
regressing performancewise.
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
- COPY's multi_insert path should probably deal with a bunch of slots,
rather than forming HeapTuples
- bitmap index scans probably need a new tableam.h callback, abstracting
bitgetpage()
- suspect IndexBuildHeapScan might need to move into the tableam.h API -
it's not clear to me that it's realistically possible to this in a
generic manner.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20180220224318.gw4oe5jadhpmcdnm%40alap3.anarazel.de
[2] http://archives.postgresql.org/message-id/CAJrrPGcN5A4jH0PJ-s=6k3+SLA4pozC4HHRdmvU1ZBuA20TE-A@mail.gmail.com
[3] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/pluggable-storage
[4] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-07-03 07:29:55 Re: Should contrib modules install .h files?
Previous Message Michael Paquier 2018-07-03 07:05:51 Re: pgsql: Clarify use of temporary tables within partition trees