Re: Pluggable storage

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable storage
Date: 2017-06-22 01:01:46
Message-ID: CAB7nPqR6TorHQ_mXmAiB_cL40Vf2+F6hsue-P9HVGoxVxoHyKA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 22, 2017 at 4:47 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think that BitmapHeapScan, at least, is applicable to any table AM
> that has TIDs. It seems to me that in general we can imagine three
> kinds of table AMs:
>
> 1. Table AMs where a tuple can be efficiently located by a real TID.
> By a real TID, I mean that the block number part is really a block
> number and the item ID is really a location within the block. These
> are necessarily quite similar to our current heap, but they can change
> the tuple format and page format to some degree, and it seems like in
> many cases it should be possible to plug them into our existing index
> AMs without too much heartache. Both index scans and bitmap index
> scans ought to work.
>
> 2. Table AMs where a tuple has some other kind of locator. For
> example, imagine an index-organized table where the locator is the
> primary key, which is a bit like what Alvaro had in mind for indirect
> indexes. If the locator is 6 bytes or less, it could potentially be
> jammed into a TID, but I don't think that's a great idea. For things
> like int8 or numeric, it won't work at all. Even for other things,
> it's going to cause problems because the bit patterns won't be what
> the code is expecting; e.g. bitmap scans care about the structure of
> the TID, not just how many bits it is. (Due credit: Somebody, maybe
> Alvaro, pointed out this problem before, at PGCon.) For these kinds
> of tables, larger modifications to the index AMs are likely to be
> necessary, at least if we want a really general solution, or maybe we
> should have separate index AMs - e.g. btree for traditional TID-based
> heaps, and generic_btree or indirect_btree or key_btree or whatever
> for heaps with some other kind of locator. It's not too hard to see
> how to make index scans work with this sort of structure but it's very
> unclear to me whether, or how, bitmap scans can be made to work.
>
> 3. Table AMs where a tuple doesn't really have a locator at all. In
> these cases, we can't support any sort of index AM at all. When the
> table is queried, there's really nothing the core system can do except
> ask the table AM for a full scan, supply the quals, and hope the table
> AM has some sort of smarts that enable it to optimize somehow. For
> example, you can imagine converting cstore_fdw into a table AM of this
> sort - ORC has a sort of inbuilt BRIN-like indexing that allows whole
> chunks to be proven uninteresting and skipped. (You could use chunk
> number + offset to turn this into a table AM of the previous type if
> you wanted to support secondary indexes; not sure if that'd be useful,
> but it'd certainly be harder.)
>
> I'm more interested in #1 than in #3, and more interested in #3 than
> #2, but other people may have different priorities.
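For type #3, the "smarts" can be as simple as per-chunk min/max summaries. Here is a minimal C sketch. It assumes a hypothetical columnar table AM that keeps the minimum and maximum of one int8 column per chunk, in the spirit of ORC statistics or a BRIN range summary. The core system only requests a full scan with the qual lo <= col <= hi, and the AM skips the chunks that cannot match. All names are illustrative and are not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk summary kept by a columnar table AM. */
typedef struct
{
    int64_t min_val;    /* smallest column value in the chunk */
    int64_t max_val;    /* largest column value in the chunk */
    size_t  nrows;      /* rows stored in the chunk */
} SketchChunk;

/*
 * "Full scan" with a pushed-down qual lo <= col <= hi.  The AM proves
 * chunks uninteresting from their summaries alone and skips them.
 * Returns the number of rows that still have to be read and rechecked;
 * *chunks_skipped receives how many chunks were never touched.
 */
static size_t
sketch_scan_with_qual(const SketchChunk *chunks, size_t nchunks,
                      int64_t lo, int64_t hi, size_t *chunks_skipped)
{
    size_t rows_to_read = 0;

    assert(lo <= hi);
    *chunks_skipped = 0;
    for (size_t i = 0; i < nchunks; i++)
    {
        if (chunks[i].max_val < lo || chunks[i].min_val > hi)
        {
            (*chunks_skipped)++;    /* no row in this chunk can match */
            continue;
        }
        rows_to_read += chunks[i].nrows;
    }
    return rows_to_read;
}
```

With chunks covering 0..99, 100..199 and 200..299, a qual of 150..160 only requires reading the middle chunk, even though the planner asked for nothing more than a sequential scan.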

Putting that in a few words:
1. Table AM with a 6-byte TID.
2. Table AM with a custom locator format, which could be TID-like.
3. Table AM with no locators.
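To make the difference between #1 and #2 concrete, here is a minimal C sketch. It assumes a simplified, flattened copy of the heap's 6-byte TID layout: a 32-bit block number split into two 16-bit halves, plus a 1-based offset, as in ItemPointerData. All names are hypothetical. SKETCH_MAX_OFFSET is an assumed stand-in for the per-page limit of a page-oriented TID bitmap, not the exact in-core value. The sketch shows that an int8 key does not fit at all, and that even a key that does fit produces "block numbers" and offsets that a bitmap scan cannot use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flattened stand-in for a 6-byte TID: block number + offset. */
typedef struct
{
    uint16_t bi_hi;     /* high half of the block number */
    uint16_t bi_lo;     /* low half of the block number */
    uint16_t ip_posid;  /* offset within the block, starting at 1 */
} SketchTid;

/* Assumed per-page offset limit of a page-oriented TID bitmap. */
#define SKETCH_MAX_OFFSET 291

static uint32_t
sketch_tid_block(SketchTid tid)
{
    return ((uint32_t) tid.bi_hi << 16) | tid.bi_lo;
}

/* Type #1: a real TID built from a real block number and offset. */
static SketchTid
sketch_tid_make(uint32_t block, uint16_t offset)
{
    SketchTid tid;

    assert(offset >= 1);        /* offsets start at 1 */
    tid.bi_hi = (uint16_t) (block >> 16);
    tid.bi_lo = (uint16_t) (block & 0xFFFF);
    tid.ip_posid = offset;
    return tid;
}

/*
 * Type #2 done badly: jam a primary key into the 48 bits of a TID.
 * Fails outright for keys needing more than 48 bits (int8 spans 64).
 */
static bool
sketch_key_to_tid(int64_t key, SketchTid *out)
{
    if (key < 0 || (uint64_t) key >= (UINT64_C(1) << 48))
        return false;
    out->bi_hi = (uint16_t) ((uint64_t) key >> 32);
    out->bi_lo = (uint16_t) (((uint64_t) key >> 16) & 0xFFFF);
    out->ip_posid = (uint16_t) ((uint64_t) key & 0xFFFF);
    return true;
}

/* A bitmap groups TIDs by block with one slot per offset, so it only
 * accepts offsets in 1..SKETCH_MAX_OFFSET. */
static bool
sketch_bitmap_accepts(SketchTid tid)
{
    return tid.ip_posid >= 1 && tid.ip_posid <= SKETCH_MAX_OFFSET;
}
```

A key of 5 * 65536 + 1000 packs into block 5, offset 1000. The offset falls outside any per-page slot range, and block 5 says nothing about where the row actually lives, which is the structural mismatch described for bitmap scans.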

Getting #1 to work first would already be really useful for users.
My take on the matter is that being able to plug in-core index AMs
directly into a table AM of type #1 is more useful in the long term,
as multiple table AMs can then share the same index AM, provided it is
designed nicely enough. So the index AM logic basically does not need
to be duplicated across multiple table AMs. #3 implies that the index
AM logic is implemented in the table AM. I am not saying that this is
not useful, but it does not feel natural for the planner to request a
sequential scan just to have the table AM secretly do some kind of
index/skipping scan.
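To illustrate the reuse argument, here is a minimal C sketch of a hypothetical interface, not PostgreSQL's actual API. The index AM only returns TIDs, and each type #1 table AM supplies a single fetch-by-TID callback. The same index code then works unchanged over two table AMs with different page formats. All type and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SLOTS 4          /* hypothetical tuples per page */

typedef struct
{
    uint32_t block;             /* real block number */
    uint16_t offset;            /* real 1-based slot within the block */
} SketchTid;

typedef struct
{
    const void *pages;          /* AM-specific page array */
    uint32_t    nblocks;        /* number of pages in the relation */
} SketchRel;

/* Hypothetical table AM interface: "fetch the value at this TID". */
typedef struct
{
    bool (*fetch_by_tid) (const SketchRel *rel, SketchTid tid, int64_t *out);
} SketchTableAm;

/* Table AM "plain": each page is a bare array of values. */
typedef struct { int64_t vals[SKETCH_SLOTS]; } PlainPage;

static bool
plain_fetch(const SketchRel *rel, SketchTid tid, int64_t *out)
{
    const PlainPage *pages = rel->pages;

    if (tid.block >= rel->nblocks || tid.offset < 1 || tid.offset > SKETCH_SLOTS)
        return false;
    *out = pages[tid.block].vals[tid.offset - 1];
    return true;
}

/* Table AM "flagged": a different page format with per-slot dead flags,
 * but the same TID meaning. */
typedef struct
{
    bool    dead[SKETCH_SLOTS];
    int64_t vals[SKETCH_SLOTS];
} FlaggedPage;

static bool
flagged_fetch(const SketchRel *rel, SketchTid tid, int64_t *out)
{
    const FlaggedPage *pages = rel->pages;

    if (tid.block >= rel->nblocks || tid.offset < 1 || tid.offset > SKETCH_SLOTS)
        return false;
    if (pages[tid.block].dead[tid.offset - 1])
        return false;           /* slot exists but the tuple is gone */
    *out = pages[tid.block].vals[tid.offset - 1];
    return true;
}

static const SketchTableAm plain_am = {plain_fetch};
static const SketchTableAm flagged_am = {flagged_fetch};

/* Hypothetical index AM, written once: (key, TID) entries. It never
 * looks inside pages; it only hands TIDs to the table AM. */
typedef struct
{
    int64_t   key;
    SketchTid tid;
} SketchIndexEntry;

/* Returns the number of matches; out must have room for n values. */
static size_t
sketch_index_lookup(const SketchIndexEntry *idx, size_t n, int64_t key,
                    const SketchTableAm *am, const SketchRel *rel,
                    int64_t *out)
{
    size_t found = 0;

    assert(am != NULL && rel != NULL);
    for (size_t i = 0; i < n; i++)
        if (idx[i].key == key && am->fetch_by_tid(rel, idx[i].tid, &out[found]))
            found++;
    return found;
}
```

The design choice here is that the only contract between the two layers is the meaning of a TID. That is why this kind of sharing fits type #1 table AMs and not types #2 or #3.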
--
Michael
