Re: Status of the table access method work

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Status of the table access method work
Date: 2019-04-08 16:38:59
Message-ID: 20190408163859.ufjlhtl54dpqp5pu@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

On 2019-04-08 14:53:53 +0300, Heikki Linnakangas wrote:
> On 05/04/2019 23:25, Andres Freund wrote:
> > - the (optional) bitmap heap scan API - that's fairly intrinsically
> > block based. An AM could just internally subdivide TIDs in a different
> > way, but I don't think a bitmap scan like we have would e.g. make a
> > lot of sense for an index oriented table without any sort of stable
> > tid.
>
> If an AM doesn't implement the bitmap heap scan API, what happens? Bitmap
> scans are disabled?

Yea, the planner doesn't consider them. It just masks the index's
amhasgetbitmap. Seems to be the most reasonable thing to do?

> Even if an AM isn't block-oriented, the bitmap heap scan API still makes
> sense as long as there's some correlation between TIDs and physical
> location.

Yea, it could be a non-linear mapping. But I'm honestly not sure how
many non-block oriented AMs with such a correlation there are - I mean
you're not going to have that in say an IOT. And it'd be trivial to just
"fake" a block mapping for an in-memory AM.

> The only really broken thing about that currently is the
> prefetching: nodeBitmapHeapScan.c calls PrefetchBuffer() directly with the
> TID's block numbers. It would be pretty straightforward to wrap that in a
> callback, so that the AM could do something different.

That, and the VM_ALL_VISIBLE() checks both in nodeBitmapHeapscan.c and
nodeIndexonlyscan.c.

> Or move even more of the logic to the AM, so that the AM would get the whole
> TIDBitmap in table_beginscan_bm(). It could then implement the fetching and
> prefetching as it sees fit.
>
> I don't think it's urgent, though. We can cross that bridge when we get
> there, with the first AM that needs that flexibility.

Yea, it seemed nontrivial (not in really hard, just not obvious), and
the implicated code duplication scared me away.

> > The most constraining factor for storage, I think, is that currently the
> > API relies on ItemPointerData style TIDs in a number of places (i.e. a 6
> > byte tuple identifier).
>
> I think 48 bits would be just about enough

I don't think that's really true. Consider e.g. implementing an index
oriented table - there's no way you can efficiently implement one with
that small a key. You basically need a helper index just to have
efficient and small enough tids. And given that we're also going to
need wider tids for global indexes, I suspect we're just going to have
to bite into the sour apple and make tids variable width.

> , but it's even more limited than
> you might at the moment. There are a few places that assume that the
> offsetnumber <= MaxHeapTuplesPerPage. See ginpostinglist.c, and
> MAX_TUPLES_PER_PAGE in tidbitmap.c. Also, offsetnumber can't be 0, because
> that makes the ItemPointer invalid, which is inconvenient if you tried to
> use ItemPointer as just an arbitrary 48-bit integer.

Good point.

Thanks for looking (and playing, in the other thread)!

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-04-08 17:14:17 Re: lazy_scan_heap() forgets to mark buffer dirty when setting all frozen?
Previous Message Jesper Pedersen 2019-04-08 16:33:55 COLLATE: Hash partition vs UPDATE