Re: Refactoring of heapam code.

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring of heapam code.
Date: 2016-09-12 10:12:16
Message-ID: CABOikdM8yXfyC82Vt3ZvQsrm0MHeDoquGe56utGkrWrMgTTDqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 6, 2016 at 8:39 PM, Anastasia Lubennikova <
a(dot)lubennikova(at)postgrespro(dot)ru> wrote:

> 06.09.2016 07:44, Pavan Deolasee:
>
> 2. I don't understand the difference between PageGetItemHeapHeaderOnly()
> and PageGetItemHeap(). They seem to do exactly the same thing and can be
> used interchangeably.
>
>
> The only difference between these two macros is that
> PageGetItemHeapHeaderOnly() doesn't touch any key fields of a tuple,
> it only checks header fields (see HeapTupleHeaderData). I divided it into
> two separate functions, while I was working on page compression and
> I wanted to avoid unnecessary decompression calls. These names are
> just a kind of 'markers' to make the code reading and improving easier.
>
>
Ok. I still don't get it, but that's probably because I haven't seen a real
use case of that. Right now, both macros look exactly the same.

> 3. While I like the idea of using separate interfaces to get heap/index
> tuple from a page, may be they can internally use a common interface
> instead of duplicating what PageGetItem() does already.
>
>
> I don't sure I get it right. Do you suggest to leave PageGetItem and write
> PageGetItemHeap() and PageGetItemIndex() as wrappers on it?
> It sounds reasonable while we have similar layout for both heap and index
> pages.
> In any case, it'll be easy to separate them when necessary.
>
>
Yes, that's what I was thinking.

> 4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or something
> like that?
>
>
> I don't feel like doing that, because HeapTuple is a different structure.
> What we do get from page is a HeapTupleHeaderData structure
> followed by user's data.
>

Ok, makes sense.

>
>
> I also looked at the refactoring design doc. Looks like a fine approach to
> me, but the API will need much more elaborate discussions. I am not sure if
> the interfaces as presented are enough to support everything that even
> heapam can do today.
>
>
> What features of heapam do you think could be unsupportable in this API?
> Maybe I've just missed them.
>

I was thinking about locking, bulk reading (like page-mode API) etc. While
you've added an API for vacuuming, we would probably also need an API to
collect dead tuples, pruning etc (not sure if that can be combined with
vacuum). Of course, these are probably very specific to current
implementation of heap/MVCC and not all storages will need that.

>
> I suggest refactoring, that will allow us to develop new heap-like access
> methods.
> For the first version, they must have methods to "heapify" tuple i.e turn
> internal
> representation of the data to regular HeapTuple, for example put some
> stubs into
> MVCC fields. Of course this approach has its disadvantages, such as
> performance issues.
> It definitely won't be enough to write column storage or to implement
> other great
> data structures. But it allows us not to depend of the Executor's code.
>
>
Ok, understood.

>
> - There are many upper modules that need access to system attributes
> (xmin, xmax for starters). How do you plan to handle that? You think we can
> provide enough abstraction so that the callers don't need to know the tuple
> structures of individual PAM?
>
>
> To be honest, I didn't thought about it.
> Do you mean external modules or upper levels of abstraction in Postgres?
>

I meant upper levels of abstraction like the executor. For example, while
inserting a new tuple, the executor (the index AM's insert routine to be
precise) may need to wait for another transaction to finish. Today, it can
easily get that information by looking at the xmax of the conflicting
tuple. How would we handle that with abstraction since not every PAM will
have a notion of xmax?

Thanks,
Pavan

Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?= 2016-09-12 11:28:18 [PATCH] Tab completion for ALTER TYPE … RENAME VALUE …
Previous Message Kyotaro HORIGUCHI 2016-09-12 09:02:59 Re: asynchronous and vectorized execution