Re: Refactoring of heapam code.

From: Thom Brown <thom(at)linux(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring of heapam code.
Date: 2016-08-05 09:17:54
Message-ID: CAA-aLv4TS4-QUn=NXcJ2J0wdR4=y9X8etJ1kFWjQXBwpPsSVCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5 August 2016 at 08:54, Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> Working on page compression and some other issues related to
> access methods, I found out that the code related to heap
> looks too complicated. Much more complicated, than it should be.
> Since I anyway got into this area, I want to suggest a set of improvements.
>
> There is a number of problems I see in the existing code:
> Problem 1. Heap != Relation.
>
> File heapam.c has a note:
> * This file contains the heap_ routines which implement
> * the POSTGRES heap access method used for all POSTGRES
> * relations.
> This statement is wrong, since we also have index relations,
> that are implemented using other access methods.
>
> Also I think that it's quite strange to have a function heap_create(), that
> is called
> from index_create(). It has absolutely nothing to do with heap access
> method.
>
> And so on, and so on.
>
> One more thing that requires refactoring is "RELKIND_RELATION" name.
> We have a type of relation called "relation"...
>
> Problem 2.
> Some functions are shared between heap and indexes access methods.
> Maybe someday it used to be handy, but now, I think, it's time to separate
> them, because IndexTuples and HeapTuples have really little in common.
>
> Problem 3. The word "heap" is used in many different contexts.
> Heap - a storage where we have tuples and visibility information.
> Generally speaking, it can be implemented over any data structure,
> not just a plain table as it is done now.
>
> Heap - an access method, that provides a set of routines for plain table
> storage, such as insert, delete, update, gettuple, vacuum and so on.
> We use this access method not only for user's tables.
>
> There are many types of relations (pg_class.h):
> #define RELKIND_RELATION 'r' /* ordinary table */
> #define RELKIND_INDEX 'i' /* secondary index */
> #define RELKIND_SEQUENCE 'S' /* sequence object */
> #define RELKIND_TOASTVALUE 't' /* for out-of-line
> values */
> #define RELKIND_VIEW 'v' /* view */
> #define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
> #define RELKIND_FOREIGN_TABLE 'f' /* foreign table */
> #define RELKIND_MATVIEW 'm' /* materialized view */
>
> They can be logically separated into three categories:
> "primary storage" - r, S, t, v. They store data and visibility information.
> The only implementation is heapam.c
> "secondary index" - i. They store some data and pointers to primary storage.
> Various existing AMs and managed by AM API.
> "virtual relations" - c, f, m. They have no physical storage, only entries
> in caches and catalogs.
>
> Now, for some reasons, we sometimes use name "heap" for both
> "primary storage" and "virual relations". See src/backend/catalog/heap.c.
> But some functions work only with "primary storage". See pgstat_relation().
> And we constantly have to check relkind everywhere.
> I'd like it would be clear from the code interface and naming.
>
> So there is a couple of patches. They do not cover all mentioned problems,
> but I'd like to get a feedback before continuing.
>
> Patch 1:
> There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
> from the given page. It is used for both heap and index tuples.
> It doesn't seems a problem, until you are going to find anything in this
> code.
>
> The first patch "PageGetItem_refactoring.patch" is intended to fix it.
> It is just renaming:
> (IndexTuple) PageGetItem ---> PageGetItemIndex
> (HeapTupleHeader) PageGetItem ---> PageGetItemHeap
> Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple,
> SpGistDeadTuple)
> still use PageGetItem.
>
> I also changed functions that do not access tuple's data, but only need
> HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.
>
> I do not insist on these particular names, by the way.
>
> Patch 2.
> heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
> and outdated comments. The patch is intended to improve it.
> Fun fact: I found several "check it later" comments unchanged since
> "Postgres95 1.01 Distribution - Virgin Sources" commit.
>
> I wonder if we can wind better name relation_drop_with_catalog() since,
> again, it's
> not about all kinds of relations. We use another function to drop index
> relations.
>
> I hope these changes will be useful for the future development.
> Suggested patches are mostly about renaming and rearrangement of functions
> between files, so, if nobody has conceptual objections, all I need from
> reviewers
> is an attentive look to find typos, grammar mistakes and overlooked areas.

Could you add this to the commitfest?

Thom

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Murat Tuncer 2016-08-05 09:21:40 truncate trigger for foreign data wrappers
Previous Message Andrew Gierth 2016-08-05 08:42:03 Re: Bogus ANALYZE results for an otherwise-unique column with many nulls