Refactoring of heapam code.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Refactoring of heapam code.
Date: 2016-08-05 07:54:00
Message-ID: d7e41e76-e565-8bc0-4e97-bd3d127a7507@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Working on page compression and some other issues related to
access methods, I found out that the code related to heap
looks too complicated. Much more complicated, than it should be.
Since I anyway got into this area, I want to suggest a set of improvements.

There is a number of problems I see in the existing code:
Problem 1. Heap != Relation.

File heapam.c has a note:
* This file contains the heap_ routines which implement
* the POSTGRES heap access method used for all POSTGRES
* relations.
This statement is wrong, since we also have index relations,
that are implemented using other access methods.

Also I think that it's quite strange to have a function heap_create(),
that is called
from index_create(). It has absolutely nothing to do with heap access
method.

And so on, and so on.

One more thing that requires refactoring is "RELKIND_RELATION" name.
We have a type of relation called "relation"...

Problem 2.
Some functions are shared between heap and indexes access methods.
Maybe someday it used to be handy, but now, I think, it's time to separate
them, because IndexTuples and HeapTuples have really little in common.

Problem 3. The word "heap" is used in many different contexts.
Heap - a storage where we have tuples and visibility information.
Generally speaking, it can be implemented over any data structure,
not just a plain table as it is done now.

Heap - an access method, that provides a set of routines for plain table
storage, such as insert, delete, update, gettuple, vacuum and so on.
We use this access method not only for user's tables.

There are many types of relations (pg_class.h):
#define RELKIND_RELATION 'r' /* ordinary table */
#define RELKIND_INDEX 'i' /* secondary index */
#define RELKIND_SEQUENCE 'S' /* sequence object */
#define RELKIND_TOASTVALUE 't' /* for out-of-line
values */
#define RELKIND_VIEW 'v' /* view */
#define RELKIND_COMPOSITE_TYPE 'c' /* composite type */
#define RELKIND_FOREIGN_TABLE 'f' /* foreign table */
#define RELKIND_MATVIEW 'm' /* materialized view */

They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Now, for some reasons, we sometimes use name "heap" for both
"primary storage" and "virual relations". See src/backend/catalog/heap.c.
But some functions work only with "primary storage". See pgstat_relation().
And we constantly have to check relkind everywhere.
I'd like it would be clear from the code interface and naming.

So there is a couple of patches. They do not cover all mentioned problems,
but I'd like to get a feedback before continuing.

Patch 1:
There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
from the given page. It is used for both heap and index tuples.
It doesn't seems a problem, until you are going to find anything in this
code.

The first patch "PageGetItem_refactoring.patch" is intended to fix it.
It is just renaming:
(IndexTuple) PageGetItem ---> PageGetItemIndex
(HeapTupleHeader) PageGetItem ---> PageGetItemHeap
Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple,
SpGistDeadTuple)
still use PageGetItem.

I also changed functions that do not access tuple's data, but only need
HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.

I do not insist on these particular names, by the way.

Patch 2.
heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
and outdated comments. The patch is intended to improve it.
Fun fact: I found several "check it later" comments unchanged since
"Postgres95 1.01 Distribution - Virgin Sources" commit.

I wonder if we can wind better name relation_drop_with_catalog() since,
again, it's
not about all kinds of relations. We use another function to drop index
relations.

I hope these changes will be useful for the future development.
Suggested patches are mostly about renaming and rearrangement of functions
between files, so, if nobody has conceptual objections, all I need from
reviewers
is an attentive look to find typos, grammar mistakes and overlooked areas.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
p1_pagegetitem_refactoring.patch text/x-patch 50.7 KB
p2_heapam_refactoring.patch text/x-patch 52.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2016-08-05 08:27:04 Re: Oddity in EXPLAIN for foreign/custom join pushdown plans
Previous Message Michael Paquier 2016-08-05 07:33:11 Re: regression test for extended query protocol