[PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

From: Nikolay Shaplov <dhyan(at)nataraj(dot)su>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM
Date: 2017-02-06 20:30:03
Message-ID: 2146419.veIEZdk4E4@x200m
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers


(Please see Attribution notice at the bottom of the letter)

Hi! Some time ago I've send a proposal, about removing all am-related code
from src/backend/access/common/reloptions.c and move it into somewhere inside
am code. It was in
https://www.postgresql.org/message-id/4636505.Bu9AKW1Kzc%40nataraj-amd64
thread.

Now I have a patch that is ready

Before explaining what have been done, I need to define some concepts, so it
would be easy to speak about them in the further explanation.

Reloptions values in postgres actually exists in four representations:
DefList*, TextArray[], Bytea and so called Values.

- DefList representation is a list of DefElem, it is the representation in
which reloptions comes from syntax analyzer, when relation is created or
altered, and also this representation that is needed for pg_dump to dump a
create statement that creates this relation.

- TextArray[] representation is a way, how reloptions are stored in
pg_catalog. It is TEXT[] attribute in the pg_catalog table and from the poinf
o view of postgres code it is Datum array of TEXT Datums, when read into the
memory

- Bytea (or I will also call it Binary) representation, is a representation
with which relation code really works. It has C-structure inside in which
fixed-length options are sored, some space is reserved for storing varlen
values after the structure, and certainly it has BYTEA header in front of all
of it. It is cached, it is fast to access. It is a good stuff.

- Values (or in some cases RawValues) -- internal representation of option
parser. Technically it is List of option_value structures (old name for it is
relopt_value). This representation is used for parsing and validating options.
Actually all the convertations between representations listed above are dove
through Values representation. Values are called RawValues when Values List
were already created, but it was not parsed yet, and all option values are in
raw state: represented as a text.

DefList and TextArray representations are converted in both ways(always
through Values representation): we need DefList -> TextArray to create or
change entry in the pg_catalog, and need TextArray -> DefList for pg_dump

Bytea representation is converted from TextArray representation (also through
Values representation). When relation code, tries to get cached
Bytea representation for certain relation, and none is actually cached, then
cache code calls extractRelOptions function that fetches entry from pg_catalog
for that relation, gets reloptions TextArray[], converts it into Bytea and
cache it.

Each reloption has it's own definition. This definitions tells how this option
should be parsed, validated and converted. Before my patch this information
were stored in relopt_gen and relopt_parse_elt structures. In my patch all
this information were moved into option_definition_basic structure (that is
actually old relopt_gen + relopt_parse_elt)

The set of options definition I would call a options catalog. Before my patch
there was one global options catalog (actually it was divided into parts by
data types and has another part for dynamically added options, but
nevertheless, it was global). After my patch there would be a separate options
catalog for each relation type. For AM-relation this catalog would be
available via AM-handler. for other relation types it would be still kept in
reloption.c

Now I am ready to tell what actually have been done in this patch:

1. I moved options definitions from global lists from reloptions.c
[type name]RelOpts to lists that are kept inside AM code. One list per AM.
heap options, toast options and options for all other relation types that are
not accessible through AccessMethod, all were kept inside reloptions.c, but
now it is one list for each relation type, not a global one

2. Each AccessMethod had amoptions function that was responsible for
converting reloptions from TextArray[] representation into Bytea
representaions. This function used functions from reloptions.c and also had
an array of relopt_parse_elt that defines how parsed reloption data should be
packed inside Bytea chunk. I've moved data from relopt_parse_elt array into
option definitions that are stored in the catalog (since catalog are now
defined inside the AccessMethod we have access to a structure of Bytea
representation at the place where we define option catalog)

3. Speaking of amoptions function, I've completely removed it, since we do not
need it. For developers of Access Methods who still want to do some custom
action with just parsed data, I've added postprocess_fun to option catalog. If
this function is defined, it is called by convertation function right after
Bytea representation were created. In postprocess_fun developer can do some
custom validation, or change just parsed data. This feature is used in bloom
index.

4. Instead of amoptions function I've added amrelopt_catalog function to the
Access Method. This function returns option definition catalog for an Access
Method. The catalog is used for processing options.

5. As before, relation cache calls extractRelOptions when it needs options
that were not previously cached.
Before my patch, it created Bytea representations for non-AM relations, and
called amoptions AM function to get Bytea representation for AM relation.
In by patch, it now gets option catalog, (using local functions for non-AM
relations, and using amrelopt_catalog function for AM-relations), and then
use this catalog to convert options from TextArray into Bytea representation.

6. I've added some more featues:

- Now you can set OPTION_DEFINITION_FLAG_FORBID_ALTER flag in definition of the
option, and then postgres will refuse to change option value using ALTER
command. In many indexes, some options are used only for index creation. After
this it's value is saved in MetaPage, and used from there. Nevertheless user
is allowed to change option value, though it affects nothing, one need to
recreate index to really change such value. Now using this flag we can prevent
user from getting an illusion that he successfully changed option value.

- After applying my patch if you try to set toast. option for table that does
not have toast, you will get an error. Before postgres will accept toast
option for a table without a toast, but this value will be lost. This is bad
behavior, so now postgres will throw an error in this case.

- I've noticed that all string relation options in postgres are technically
enum options. So I've added enum option type. This will save some bytes of the
memory and some tacts of the CPU. I also left string option type (through it
is not used now). A. Korotkov said that it might be needed later for storing
patterns, xml/json paths or something like that.

- Added some more tests that will trigger more options code. Though I am
dreaming about more advanced test system, that will allow to check that
certain value is received somewhere deep in the code.

7. And now the most controversial change. I've got rid of StdRdOptions
structure, in which options for heap, toast, nbtree, hash and spgist were
stored in Bytea representation. In indexes only fillfactor value were actually
used from the whole big structure. So I've created a separate structure for
each relation type. HeapOptions and ToastOptions are very similar to each
other. So common part of creating these catalogs were moved to
add_autovacuum_options function that are called while creating each catalog.

Now to the controversial part: in src/include/utils/rel.h had a lot of
Relation* macroses that used StdRdOptions structure. I've changed them into
View* Heap* and Toast* analogues, and changed the code to use these macroses
instead of Relation* one. But this part of code I least sure of. I'd like to
ask experienced developers to double check it.

8. I've moved all options-abstract code into options.c and options.h file, and
left in reloptions.c and reloptions.h only the code that concerns relation
options. Actually the main idea of this patch was to get this abstract code in
order to use it for custom attribute options later. All the rest just a side
effects :-)

So, before adding this to commitfest I want somebody to give a general look at
it, if the code is OK.

Alvaro Herrera, you once said that you can review the patch...

The patch is available in the attachment. It can be applied to current master

You can also see latest version on my github
https://github.com/dhyannataraj/postgres/tree/reloption_fix
at the reloption_fix branch.

ATTRIBUTION NOTICE: I wrote this patch, when I was an employee in Postgres
Professional company. So this patch should be attributed as patch from
Postgres Pro (or something like that). For some reasons I did not manage to
commit this patch while I left that job. But I got a permission to commit it
even if I left the company.
So my contribution to this patch as a independent developer was final code
cleanup, and writing some more comments. All the rest was a work of Postgres
Pro employee Nikolay Shaplov.

--
Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire.

Attachment Content-Type Size
reloptions6.diff text/x-patch 214.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-06 20:34:19 Re: Add pgstathashindex() to get hash index table statistics.
Previous Message Peter Eisentraut 2017-02-06 20:28:05 Re: pg_sequences bug ?