Re: [PATCH] Opclass parameters

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Nikolay Shaplov <dhyan(at)nataraj(dot)su>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, obartunov(at)gmail(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Opclass parameters
Date: 2018-12-06 00:00:07
Message-ID: 6105d858-a060-dec7-48a1-016ac24884cf@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached 3rd version of the patches.

On 20.11.2018 14:15, Nikolay Shaplov wrote:
> В письме от 15 ноября 2018 18:26:43 пользователь Nikita Glukhov написал:
>> Attached 2nd version of the patches. Nothing has changed since March,
>> this is just a rebased version.
>>
>> CREATE INDEX syntax and parameters storage method still need discussion.
> I've played around a bit with you patch and come to some conclusions, I'd like
> to share. They are almost same as those before, but now there are more
> details.

> Again some issues about storing opclass options in pg_inedx:
>
> 1. Having both indoption and indoptions column in pg_index will make someone's
> brain explode for sure. If not, it will bring troubles when people start
> confusing them.
>
> 2. Now I found out how do you store option values for each opclass: text[] of
> indoptions in pg_index is not the same as text[] in
> reloptions in pg_catalog (and it brings more confusion). In reloption each
> member of the array is a single option.
>
> reloptions | {fillfactor=90,autovacuum_enabled=false}
>
> In indoptions, is a whole string of options for one of the indexed attributes,
> each array item has all options for one indexed attribute. And this string
> needs furthermore parsing, that differs from reloption parsing.
>
> indoptions | {"{numranges=150}","{numranges=160}"}
>
>
> This brings us to the following issues:
>
> 2a. pg_index stores properties of index in general. Properties of each indexed
> attributes is stored in pg_attribute table. If we follow this philosophy
> it is wrong to add any kind of per-attribute array values into pg_index. These
> values should be added to pg_attribute one per each pg_attribute entry.
>
> 2b. Since you've chosen method of storage that differs from one that is used
> in reloptions, that will lead to two verstions of code that processes the
> attributes. And from now on, if we accept this, we should support both of them
> and keep them in sync. (I see that you tried to reuse as much code as
> possible, but still you added some more that duplicate current reloptions
> functionality.)

On 21.11.2018 18:04, Robert Haas wrote:

> It seems sensible to have both per-column options and per-index
> options. For example, we've got the fastupdate option for GIN, which
> is a property of the index as a whole, not any individual column. But
> you could also want to specify some column-specific options, which
> seems to be what this patch is about, since an opclass is associated
> with an individual column. And since an index can have more than one
> column, I agree that it seems more appropriate to store this
> information in pg_attribute than pg_index.

Ok, I have switched from pg_index.indoptions to pg_attribute.attoptions.

I agree that we should distinguish per-index and per-column options, but they
can also be AM-specific and opclass-specific.

'fastupdate' option for GIN is an example of AM-specific per-index option.

ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column
options having special SQL syntax. "AM-class-specific" here means "specific
only for the class of AMs that support ordering". Now they are stored as flags
in pg_index.indoption[] but later can be moved to pg_attribute.attoptions.

Don't know should per-column AM-specific and opclass-specific options be stored
in the single column attoptions or have separate columns (like attamoptions and
attopclassoptions). If we will store them together, we can run into name
collisions, but this collisions can be easily resolved using autogenerated
prefixes in option names ("am.foo=bar", "opclass.foo=baz").

And another problem is the options with default values. They may be not
explicitly specified by the user, and in this case in current implementation
nothing will be stored in the catalog because default values can be obtained
from the code. But this will not allow changing of default values without
compatibility issues. So I think it would be better to store both default and
explicitly specified values of all opclass options, but this will require a
major refactoring of current API.

Also I have idea to define list of opclass parameters declaratively when opclass
is created using syntax like the following:

CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) ]
FOR TYPE ... AS (
{ OPTIONS function_name ( arg_type [,...] ) /* reloptions => binary */
| OPERATOR ...
} [,...]
)

But if we remember about the min/max values etc. the syntax will definitely
become more complicated, and it will require much more changes in the catalog
(up to creation of new table pg_opclassparams).

In any case, I think it would be nice to give somehow the user the ability to
obtain a list of opclass parameters not only from the documentation.

On 20.11.2018 14:15, Nikolay Shaplov wrote:

> I know that relotions code is not really suitable for reusing. This was the
> reason why I started solving oplcass option task with rewriting reloptions
> code,to make it 100% reusable for any kind of options. So I would offer you
> again to join me as a reviewer of that code. This will make opclass code more
> simple and more sensible, if my option code is used...

I absolutely agree that reloptions code needs such refactoring, and I am
planning to continue reviewing your patches.

> 3. Speaking of sensible code
>
> Datum
> g_int_options(PG_FUNCTION_ARGS)
> {
> Datum raw_options = PG_GETARG_DATUM(0);
> bool validate = PG_GETARG_BOOL(1);
> relopt_int siglen =
> { {"numranges", "number of ranges for compression", 0, 0, 9,
> RELOPT_TYPE_INT },
> G_INT_NUMRANGES_DEFAULT, 1, G_INT_NUMRANGES_MAX };
> relopt_gen *optgen[] = { &siglen.gen };
> int offsets[] = { offsetof(IntArrayOptions, num_ranges) };
> IntArrayOptions *options =
> parseAndFillLocalRelOptions(raw_options, optgen, offsets, 1,
> sizeof(IntArrayOptions), validate);
>
> PG_RETURN_POINTER(options);
> }
>
> It seems to be not a very nice hack.
> What would you do if you would like to have both int, real and boolean options
> for one opclass? I do not know how to implement it using this code.
> We have only int opclass options for now, but more will come and we should be
> ready for it.

Don't understand where is hack.

It's possible to parse options of different types, see the following example:

typedef struct FooOptions
{
int int_opt;
int str_opt_offset;
} FooOptions;

relopt_int int_option = { {"int_opt", "desc", 0, 0, 7, RELOPT_TYPE_INT },
INT_OPT_DEFAULT, INT_OPT_MIN, INT_OPT_MAX };

relopt_string str_option = { {"str_opt", "desc", 0, 0, 7, RELOPT_TYPE_STRING },
0, true, validate_str_option, NULL };

relopt_gen *gen_options[] = { &int_option.gen, &str_option.gen };

int offsets[] = { offsetof(FooOptions, int_opt),
offsetof(FooOptions, str_opt_offset) };

FooOptions *opts =
parseAndFillLocalRelOptions(raw_options, gen_options, offsets, 2,
sizeof(FooOptions), validate);

int int_option_value = opts->int_opt;
char *str_option_value = (char *) opts + opts->str_opt_offset;

Also I can propose to combine this two arrays into the one:

relopt_gen_offset options[] = {
&int_option.gen, offsetof(FooOptions, int_opt),
&str_option.gen, offsetof(FooOptions, str_opt_offset)
};

> 4. Now getting back to not adding opclass options wherever we can, just
> because we can:
>
> 4a. For inrarray there were no opclass options tests added. I am sure there
> should be one, at least just to make sure it still does not segfault when you
> try to set one. And in some cases more tests can be needed. To add and review
> them one should be familiar with this opclass internals. So it is good when
> different people do it for different opclasses
>
> 4b. When you add opclass options instead of hardcoded values, it comes to
> setting minimum and maximum value. Why do you choose 1000 as maximum
> for num_ranges in gist_int_ops in intarray? Why 1 as minimum? All these
> decisions needs careful considerations and can't be done for bunch of
> opclasses just in one review.
>
> 4c. Patch usually take a long path from prototype to final commit. Do you
> really want to update all these opclasses code each time when some changes
> in the main opclass option code is made? ;-)
>
> So I would suggest to work only with intarray and add other opclasses later.

I decided to leave only GiST tsvector_ops as an example, because it is an
in-core opclass, not an extension like intarray.

> 5. You've been asking about SQL grammar
>
>> CREATE INDEX idx ON tab USING am (
>> {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
>> );
> As for me I do not really care about it. For me all the solutions is
> acceptable. But looking at is i came to one notion:
>
> I've never seen before DEFAULT keyword to be used in this way. There is logic
> in such usage, but I did not remember any practical usage case.
> If there are such usages (I can easily missed it) or if it is somehow
> recommended in SQL standard -- let it be. But if none above, I would suggest
> to use WITH keyword instead. As it is already used for reloptions. As far as I
> remember in my prototype I used "WITH OPTIONS" but did if just because did not
> find my way through yac with single "WITH". So ideal way for me would be
>
> create index ON test USING GIST (i WITH (sig_len_int=22));
>
> But as I said it is not thing of importance for me. Just an observation.

"[opclass] WITH OPTIONS (options)" looks too verbose, of course.

"[opclass] WITH (options)" looks acceptable, but it seems to conflict with
exclusion constraints syntax ("index_elem WITH operator").

"opclass (options)" looks the most natural to me. But we still need some
keyword before the parentheses when the opclass is not specified since we
can't distinguish "func_name (func_params)" and "col_name (opclass_options)"
in grammar.

--
Nikita Glukhov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Add-opclass-parameters-v03.patch text/x-patch 41.2 KB
0002-Add-opclass-parameters-to-GiST-v03.patch text/x-patch 13.7 KB
0003-Add-opclass-parameters-to-GIN-v03.patch text/x-patch 15.1 KB
0004-Add-opclass-parameters-to-GiST-tsvector_ops-v03.patch text/x-patch 31.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Phil Endecott 2018-12-06 00:34:03 Limitting full join to one match
Previous Message Petr Jelinek 2018-12-05 23:55:09 Re: Connection slots reserved for replication