[PATCH] Opclass parameters

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: [PATCH] Opclass parameters
Date: 2018-02-27 21:46:36
Message-ID: d22c3a18-31c7-1879-fc11-4c1ce2f5e5af@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers.

I would like to present patch set implementing opclass parameters.

This feature was recently presented at pgconf.ru:
http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf

A analogous work was already done by Nikolay Shaplov two years ago:
https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
But this patches are not based on it, although they are very similar.

Opclass parameters can give user ability to:
* Define the values of the constants that are hardcoded now in the opclasses
depending on the indexed data.
* Specify what to index for non-atomic data types (arrays, json[b], tsvector).
Partial index can only filter whole rows.
* Specify what indexing algorithm to use depending on the indexed data.

Description of patches:

1. Infrastructure for opclass parameters.

SQL grammar is changed only for CREATE INDEX statement: parenthesized parameters
in reloptions format are added after column's opclass name. Default opclass can
be specified with DEFAULT keyword:

CREATE INDEX idx ON tab USING am (
{expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
);

Example for contrib/intarray:

CREATE INDEX ON arrays USING gist (
arr gist__intbig_ops (siglen = 32),
arr DEFAULT (numranges = 100)
);

\d arrays
Table "public.arrays"
Column | Type | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
arr | integer[] | | |
Indexes:
"arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr gist__int_ops (numranges='100'))

I decided to store parameters in text[] column pg_index.indoptions near to
existing columns like indkey, indcollation, indclass, indoption. I-th element of
indoptions[] is a text array of parameters of i-th index column serialized into
a string. Each parameter is stored as 'name=value' text string like ordinal
reloptions. There is another way to store opclass parameters: store them in
the existing column pg_attribute.attoptions (as it was done by Nikolay Shaplov)
and there will be no need to serialize reloptions to a text array element.

Example query showing how parameters are stored:

SELECT ARRAY(
SELECT (pg_identify_object('pg_opclass'::regclass, opcid, 0)).name
FROM unnest(indclass::int[]) opcid
) indclass, indoptions
FROM pg_index
WHERE indoptions IS NOT NULL;

indclass | indoptions
----------------------------------+------------------------------------
{gist__intbig_ops,gist__int_ops} | {"{siglen=32}","{numranges=100}"}
{jsonb_path_ops} | {"{projection=$.tags[*].term}"}
(2 rows)

Each access method supporting opclass parameters specifies amopclassoptions
routine for transformation of text[] parameters datum into a binary bytea
structure which will be cached in RelationData and IndexOptInfo structures:

typedef bytea *(*amopclassoptions_function) (
Relation index, AttrNumber colnum, Datum indoptions, bool validate
);

If access method wants simply to delegate parameters processing to one of
column opclass's support functions, then it can use
index_opclass_options_generic() subroutine in its amopclassoptions
implementation:

bytea *index_opclass_options_generic(
Relation relation, AttrNumber attnum, uint16 procnum,
Datum indoptions, bool validate
);

This support functions must have the following signature:
internal (options internal, validate bool).
Opclass parameters are passed as a text[] reloptions datum, returned pointer to
a bytea structure with parsed parameter values.

Opclass can use new functions parseLocalRelOptions(),
parseAndFillLocalRelOptions() for reloptions parsing. This functions differ
from the standard parseRelOptions() in that a local array of reloptions
descriptions is passed here, not a global relopt_kind. But it seems that
reloptions processing still needs deeper refactoring like the one already done
by Nikolay Shaplov (https://www.postgresql.org/message-id/flat/2146419(dot)veIEZdk4E4%40x200m#2146419(dot)veIEZdk4E4(at)x200m).

2. Opclass parameters support in GiST indices.

Parametrized GiST opclass specifies optional 10th (GIST_OPCLASSOPT_PROC)
support function with the following signature:
internal (options internal, validate bool)

Returned parsed bytea pointer with parameters will be passed to all support
functions in the last argument.

3. Opclass parameters support in GIN indices.

Everything is the same as for GiST, except for the optional support
function number which is 7 (GIN_OPCLASSOPTIONS_PROC) here.

4. Opclass parameters for GiST tsvector_ops
5. Opclass parameters for contrib/intarray
6. Opclass parameters for contrib/ltree
7. Opclass parameters for contrib/pg_trgm
8. Opclass parameters for contrib/hstore

This 5 patches for GiST opclasses are very similar: added optional 'siglen'
parameter for specifying signature length. Default signature length is left
equal to the hardcoded value that was here before. Also added 'numranges'
parameter for gist__int_ops.

We also have two more complex unfinished patches for GIN opclasses which
should be posted in separate threads:

* tsvector_ops: added parameter 'weights' for specification of indexed
lexeme's weight groups. This parameter can reduce index size and its
build/update time and can also eliminate recheck. By default, all weights
are indexed within the same group.

* jsonb_ops: added jsonpath parameter 'projection' for specification of
indexed paths in jsonb (this patch depends on SQL/JSON jsonpath patch).
Analogically to tsvector_ops, this parameter can reduce index size and its
build/update time, but can not eliminate recheck.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-opclass-parameters-v01.patch text/x-patch 40.9 KB
0002-opclass-parameters-GiST-v01.patch text/x-patch 13.1 KB
0003-opclass-parameters-GIN-v01.patch text/x-patch 14.4 KB
0004-opclass-parameters-GiST-tsvector_ops-v01.patch text/x-patch 32.0 KB
0005-opclass-parameters-contrib_intarray-v01.patch text/x-patch 54.2 KB
0006-opclass-parameters-contrib_ltree-v01.patch text/x-patch 40.6 KB
0007-opclass-parameters-contrib_pg_trgm-v01.patch text/x-patch 22.5 KB
0008-opclass-parameters-contrib_hstore-v01.patch text/x-patch 17.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2018-02-27 22:07:28 Re: [HACKERS] MERGE SQL Statement for PG11
Previous Message Tom Kazimiers 2018-02-27 21:40:32 Re: Unexpected behavior with transition tables in update statement trigger