Re: [HACKERS] [PATCH] Generic type subscripting

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Generic type subscripting
Date: 2019-10-31 02:35:28
Message-ID: 099309f0-4b41-2c78-aac8-0835fe9a9ee7@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30.09.2019 14:57, Dmitry Dolgov wrote:

>> On Wed, Sep 25, 2019 at 10:22 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>>
>> This broke recently. Can you please rebase again?
> Thanks for noticing! Sure, here it is. We're quite close to the records.

Hi. I added new 5th patch to this patch set.

Jsonb subscripting uses text for representing subscript values. This is Ok for
object keys, but integer arrays indexes should be parsed at runtime. Another
problem is that floats can't be used as array indexes because integers simply
can't be parsed from a string containing a floating point.

But we can use float indexes in ordinary Postgres arrays:

SELECT ('{1,2,3}'::int[])[2.3];
int4
------
2
(1 row)

Also SQL standard allows to use float indexes in JSON path with implementation-
defined rounding or truncation:

SELECT jsonb_path_query('[1, 2, 3]', '$[1.3]');
jsonb_path_query
------------------
2
(1 row)

So, I decided to fix these two issues introducing polymorphic subscripting,
in which each subscript expression variant interpreted depending on the result
of previous subscripting step. There are two variants of jsonb subscript
expressions -- the first is casted to text and the second is casted to int4.
Executor at each subscripting step selects which variant to execute by calling
callback jsonb_subscript_selectexpr(). To manage the subscripting state,
another callback jsonb_subscript_step() was introduced along with the new field
SubscriptingRefState.privatedata.

Such expression selecting has noticeable overhead, which we can eliminate by
generating only one expression variant when subscript is of int2/int4 or
text type.

After float subscripts start to works as expected:

SELECT ('[1, 2, 3]'::jsonb)[1.2];
jsonb
-------
2
(1 row)

SELECT ('{"1": "a", "1.0": "b", "1.2": "c"}'::jsonb)[i]
FROM unnest('{1,1.0,1.2}'::numeric[]) i;

jsonb
-------
"a"
"b"
"c"
(3 rows)

Performance was compared on 4 tables with 10M rows:

-- [ i ]
CREATE TABLE arr_1 AS
SELECT jsonb_build_array(i)::jsonb js
FROM generate_series(1, 10000000) i;

-- { "a": i }
CREATE TABLE obj_1 AS
SELECT jsonb_build_object('a', i) js
FROM generate_series(1, 10000000) i;

-- [[[[[[[[[[ i ]]]]]]]]]]
CREATE TABLE arr_10 AS
SELECT (repeat('[', 10) || i || repeat(']', 10))::jsonb js
FROM generate_series(1, 10000000) i;

-- {"a": {"a": ... {"a": {"a": i } } ... } }
CREATE TABLE obj_10 AS
SELECT (repeat('{"a":', 10) || i || repeat('}', 10))::jsonb js
FROM generate_series(1, 10000000) i;

Queries were like "SELECT FROM table WHERE expression IS [NOT] NULL".

Compared previous v27 version (4 patches) with v28 version (5 patches).
New patch #5 contains one small but important optimization -- elimination of
unnecessary palloc() in getIthJsonbValueFromContainer() and jsonb_get_element().
It should be posted separately, but for simplicity I included it the patch now.
For the correctness of comparison, it was evaluated separately on top of v27
(v27opt).

Table | Expression | Query time, ms
| | v27 | v27opt| v28
--------+--------------------------------------+-------+-------+-------
arr_1 | js->0 | 1811 | 1809 | 1813
arr_1 | js[0] | 2273 | 2294 | 2028
arr_1 | js['0'] | 2276 | 2286 | 2339
arr_1 | js->1 | 808 | 844 | 809
arr_1 | js[1] | 1180 | 1187 | 1008

obj_1 | js->'a' | 1941 | 1935 | 1939
obj_1 | js['a'] | 2079 | 2083 | 2102
obj_1 | js->'b' | 917 | 915 | 902
obj_1 | js['b'] | 960 | 961 | 1059
|
arr_10 | js->0->0 ... ->0->0 | 4530 | 4068 | 4052
arr_10 | js[0][0] ... [0][0] | 6197 | 5513 | 3766
arr_10 | js['0']['0'] ... ['0']['0'] | 6202 | 5519 | 5983
arr_10 | js #> '{0,0,0,0,0,0,0,0,0,0}' | 6412 | 5850 | 5835
arr_10 | js #>> '{0,0,0,0,0,0,0,0,0,0}' | 5904 | 5181 | 5192

obj_10 | js->'a'->'a' ... ->'a'->'a' | 4970 | 4717 | 4704
obj_10 | js['a']['a'] ... ['a']['a'] | 4331 | 3698 | 4032
obj_10 | js #> '{a,a,a,a,a,a,a,a,a,a}' | 4570 | 3941 | 3949
obj_10 | js #>> '{a,a,a,a,a,a,a,a,a,a}' | 4055 | 3395 | 3392

As it can be seen, array access time reduced from 10% in single subscripts
to 40% in 10-subscript chains, and subscripting event started to overtake
chained "->" operators. But there is 10% slowdown of object key access that
needs further investigation. The elimination of unnecessary palloc()s also
gives good results.

I had to write new assignment logic reusing only some parts of setPath(),
because the loop in setPath() should be broken on every level. During this
process, I decided to implement assignment behavior similar to PostgreSQL's
array behavior and added two new features:
- creation of jsonb arrays/objects container from NULL values
- appending/prepending array elements on the specified position, gaps filled
with nulls (JavaScript has similar behavior)
These features are not so easy to extract into a separate patch on top of the
first 4 patches, but I can try if necessary.

Here is examples of new features:

CREATE TABLE t AS SELECT NULL::jsonb js, NULL::int[] a;

-- create array from NULL
UPDATE t SET js[0] = 1, a[1] = 1;
SELECT * FROM t;
js | a
-----+-----
[1] | {1}
(1 row)

-- append 4th element
UPDATE t SET js[3] = 4, a[4] = 4;
SELECT * FROM t;
js | a
--------------------+-----------------
[1, null, null, 4] | {1,NULL,NULL,4}
(1 row)

-- prepend element when index is negative (position = size + index)
UPDATE t SET js[-6] = -2;
SELECT js FROM t;
js
------------------------------
[-2, null, 1, null, null, 4]
(1 row)

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v28-0001-Base-implementation-of-subscripting-mechanism.patch text/x-patch 48.2 KB
v28-0002-Subscripting-for-array.patch text/x-patch 24.7 KB
v28-0003-Subscripting-for-jsonb.patch text/x-patch 33.3 KB
v28-0004-Subscripting-documentation.patch text/x-patch 19.9 KB
v28-0005-Polymorphic-subscripting.patch text/x-patch 62.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-10-31 02:54:17 Re: Allow cluster_name in log_line_prefix
Previous Message Fujii Masao 2019-10-31 02:27:12 Allow CREATE OR REPLACE VIEW to rename the columns