Re: SQL/JSON: JSON_TABLE

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Subject: Re: SQL/JSON: JSON_TABLE
Date: 2020-01-14 15:26:15
Message-ID: df16e6d8-4de6-f228-5109-60aefed5d180@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached 42th version of the patches rebased onto current master.

Changes from the previous version:
* added EXISTS PATH columns
* added DEFAULT clause for FORMAT JSON columns
* added implicit FORMAT JSON for columns of json[b], array and composite types

On 21.11.2019 19:51, Pavel Stehule wrote:
>
> čt 21. 11. 2019 v 17:31 odesílatel Nikita Glukhov
> <n(dot)gluhov(at)postgrespro(dot)ru <mailto:n(dot)gluhov(at)postgrespro(dot)ru>> napsal:
>
> On 17.11.2019 13:35, Pavel Stehule wrote:
>
> I found:
>> a) Oracle & MySQL (Oracle) supports EXISTS clause, this implementation not.
>> I think should be useful support this clause too.
>> SELECT * FROM JSON_TABLE('...', '...' COLUMNS x INT EXISTS PATH ...
>>
> EXISTS PATH clause can be emulated with jsonpath EXISTS() predicate:
>
> =# SELECT *
> FROM JSON_TABLE('{"a": 1}', '$'
> COLUMNS (
> a bool PATH 'exists($.a)',
> b bool PATH 'exists($.b)'
> ));
> a | b
> ---+---
> t | f
> (1 row)
>
> But this works as expected only in lax mode. In strict mode EXISTS() returns
> Unknown that transformed into SQL NULL:
>
> =# SELECT *
> FROM JSON_TABLE('{"a": 1}', '$'
> COLUMNS (
> a bool PATH 'strict exists($.a)',
> b bool PATH 'strict exists($.b)'
> ));
> a | b
> ---+---
> t |
> (1 row)
>
> There is no easy way to return false without external COALESCE(),
> DEFAULT false ON ERROR also does not help.
>
> So, I think it's worth to add EXISTS PATH clause to our implementation.
>
>> There is a question how to map boolean result to other data types.
>
> Now, boolean result can be used in JSON_TABLE columns of bool, int4, text,
> json[b], and other types which have CAST from bool:
>
> SELECT *
> FROM JSON_TABLE('{"a": 1}', '$'
> COLUMNS (
> a int PATH 'exists($.a)',
> b text PATH 'exists($.b)'
> ));
> a | b
> ---+-------
> 1 | false
> (1 row)
>
EXISTS PATH columns were added. Only column types having CASTS
from boolean type are accepted.

Example:

SELECT *
FROM JSON_TABLE(
'{"foo": "bar"}', '$'
COLUMNS (
foo_exists boolean EXISTS PATH '$.foo',
foo int EXISTS,
err text EXISTS PATH '$ / 0' TRUE ON ERROR
)
);

foo_exists | foo | err
------------+-----+------
t | 1 | true
(1 row)

>> b) When searched value is not scalar, then it returns null. This behave can be
>> suppressed by clause FORMAT Json. I found a different behave, and maybe I found
>> a bug. On MySQL this clause is by default for JSON values (what has sense).
>> SELECT *
>> FROM
>>       JSON_TABLE(
>>         '[{"a":[1,2]}]',
>>         '$[*]'
>>         COLUMNS(
>>          aj JSON PATH '$.a' DEFAULT '{"x": 333}' ON EMPTY
>>         )
>>       ) AS tt;
>> It returns null, although it should to return [1,2].
>
> Yes, regular (non-formatted) JSON_TABLE columns can accept only scalar values.
> Otherwise an error is thrown, which can be caught by ON ERROR clause. This
> behavior is specified by the standard.
>
> FORMAT JSON is not implicitly added for json[b] columns now. The current SQL
> standard does not have any json data types, so I think we can add implicit
> FORMAT JSON for json[b] typed-columns. But I'm a bit afraid that different
> behavior can be standardized after introduction of json data types in SQL.
>
>> There is another bug maybe. Although there is DEFAULT clause. It returns NULL.
>
> ON ERROR should be used if "not a scalar" error needs to be caught:
>
> SELECT *
> FROM
> JSON_TABLE(
> '[{"a":[1,2]}]',
> '$[*]'
> COLUMNS(
> aj JSON PATH '$.a' DEFAULT '{"x": 333}' ON ERROR
> )
> ) AS tt;
>
> aj
> ------------
> {"x": 333}
> (1 row)
>
>
> ON EMPTY catches only empty-result case (for example, non-existent path in
> lax mode):
>
> SELECT *
> FROM
> JSON_TABLE(
> '[{"a":[1,2]}]',
> '$[*]'
> COLUMNS(
> aj JSON PATH '$.foo' DEFAULT '{"x": 333}' ON EMPTY
> )
> ) AS tt;
> aj
> ------------
> {"x": 333}
> (1 row)
>
>> I got correct result when I used FORMAT JSON clause.
>> I think it should be default behave for json and jsonb columns.
>
> I agree that FORMAT JSON could be implicit for json[b] columns. But I think
> there could be one minor problem if we want to verify that returned value is
> scalar.
>
> Without FORMAT JSON this is verified by the underlying JSON_VALUE expression:
>
> SELECT *
> FROM
> JSON_TABLE(
> '[{"a":[1,2]}]',
> '$[*]'
> COLUMNS (
> aj JSON PATH 'lax $.a' ERROR ON ERROR
> )
> ) AS tt;
> ERROR: JSON path expression in JSON_VALUE should return singleton scalar item
>
> (This error message with the reference to implicit JSON_VALUE needs to be fixed.)
>
>
> But with FORMAT JSON we need to construct complex jsonpath with a filter and
> override ON EMPTY behavior:
>
> SELECT *
> FROM
> JSON_TABLE(
> '[{"a":[1,2]}]',
> '$[*]'
> COLUMNS (
> aj JSON FORMAT JSON
> -- strict mode is mandatory to prevent array unwrapping
> PATH 'strict $.a ? (@.type() != "array" && @.type() != "object")'
> ERROR ON EMPTY ERROR ON ERROR
> )
> ) AS tt;
> ERROR: no SQL/JSON item
>
> please, check the behave of other databases. I think so good
> conformance with other RDBMS is important. More this method for
> checking if value is object or not looks little bit scary.
>
> maybe we can implement some functions like JSON_IS_OBJECT(),
> JSON_IS_ARRAY(), JSON_IS_VALUE()?
> More - we have this functionality already
>
> ostgres=# select json_typeof('[10,20]');
> ┌─────────────┐
> │ json_typeof │
> ╞═════════════╡
> │ array       │
> └─────────────┘
> (1 row)

Implicit FORMAT JSON is used for columns of json[b], array and composite types now.
The behavior is similar to behavior of json_populate_record().

Example:

CREATE TYPE test_record AS (foo text[], bar int);

SELECT *
FROM JSON_TABLE(
'{"foo": ["bar", 123, null]}', '$'
COLUMNS (
js json PATH '$',
jsonb_arr jsonb[] PATH '$.foo',
text_arr text[] PATH '$.foo',
int_arr int[] PATH '$.foo' DEFAULT '{}' ON ERROR,
rec test_record PATH '$'
)
);
js | jsonb_arr | text_arr | int_arr | rec
-----------------------------+----------------------+----------------+---------+---------------------
{"foo": ["bar", 123, null]} | {"\"bar\"",123,NULL} | {bar,123,NULL} | {} | ("{bar,123,NULL}",)
(1 row)

>> Another question - when I used FORMAT JSON clause, then I got syntax error
>> on DEFAULT keyword .. . Is it correct?
>>
>> Why I cannot to use together FORMAT JSON and DEFAULT clauses?
>
> JSON_TABLE columns with FORMAT JSON, like JSON_QUERY, can have only
> ERROR, NULL, EMPTY ARRAY, EMPTY OBJECT behaviors.
>
> This syntax is specified in the SQL standard:
>
> <JSON table formatted column definition> ::=
> <column name> <data type> FORMAT <JSON representation>
> [ PATH <JSON table column path specification> ]
> [ <JSON table formatted column wrapper behavior> WRAPPER ]
> [ <JSON table formatted column quotes behavior> QUOTES [ ON SCALAR STRING ] ]
> [ <JSON table formatted column empty behavior> ON EMPTY ]
> [ <JSON table formatted column error behavior> ON ERROR ]
>
> <JSON table formatted column empty behavior> ::=
> ERROR
> | NULL
> | EMPTY ARRAY
> | EMPTY OBJECT
>
> <JSON table formatted column error behavior> ::=
> ERROR
> | NULL
> | EMPTY ARRAY
> | EMPTY OBJECT
>
>
> But I also think that DEFAULT clause could be very useful in JSON_QUERY and
> formatted JSON_TABLE columns.
>

DEFAULT clause was enabled in JSON_QUERY() and formatted JSON_TABLE columns:

SELECT *
FROM JSON_TABLE(
'{"foo": "bar"}', '$'
COLUMNS (
baz json FORMAT JSON DEFAULT '"empty"' ON EMPTY
)
);
baz
---------
"empty"
(1 row)

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-SQL-JSON-functions-v42.patch.gz application/gzip 121.2 KB
0002-JSON_TABLE-v42.patch.gz application/gzip 27.5 KB
0003-JSON_TABLE-PLAN-DEFAULT-clause-v42.patch.gz application/gzip 7.1 KB
0004-JSON_TABLE-PLAN-clause-v42.patch.gz application/gzip 13.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2020-01-14 15:49:21 Re: Create/alter policy and exclusive table lock
Previous Message Daniel Gustafsson 2020-01-14 15:15:14 Re: Setting min/max TLS protocol in clientside libpq