Re: SQL/JSON: JSON_TABLE

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Subject: Re: SQL/JSON: JSON_TABLE
Date: 2020-01-14 19:55:37
Message-ID: CAFj8pRBaSA06w9WzG52Qu7r10MT2Qxd9Pzg0bbiZCCS_QKhgkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

I read this patch

There are some typo in doc

*name* *type* EXISTS [ PATH *json_path_specification* ]

*Gerenates* a column and inserts a boolean item into each row of this
column.

Is good to allow repeat examples from documentation - so documentation
should to contains a INSERT with JSON, query and result.

JSON_TABLE is pretty complex function, probably the most complex function
what I know, so I propose to divide documentation to two parts - basic
advanced. The basic should to coverage the work with missing or error
values (with examples), and explain what are wrappers. Advanced part should
to describe work with plans. I afraid so lot of smaller examples has to be
necessary. Personally I propose postpone 0003 and 0004 patches to some next
releases. This is extra functionality and not well used and documented in
other RDBMS (depends on your capacity) - there is problem only in well
documentation - because this feature is not almost used in projects, the
small differences from standard or other RDBMS can be fixed later (like we
fixed XMLTABLE last year).

The documentation is good enough for initial commit - but should be
significantly enhanced before release.

I did some small performance tests - and parsing json with result cca 25000
rows needs 150 ms. It is great time.

My previous objections was solved.

The patches was applied cleanly. The compilation is without any issues and
warnings.
There are enough regress tests, and check-world was passed without problem.
Source code is readable, and well formatted.

I checked standard and checked conformance with other RDBMS.

I will mark this patch - JSON_TABLE implementation as ready for commiter.
The documentation should be enhanced - more examples, more simple examples
are necessary.

Regards

Thank you for your great, complex and hard work

It will be great feature

Pavel

út 14. 1. 2020 v 16:26 odesílatel Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
napsal:

> Attached 42th version of the patches rebased onto current master.
>
>
> Changes from the previous version:
> * added EXISTS PATH columns
> * added DEFAULT clause for FORMAT JSON columns
> * added implicit FORMAT JSON for columns of json[b], array and composite types
>
>
> On 21.11.2019 19:51, Pavel Stehule wrote:
>
>
> čt 21. 11. 2019 v 17:31 odesílatel Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
> napsal:
>
>> On 17.11.2019 13:35, Pavel Stehule wrote:
>> I found:
>>
>> a) Oracle & MySQL (Oracle) supports EXISTS clause, this implementation not.
>> I think should be useful support this clause too.
>>
>> SELECT * FROM JSON_TABLE('...', '...' COLUMNS x INT EXISTS PATH ...
>>
>>
>> EXISTS PATH clause can be emulated with jsonpath EXISTS() predicate:
>>
>> =# SELECT *
>> FROM JSON_TABLE('{"a": 1}', '$'
>> COLUMNS (
>> a bool PATH 'exists($.a)',
>> b bool PATH 'exists($.b)'
>> ));
>> a | b
>> ---+---
>> t | f
>> (1 row)
>>
>> But this works as expected only in lax mode. In strict mode EXISTS() returns
>> Unknown that transformed into SQL NULL:
>>
>> =# SELECT *
>> FROM JSON_TABLE('{"a": 1}', '$'
>> COLUMNS (
>> a bool PATH 'strict exists($.a)',
>> b bool PATH 'strict exists($.b)'
>> ));
>> a | b
>> ---+---
>> t |
>> (1 row)
>>
>> There is no easy way to return false without external COALESCE(),
>> DEFAULT false ON ERROR also does not help.
>>
>> So, I think it's worth to add EXISTS PATH clause to our implementation.
>>
>>
>> There is a question how to map boolean result to other data types.
>>
>> Now, boolean result can be used in JSON_TABLE columns of bool, int4, text,
>> json[b], and other types which have CAST from bool:
>>
>> SELECT *
>> FROM JSON_TABLE('{"a": 1}', '$'
>> COLUMNS (
>> a int PATH 'exists($.a)',
>> b text PATH 'exists($.b)'
>> ));
>> a | b
>> ---+-------
>> 1 | false
>> (1 row)
>>
>> EXISTS PATH columns were added. Only column types having CASTS
> from boolean type are accepted.
>
> Example:
>
> SELECT *
> FROM JSON_TABLE(
> '{"foo": "bar"}', '$'
> COLUMNS (
> foo_exists boolean EXISTS PATH '$.foo',
> foo int EXISTS,
> err text EXISTS PATH '$ / 0' TRUE ON ERROR
> )
> );
>
> foo_exists | foo | err
> ------------+-----+------
> t | 1 | true
> (1 row)
>
>
>
> b) When searched value is not scalar, then it returns null. This behave can be
>> suppressed by clause FORMAT Json. I found a different behave, and maybe I found
>> a bug. On MySQL this clause is by default for JSON values (what has sense).
>>
>> SELECT *
>> FROM
>> JSON_TABLE(
>> '[{"a":[1,2]}]',
>> '$[*]'
>> COLUMNS(
>> aj JSON PATH '$.a' DEFAULT '{"x": 333}' ON EMPTY
>> )
>> ) AS tt;
>>
>> It returns null, although it should to return [1,2].
>>
>> Yes, regular (non-formatted) JSON_TABLE columns can accept only scalar values.
>> Otherwise an error is thrown, which can be caught by ON ERROR clause. This
>> behavior is specified by the standard.
>>
>> FORMAT JSON is not implicitly added for json[b] columns now. The current SQL
>> standard does not have any json data types, so I think we can add implicit
>> FORMAT JSON for json[b] typed-columns. But I'm a bit afraid that different
>> behavior can be standardized after introduction of json data types in SQL.
>>
>>
>> There is another bug maybe. Although there is DEFAULT clause. It returns NULL.
>>
>> ON ERROR should be used if "not a scalar" error needs to be caught:
>>
>> SELECT *
>> FROM
>> JSON_TABLE(
>> '[{"a":[1,2]}]',
>> '$[*]'
>> COLUMNS(
>> aj JSON PATH '$.a' DEFAULT '{"x": 333}' ON ERROR
>> )
>> ) AS tt;
>>
>> aj
>> ------------
>> {"x": 333}
>> (1 row)
>>
>>
>> ON EMPTY catches only empty-result case (for example, non-existent path in
>> lax mode):
>>
>> SELECT *
>> FROM
>> JSON_TABLE(
>> '[{"a":[1,2]}]',
>> '$[*]'
>> COLUMNS(
>> aj JSON PATH '$.foo' DEFAULT '{"x": 333}' ON EMPTY
>> )
>> ) AS tt;
>> aj
>> ------------
>> {"x": 333}
>> (1 row)
>>
>>
>> I got correct result when I used FORMAT JSON clause.
>> I think it should be default behave for json and jsonb columns.
>>
>> I agree that FORMAT JSON could be implicit for json[b] columns. But I think
>> there could be one minor problem if we want to verify that returned value is
>> scalar.
>>
>> Without FORMAT JSON this is verified by the underlying JSON_VALUE expression:
>>
>> SELECT *
>> FROM
>> JSON_TABLE(
>> '[{"a":[1,2]}]',
>> '$[*]'
>> COLUMNS (
>> aj JSON PATH 'lax $.a' ERROR ON ERROR
>> )
>> ) AS tt;
>> ERROR: JSON path expression in JSON_VALUE should return singleton scalar item
>>
>> (This error message with the reference to implicit JSON_VALUE needs to be fixed.)
>>
>>
>> But with FORMAT JSON we need to construct complex jsonpath with a filter and
>> override ON EMPTY behavior:
>>
>> SELECT *
>> FROM
>> JSON_TABLE(
>> '[{"a":[1,2]}]',
>> '$[*]'
>> COLUMNS (
>> aj JSON FORMAT JSON
>> -- strict mode is mandatory to prevent array unwrapping
>> PATH 'strict $.a ? (@.type() != "array" && @.type() != "object")'
>> ERROR ON EMPTY ERROR ON ERROR
>> )
>> ) AS tt;
>> ERROR: no SQL/JSON item
>>
>> please, check the behave of other databases. I think so good conformance
> with other RDBMS is important. More this method for checking if value is
> object or not looks little bit scary.
>
> maybe we can implement some functions like JSON_IS_OBJECT(),
> JSON_IS_ARRAY(), JSON_IS_VALUE()?
>
> More - we have this functionality already
>
> ostgres=# select json_typeof('[10,20]');
> ┌─────────────┐
> │ json_typeof │
> ╞═════════════╡
> │ array │
> └─────────────┘
> (1 row)
>
> Implicit FORMAT JSON is used for columns of json[b], array and composite types now.
> The behavior is similar to behavior of json_populate_record().
>
> Example:
>
> CREATE TYPE test_record AS (foo text[], bar int);
>
> SELECT *
> FROM JSON_TABLE(
> '{"foo": ["bar", 123, null]}', '$'
> COLUMNS (
> js json PATH '$',
> jsonb_arr jsonb[] PATH '$.foo',
> text_arr text[] PATH '$.foo',
> int_arr int[] PATH '$.foo' DEFAULT '{}' ON ERROR,
> rec test_record PATH '$'
> )
> );
> js | jsonb_arr | text_arr | int_arr | rec
> -----------------------------+----------------------+----------------+---------+---------------------
> {"foo": ["bar", 123, null]} | {"\"bar\"",123,NULL} | {bar,123,NULL} | {} | ("{bar,123,NULL}",)
> (1 row)
>
>
> Another question - when I used FORMAT JSON clause, then I got syntax error
>> on DEFAULT keyword .. . Is it correct?
>>
>> Why I cannot to use together FORMAT JSON and DEFAULT clauses?
>>
>> JSON_TABLE columns with FORMAT JSON, like JSON_QUERY, can have only
>> ERROR, NULL, EMPTY ARRAY, EMPTY OBJECT behaviors.
>>
>> This syntax is specified in the SQL standard:
>>
>> <JSON table formatted column definition> ::=
>> <column name> <data type> FORMAT <JSON representation>
>> [ PATH <JSON table column path specification> ]
>> [ <JSON table formatted column wrapper behavior> WRAPPER ]
>> [ <JSON table formatted column quotes behavior> QUOTES [ ON SCALAR STRING ] ]
>> [ <JSON table formatted column empty behavior> ON EMPTY ]
>> [ <JSON table formatted column error behavior> ON ERROR ]
>>
>> <JSON table formatted column empty behavior> ::=
>> ERROR
>> | NULL
>> | EMPTY ARRAY
>> | EMPTY OBJECT
>>
>> <JSON table formatted column error behavior> ::=
>> ERROR
>> | NULL
>> | EMPTY ARRAY
>> | EMPTY OBJECT
>>
>>
>> But I also think that DEFAULT clause could be very useful in JSON_QUERY and
>> formatted JSON_TABLE columns.
>>
>> DEFAULT clause was enabled in JSON_QUERY() and formatted JSON_TABLE columns:
>
> SELECT *
> FROM JSON_TABLE(
> '{"foo": "bar"}', '$'
> COLUMNS (
> baz json FORMAT JSON DEFAULT '"empty"' ON EMPTY
> )
> );
> baz
> ---------
> "empty"
> (1 row)
>
>
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-01-14 19:57:44 Re: Add pg_file_sync() to adminpack
Previous Message Fabien COELHO 2020-01-14 19:49:34 Re: doc: vacuum full, fillfactor, and "extra space"