Re: sql/json remaining issue

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sql/json remaining issue
Date: 2024-04-09 11:37:29
Message-ID: CA+HiwqESEo1smZ_TSO5vXMjoxWb4Hiw95F2bm1=PDd-LtY=Kug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Apr 9, 2024 at 4:47 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> hi.
> `
> | NESTED [ PATH ] json_path_specification [ AS json_path_name ]
> COLUMNS ( json_table_column [, ...] )
> NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS
> ( json_table_column [, ...] )
> `
> "json_path_specification" should be "path_expression"?
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> drop table s1;
> create or replace function random_text_1000() returns text
> as $$select string_agg(md5(random()::text),'') from
> generate_Series(1,1000) s $$ LANGUAGE SQL;
>
> create unlogged table s1(a int GENERATED BY DEFAULT AS IDENTITY, js jsonb);
> insert into s1(js)
> select jsonb ('{"a":{"za":[{"z1": [11,2222]},{"z21": [22, 234,' || g
> || ']},{"z22": [32, 204,145]}]},"c": ' || g
> || ',"id": "' || random_text_1000() || '"}')
> from generate_series(1_000_000, 1_000_000) g;
> insert into s1(js)
> select jsonb ('{"a":{"za":[{"z1": [11,2222]},{"z21": [22, 234,' || g
> || ']},{"z22": [32, 204,145]}]},"c": ' || g
> || ',"id": "' || random_text_1000() || '"}')
> from generate_series(235, 235 + 200000,1) g;
>
> select count(*), pg_size_pretty(pg_total_relation_size('s1')) from s1;
> count | pg_size_pretty
> --------+----------------
> 200002 | 6398 MB
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> explain(analyze, costs off,buffers, timing)
> SELECT sub.*, s.a as s_a FROM s,
> (values(23)) x(x),
> generate_series(13, 13) y,
> JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
> COLUMNS (
> xx1 int PATH '$.c',
> NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
> (c int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
> NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
> -2))' as z1 COLUMNS (a int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS
> (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
> PATH '$ ? (@ <= ($"x" + 999976))' default -1000 ON EMPTY))
> )) sub;
>
> for one jsonb, it can expand to 7 rows, the above query will return
> around 1.4 million rows.
> i use the above query, and pg_log_backend_memory_contexts in another
> session to check the memory usage.
> didn't find memory over consumed issue.
>
> but I am not sure this is the right way to check the memory consumption.
> ----------------------------------------------------------------------------------------------------------------------
> begin;
> SELECT sub.*, s.a as s_a FROM s,
> (values(23)) x(x),
> generate_series(13, 13) y,
> JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
> COLUMNS (
> xx1 int PATH '$.c',
> NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
> (c int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
> NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
> -2))' as z1 COLUMNS (a int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS
> (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
> PATH '$ ? (@ <= ($"x" + 999976))' error ON EMPTY))
> )) sub;
> rollback;
>
> only the last row will fail, because of "error ON EMPTY", (1_000_000
> <= 23 + 999976) is false.
> I remember the very previous patch, because of error cleanup, it took
> a lot of resources.
> does our current implementation, only the very last row fail, will it
> be easy to clean up the transaction?

I am not sure I understand your concern. Could you please rephrase
it? Which previous patch are you referring to and what problem did it
cause with respect to error cleanup?

Per-row memory allocated for each successful output row JSON_TABLE()
doesn't pile up, because it's allocated in a context that is reset
after evaluating each row; see tfuncLoadRows(). But again I may be
misunderstanding your concern.

> the last query error message is:
> `
> ERROR: no SQL/JSON item
> `
>
> we are in ExecEvalJsonExprPath, can we output it to be:
> `
> ERROR: after applying json_path "5s", no SQL/JSON item found
> `
> in a json_table query, we can have multiple path_expressions, like the
> above query.
> it's not easy to know applying which path_expression failed.

Hmm, I'm not so sure about mentioning the details of the path because
path names are optional and printing path expression itself is not a
good idea. Perhaps, we could mention the column name which would
always be there, but we'd then need to add a new field column_name
that's optionally set to JsonFuncExpr and JsonExpr, that is, when they
are being set up for JSON_TABLE() columns. As shown in the attached.
With the patch you'll get:

ERROR: no SQL/JSON item found for column "b"

--
Thanks, Amit Langote

Attachment Content-Type Size
v1-0001-JSON_TABLE-mention-column-name-in-the-ON-EMPTY-er.patch application/octet-stream 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ole Peder Brandtzæg 2024-04-09 11:41:15 Re: NumericShort vs NumericLong format
Previous Message Tomas Vondra 2024-04-09 11:24:06 Re: post-freeze damage control