Re: SQL/JSON: functions

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Cc: Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Subject: Re: SQL/JSON: functions
Date: 2018-06-28 04:25:12
Message-ID: CAFj8pRCQdkhzNj+hK37hiTcbCrCikPri4DutuYi2Tan-5nCiVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2018-06-28 2:18 GMT+02:00 Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>:

> On 15.03.2018 20:04, Nikita Glukhov wrote:
>
>> Attached 13th version of the patches:
>>
>> * Subtransactions in PG_TRY/CATCH in ExecEvalJsonExpr() were made
>> unconditional,
>> regardless of the volatility of expressions.
>>
>> * PG_TRY/CATCH in ExecEvalExprPassingCaseValue() was removed along with
>> the
>> entire function.
>>
>
>
> Attached 15th version of the patches:
> * disabled parallel execution of SQL/JSON query functions when internal
> subtransactions are used (if ERROR ON ERROR is not specified)
> * added experimental optimization of internal subtransactions (see below)
>
>
> The new patch #14 is an experimental attempt to reduce overhead of
> subtransaction start/commit which can result in 2x-slowdown in the simplest
> cases. By the idea of Alexander Korotkov, subtransaction is not really
> committed if it has not touched the database and its XID has not been
> assigned
> (DB modification is not expected in type casts functions) and then can be
> reused
> when the next subtransaction is started. So, all rows in JsonExpr can be
> executed in the single cached subtransaction. This optimization really
> helps
> to reduce overhead from 100% to 5-10%:
>

I read a technical report for SQL/JSON. If I understand it well, then ON
ERROR clause is primary related to structural errors, not to all errors.

So your implementation is maybe too tolerant, what has this issue. There
was not any example, so this clause should to handle cast errors or any
other errors than JSON structural.

The playing with other implementation of subtransactions doesn't look like
safe way, more if it is not necessary

The other possible error are casts errors. We can introduce new exception
safe input functions. These functions can be interesting for fault tolerant
COPY for example.

Regards

Pavel

> -- without subtransactions
> =# EXPLAIN ANALYZE
> SELECT JSON_VALUE('true'::jsonb, '$' RETURNING boolean ERROR ON ERROR)
> FROM generate_series(1, 10000000) i;
> ...
> Execution Time: 2785.410 ms
>
> -- cached subtransactions
> =# EXPLAIN ANALYZE
> SELECT JSON_VALUE('true'::jsonb, '$' RETURNING boolean)
> FROM generate_series(1, 10000000) i;
> ...
> Execution Time: 2939.363 ms
>
> -- ordinary subtransactions
> =# EXPLAIN ANALYZE
> SELECT JSON_VALUE('true'::jsonb, '$' RETURNING boolean)
> FROM generate_series(1, 10000000) i;
> ...
> Execution Time: 5417.268 ms
>
>
> But, unfortunately, I don't believe that this patch is completely correct,
> mainly because the behavior of subtransaction callbacks (and their
> expectations
> about subtransaction's lifecycle too) seems unpredictable to me.
>
>
>
> Even with this optimization, internal subtransactions still have one major
> drawback -- they disallow parallel query execution, because background
> workers do not support subtransactions now. Example:
>
> =# CREATE TABLE test_parallel_json_value AS
> SELECT i::text::jsonb AS js FROM generate_series(1, 5000000) i;
> CREATE TABLE
>
> =# EXPLAIN ANALYZE
> SELECT sum(JSON_VALUE(js, '$' RETURNING numeric ERROR ON ERROR))
> FROM test_parallel_json_value;
> QUERY PLAN
> ------------------------------------------------------------
> ------------------------------------------------------------
> -----------------
> Finalize Aggregate (cost=79723.15..79723.16 rows=1 width=32) (actual
> time=455.062..455.062 rows=1 loops=1)
> -> Gather (cost=79722.93..79723.14 rows=2 width=32) (actual
> time=455.052..455.055 rows=3 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Partial Aggregate (cost=78722.93..78722.94 rows=1 width=32)
> (actual time=446.000..446.000 rows=1 loops=3)
> -> Parallel Seq Scan on t (cost=0.00..52681.30
> rows=2083330 width=18) (actual time=0.023..104.779 rows=1666667 loops=3)
> Planning Time: 0.044 ms
> Execution Time: 456.460 ms
> (8 rows)
>
> =# EXPLAIN ANALYZE
> SELECT sum(JSON_VALUE(js, '$' RETURNING numeric))
> FROM test_parallel_json_value;
> QUERY PLAN
> ------------------------------------------------------------
> --------------------------------------------------------
> Aggregate (cost=144347.82..144347.83 rows=1 width=32) (actual
> time=1381.938..1381.938 rows=1 loops=1)
> -> Seq Scan on t (cost=0.00..81847.92 rows=4999992 width=18) (actual
> time=0.076..309.676 rows=5000000 loops=1)
> Planning Time: 0.082 ms
> Execution Time: 1384.133 ms
> (4 rows)
>
>
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Charles Cui 2018-06-28 05:05:47 [GSoC] thrift compact protocol byte interface get implemented
Previous Message Michael Paquier 2018-06-28 03:43:41 Re: partition tree inspection functions