| From: | Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> | 
|---|---|
| To: | PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Optimization of some jsonb functions | 
| Date: | 2019-02-22 00:05:33 | 
| Message-ID: | 7c417f90-f95f-247e-ba63-d95e39c0ad14@postgrespro.ru | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Attached set of patches with some jsonb optimizations that were made during
comparison of performance of ordinal jsonb operators and jsonpath operators.
1. Optimize JsonbExtractScalar():
    It is better to use getIthJsonbValueFromContainer(cont, 0) instead of
    JsonIterator to get 0th element of raw-scalar pseudoarray.
    JsonbExtractScalar() is used in jsonb casts, so they speed up a bit.
2. Optimize operator #>>, jsonb_each_text(), jsonb_array_elements_text():
    These functions have direct conversion (JsonbValue => text) only for
    jbvString scalars, but indirect conversion of other types of scalars
    (JsonbValue => jsonb => text) is obviously too slow.  Extracted common
    subroutine JsonbValueAsText() and used in all suitable places.
3. Optimize JsonbContainer type recognition in get_jsonb_path_all():
    Fetching of the first token from JsonbIterator is replaced with lightweight
    JsonbContainerIsXxx() macros.
4. Extract findJsonbKeyInObject():
    Extracted findJsonbKeyInObject() from findJsonbValueFromContainer(),
    which is slightly easier to use (key string and its length is passed instead
    of filled string JsonbValue).
5. Optimize resulting value allocation in findJsonbValueFromContainer() and
    getIthJsonbValueFromContainer():
    Added ability to pass stack-allocated JsonbValue that will be filled with
    the result of operation instead of returning unconditionally palloc()ated
    JsonbValue.
Patches #4 and #5 are mostly refactorings, but they can give small speedup
(up to 5% for upcoming jsonpath operators) due to elimination of unnecessary
palloc()s.  The whole interface of findJsonbValueFromContainer() with JB_OBJECT
and JB_ARRAY flags always seemed a bit strange to me, so I think it is worth to
have separate functions for searching keys in objects and elements in arrays.
Performance tests:
  - Test data for {"x": {"y": {"z": i}}}:
    CREATE TABLE t AS
    SELECT jsonb_build_object('x',
             jsonb_build_object('y',
               jsonb_build_object('z', i))) js
    FROM generate_series(1, 3000000) i;
  - Sample query:
    EXPLAIN (ANALYZE) SELECT js -> 'x' -> 'y' -> 'z' FROM t;
  - Results:
                                                     |   execution time, ms
                          query                      |  master  |   optimized
-------------------------------------------------------------------------------
               {"x": {"y": {"z": i}}}
  js #> '{x,y,z}'                                    | 1148.632 | 1005.578 -10%
  js #>> '{x,y,z}'                                   | 1520.160 |  849.991 -40%
  (js #> '{x,y,z}')::numeric                         | 1310.881 | 1067.752 -20%
  (js #>> '{x,y,z}')::numeric                        | 1757.179 | 1109.495 -30%
  js -> 'x' -> 'y' -> 'z'                            | 1030.211 |  977.267
  js -> 'x' -> 'y' ->> 'z'                           |  887.101 |  838.745
  (js -> 'x' -> 'y' -> 'z')::numeric                 | 1184.086 | 1050.462
  (js -> 'x' -> 'y' -> 'z')::int4                    | 1279.315 | 1133.032
  (js -> 'x' -> 'y' ->> 'z')::numeric                | 1134.003 | 1100.047
  (js -> 'x' -> 'y' ->> 'z')::int4                   | 1077.216 |  991.995
  js ? 'x'                                           |  523.111 |  495.387
  js ?| '{x,y,z}'                                    |  612.880 |  607.455
  js ?& '{x,y,z}'                                    |  674.786 |  643.987
  js -> 'x' -> 'y' ? 'z'                             |  712.623 |  698.588
  js @> '{"x": {"y": {"z": 1}}}'                     | 1154.926 | 1149.069
jsonpath:
  js @@ '$.x.y.z == 123'                             |  973,444 |   912,08  -5%
               {"x": i, "y": i, "z": i}
  jsonb_each(js)                                     | 2281.577 | 2262.660
  jsonb_each_text(js)                                | 2603.539 | 2112.200 -20%
  
                    [i, i, i]
  jsonb_array_elements(js)                           | 1255.210 | 1205.939
  jsonb_array_elements(js)::numeric                  | 1662.550 | 1576.227  -5%
  jsonb_array_elements_text(js)                      | 1555.021 | 1067.031 -30%
  js @> '1'                                          |  798.858 |  768.664  -4%
  js <@ '[1,2,3]'                                    |  820.795 |  785.086  -5%
  js <@ '[0,1,2,3,4,5,6,7,8,9]'                      | 1214.170 | 1165.289  -5%
As it can be seen, #> operators are always slower than equivalent series of ->.
I think it is caused by array deconstruction in "jsonb #> text[]".
-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
| Attachment | Content-Type | Size | 
|---|---|---|
| 0001-Optimize-JsonbExtractScalar-v01.patch | text/x-patch | 1.6 KB | 
| 0002-Optimize-jsonb_xxx_text-functions-v01.patch | text/x-patch | 6.2 KB | 
| 0003-Optimize-JsonbContainer-type-recognition-v01.patch | text/x-patch | 1.3 KB | 
| 0004-Extract-findJsonbKeyInObject-v01.patch | text/x-patch | 10.7 KB | 
| 0005-Optimize-resulting-JsonbValue-allocation.patch | text/x-patch | 8.6 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bruce Momjian | 2019-02-22 00:13:13 | Re: boolean and bool in documentation | 
| Previous Message | Stephen Frost | 2019-02-21 23:39:10 | Re: [PATCH v20] GSSAPI encryption support |