Re: Extract numeric filed in JSONB more effectively

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2024-02-09 09:05:21
Message-ID: 87r0hmvuvr.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

Here is the update of this patch.

1. What is it for?

commit f7b93acc24b4a152984048fefc6d71db606e3204 (HEAD -> jsonb_numeric)
Author: yizhi.fzh <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Fri Feb 9 16:54:06 2024 +0800

Improve the performance of Jsonb numeric/bool extraction.

JSONB object uses a binary compatible numeric format with the numeric
data type in SQL. However in the past, extracting a numeric value from a
JSONB field still needs to find the corresponding JsonbValue first,
then convert the JsonbValue to Jsonb, and finally use the cast system to
convert the Jsonb to a Numeric data type. This approach was very
inefficient in terms of performance.

In the current patch, It is handled that the JsonbValue is converted to
numeric data type directly. This is done by the planner support
function which detects the above case and simplify the expression.
Because the boolean type and numeric type share certain similarities in
their attributes, we have implemented the same optimization approach for
both. In the ideal test case, the performance can be 2x than before.

The optimized functions and operators includes:
1. jsonb_object_field / ->
2. jsonb_array_element / ->
3. jsonb_extract_path / #>
4. jsonb_path_query
5. jsonb_path_query_first

example:
create table tb(a jsonb);
insert into tb select '{"a": 1, "b": "a"}'::jsonb;

master:
explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on public.tb
Output: a
Filter: (((tb.a -> 'a'::text))::numeric > '3'::numeric)
(3 rows)

patched:

postgres=# explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on public.tb
Output: a
Filter: (jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'a'::text), '1700'::oid) > '3'::numeric)
(3 rows)

The final expression generated by planner support function includes:

1).
jsonb_object_field_start((tb.a)::internal, 'a'::text) first, this
function returns the internal datum which is JsonbValue in fact.
2).
jsonb_finish_numeric(internal (jsonbvalue), '1700::oid) convert the
jsonbvalue to numeric directly without the jsonb as a intermedia result.

the reason why "1700::oid" will be explained later, that's the key issue
right now.

The reason why we need the 2 steps rather than 1 step is because the
code can be better abstracted, the idea comes from Chap, the detailed
explaination is at [1]. You can search "Now, it would make me happy to
further reduce some of the code duplication" and read the following
graph.

2. Where is the current feature blocked for the past few months?

It's error message compatible issue! Continue with above setup:

master:

select * from tb where (a->'b')::numeric > 3::numeric;
ERROR: cannot cast jsonb string to type numeric

select * from tb where (a->'b')::int4 > 3::numeric;
ERROR: cannot cast jsonb string to type integer

You can see the error message is different (numeric vs integer).

Patched:

We still can get the same error message as master BUT the code
looks odd.

select * from tb where (a->'b')::int4 > 3;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on public.tb
Output: a
Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3)
(3 rows)

You can see "jsonb_finish_numeric(.., '23::oid)" the '23::oid' is just
for the *"integer"* output in error message:

"cannot cast jsonb string to type *integer*"

Now the sistuation is either we use the odd argument (23::oid) in
jsonb_finish_numeric, or we use a incompatible error message with the
previous version. I'm not sure which way is better, but this is the
place the current feature is blocked.

3. what do I want now?

Since this feature uses the planner support function which needs some
catalog changes, so it is better that we can merge this feature in PG17,
or else, we have to target it in PG18. So if some senior developers can
chime in, for the current blocking issue at least, will be pretty
helpful.

[1]
https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net

--
Best Regards
Andy Fan

Attachment Content-Type Size
v16-0001-Improve-the-performance-of-Jsonb-numeric-bool-ex.patch text/x-diff 38.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anthonin Bonnefoy 2024-02-09 09:09:43 Fix parallel vacuum buffer usage reporting
Previous Message Ashutosh Bapat 2024-02-09 08:57:26 Re: Printing backtrace of postgres processes