Re: Extract numeric filed in JSONB more effectively

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2023-08-31 09:10:39
Message-ID: CAKU4AWpM=C5NxumoHQB1=id0OBc63kUgMUWyRNO2=OrHj6ROyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Chap,

The v11 attached, mainly changes are:
1. use the jsonb_xx_start and jsonb_finish_numeric style.
2. improve the test case a bit.

It doesn't include:
1. the jsonb_finish_text function, since we have a operator ->> for text
already and the performance for it is OK and there is no cast entry for
jsonb to text.
2. the jsonb_finish_jsonb since I can't see a clear user case for now.
Rewriting jsonb_object_field with 2 DirectFunctionCall looks not pretty
reasonable as we paid 2 DirectFunctionCall overhead to reduce ~10 lines
code duplication.

An incompatible issue at error message level is found during test:
create table jb(a jsonb);
insert into jb select '{"a": "a"}'::jsonb;
select (a->'a')::int4 from jb;

master: ERROR: cannot cast jsonb string to type *integer*
patch: ERROR: cannot cast jsonb string to type *numeric*

That's mainly because we first extract the field to numeric and
then cast it to int4 and the error raised at the first step and it
doesn't know the final type. One way to fix it is adding a 2nd
argument for jsonb_finish_numeric for the real type, but
it looks weird and more suggestions on this would be good.

Performance comparison between v10 and v11.

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;
select 1 from tb where (a->'a')::int2 = 2; (pgbench 5 times)

v11: 16.273 ms
v10: 15.986 ms
master: 32.530ms

So I think the performance would not be an issue.

> I noticed there is another patch registered in this CF: [1]
> It adds new operations within jsonpath like .bigint .time
> and so on.
>
> I was wondering whether that work would be conflicting or
> complementary with this. It looks to be complementary. The
> operations being added there are within jsonpath evaluation.
> Here we are working on faster ways to get those results out.
>
> It does not seem that [1] will add any new choices in
> JsonbValue. All of its (.bigint .integer .number) seem to
> verify the requested form and then put the result as a
> numeric in ->val.numeric. So that doesn't add any new
> cases for this patch to handle. (Too bad, in a way: if that
> other patch added ->val.bigint, this patch could add a case
> to retrieve that value without going through the work of
> making a numeric. But that would complicate other things
> touching JsonbValue, and be a matter for that other patch.)
>
> It may be expanding the choices for what we might one day
> find in ->val.datetime though.
>
> Thanks for this information. I tried the jsonb_xx_start and
jsonb_finish_numeric style, and it looks like a good experience
and it may not make things too complicated even if the above
things happen IMO.

Any feedback is welcome.

--
Best Regards
Andy Fan

Attachment Content-Type Size
v11-0001-optimize-casting-jsonb-to-a-given-type.patch application/octet-stream 23.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema 2023-08-31 09:13:00 pg_basebackup: Always return valid temporary slot names
Previous Message Jelte Fennema 2023-08-31 09:01:07 Re: Allow specifying a dbname in pg_basebackup connection string