Re: Extract numeric filed in JSONB more effectively

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2023-08-03 09:47:33
Message-ID: CAFj8pRAO3oEiBaJJ9=HZp6CoP2ffbwSgrKkLKjPfYZwx9wOOuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

čt 3. 8. 2023 v 9:53 odesílatel Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> napsal:

> Hi Pavel:
>
> Thanks for the feedback.
>
> I don't like this solution because it is bloating operators and it is not
>> extra readable.
>>
>
> If we support it with cast, could we say we are bloating CAST? It is true
> that it is not extra readable, if so how about a->>'a' return text?
> Actually
> I can't guess any meaning of the existing jsonb operations without
> documentation.
>

yes, it can bloat CAST, but for usage we have already used syntax, and
these casts are cooked already:

(2023-08-03 11:04:51) postgres=# select castfunc::regprocedure from pg_cast
where castsource = 'jsonb'::regtype;
┌──────────────────┐
│ castfunc │
╞══════════════════╡
│ - │
│ bool(jsonb) │
│ "numeric"(jsonb) │
│ int2(jsonb) │
│ int4(jsonb) │
│ int8(jsonb) │
│ float4(jsonb) │
│ float8(jsonb) │
└──────────────────┘
(8 rows)

the operator ->> was a special case, the text type is special in postgres
as the most convertible type. And when you want to visualise a value or
display the value, you should convert value to text.

I can live with that because it is just one, but with your proposal opening
the doors for implementing tens of similar operators, I think it is bad.
Using ::target_type is common syntax and doesn't require reading
documentation.

More, I believe so lot of people uses more common syntax, and then this
syntax should to have good performance - for jsonb - (val->'op')::numeric
works, and then there should not be performance penalty, because this
syntax will be used in 99%.

Usage of cast is self documented.

> For completeness you should implement cast for date, int, boolean too.
>> Next, the same problem is with XML or hstore type (probably with any types
>> that are containers).
>>
>
> I am not sure completeness is a gold rule we should obey anytime,
> like we have some function like int24le to avoid the unnecessary
> cast, but we just avoid casting for special types for performance
> reason, but not for all. At the same time, `int2/int4/int8` doesn't
> have a binary compatibility type in jsonb. and the serialization
> /deserialization for boolean is pretty cheap.
>
> I didn't realize timetime types are binary compatible with SQL,
> so maybe we can have some similar optimization as well.
> (It is a pity that timestamp(tz) are not binary, or else we may
> just need one operator).
>
>
>>
>> I don't like the idea so using a special operator is 2x faster than
>> common syntax for casting. It is a signal, so there is a space for
>> optimization. Black magic with special operators is not user friendly for
>> relatively common problems.
>>
>
> I don't think "Black magic" is a proper word here, since it is not much
> different from ->> return a text. If you argue text can be cast to
> most-of-types, that would be a reason, but I doubt this difference
> should generate a "black magic".
>

I used the term black magic, because nobody without reading documentation
can find this operator. It is used just for this special case, and the
functionality is the same as using cast (only with different performance).

The operator ->> is more widely used. But if we have some possibility to
work without it, then the usage for a lot of users will be more simple.
More if the target types can be based on context

Can be nice to use some like `EXTRACT(YEAR FROM val->'field')` instead
`EXTRACT(YEAR FROM (val->>'field')::date)`

>
>>
>> Maybe we can introduce some *internal operator* "extract to type", and in
>> rewrite stage we can the pattern (x->'field')::type transform to OP(x,
>> 'field', typid)
>>
>
> Not sure what the OP should be? If it is a function, what is the
> return value? It looks to me like it is hard to do in c language?
>

It should be internal structure - it can be similar like COALESCE or IS
operator

>
> After all, if we really care about the number of operators, I'm OK
> with just let users use the function directly, like
>
> jsonb_field_as_numeric(jsonb, 'filedname')
> jsonb_field_as_timestamp(jsonb, 'filedname');
> jsonb_field_as_timestamptz(jsonb, 'filedname');
> jsonb_field_as_date(jsonb, 'filedname');
>
> it can save an operator and sloves the readable issue.
>

I don't like it too much, but it is better than introduction new operator

We already have the jsonb_extract_path and jsonb_extract_path_text
function.

I can imagine to usage "anyelement" type too. some like
`jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`

> --
> Best Regards
> Andy Fan
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2023-08-03 10:04:18 Re: Extract numeric filed in JSONB more effectively
Previous Message tender wang 2023-08-03 09:34:40 Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails