Re: jsonb_set() strictness considered harmful to data

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Ariadne Conill <ariadne(at)dereferenced(dot)org>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Christoph Moench-Tegeder <cmt(at)burggraben(dot)net>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, feld(at)freebsd(dot)org
Subject: Re: jsonb_set() strictness considered harmful to data
Date: 2019-10-19 18:28:47
Message-ID: 39ed932e-7895-740b-745a-0e1ec93ae1b0@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 10/18/19 7:18 PM, Ariadne Conill wrote:
> Hello,
>
> On Fri, Oct 18, 2019 at 7:04 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>>
>> On 10/18/19 4:31 PM, Ariadne Conill wrote:
>>> Hello,
>>>
>>> On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>>>>
>>>> On 10/18/19 3:11 PM, Ariadne Conill wrote:
>>>>> Hello,
>>>>>
>>>>> On Fri, Oct 18, 2019 at 5:01 PM David G. Johnston
>>>>> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>>>>>>
>>>>>> On Fri, Oct 18, 2019 at 2:50 PM Christoph Moench-Tegeder <cmt(at)burggraben(dot)net> wrote:
>>>>>>>
>>>>>>> ## Ariadne Conill (ariadne(at)dereferenced(dot)org):
>>>>>>>
>>>>>>>> update users set info=jsonb_set(info, '{bar}', info->'foo');
>>>>>>>>
>>>>>>>> Typically, this works nicely, except for cases where evaluating
>>>>>>>> info->'foo' results in an SQL null being returned. When that happens,
>>>>>>>> jsonb_set() returns an SQL null, which then results in data loss.[3]
>>>>>>>
>>>>>>> So why don't you use the facilities of SQL to make sure to only
>>>>>>> touch the rows which match the prerequisites?
>>>>>>>
>>>>>>> UPDATE users SET info = jsonb_set(info, '{bar}', info->'foo')
>>>>>>> WHERE info->'foo' IS NOT NULL;
>>>>>>>
>>>>>>
>>>>>> There are many ways to add code to queries to make working with this function safer - though using them presupposes one remembers at the time of writing the query that there is danger and caveats in using this function. I agree that we should have (and now) provided sane defined behavior when one of the inputs to the function is null instead blowing off the issue and defining the function as being strict. Whether that is "ignore and return the original object" or "add the key with a json null scalar value" is debatable but either is considerably more useful than returning SQL NULL.
>>>>>
>>>>> A great example of how we got burned by this last year: Pleroma
>>>>> maintains pre-computed counters in JSONB for various types of
>>>>> activities (posts, followers, followings). Last year, another counter
>>>>> was added, with a migration. But some people did not run the
>>>>> migration, because they are users, and that's what users do. This
>>>>
>>>> So you are more forgiving of your misstep, allowing users to run
>>>> outdated code, then of running afoul of Postgres documented behavior:
>>>
>>> I'm not forgiving of either.
>>>
>>>> https://www.postgresql.org/docs/11/functions-json.html
>>>> " The field/element/path extraction operators return NULL, rather than
>>>> failing, if the JSON input does not have the right structure to match
>>>> the request; for example if no such element exists"
>>>
>>> It is known that the extraction operators return NULL. The problem
>>> here is jsonb_set() returning NULL when it encounters SQL NULL.
>>
>> I'm not following. Your original case was:
>>
>> jsonb_set(info, '{bar}', info->'foo');
>>
>> where info->'foo' is equivalent to:
>>
>> test=# select '{"f1":1,"f2":null}'::jsonb ->'f3';
>> ?column?
>> ----------
>> NULL
>>
>> So you know there is a possibility that a value extraction could return
>> NULL and from your wrapper that COALESCE is the way to deal with this.
>
> You're not following because you don't want to follow.
>
> It does not matter that info->'foo' is in my example. That's not what
> I am talking about.
>
> What I am talking about is that jsonb_set(..., ..., NULL) returns SQL NULL >
> postgres=# \pset null '(null)'
> Null display is "(null)".
> postgres=# select jsonb_set('{"a":1,"b":2,"c":3}'::jsonb, '{a}', NULL);
> jsonb_set
> -----------
> (null)
> (1 row)
>
> This behaviour is basically giving an application developer a loaded
> shotgun and pointing it at their feet. It is not a good design. It
> is a design which has likely lead to many users experiencing
> unintentional data loss.

create table null_test(fld_1 integer, fld_2 integer);

insert into null_test values(1, 2), (3, NULL);

select * from null_test ;
fld_1 | fld_2
-------+-------
1 | 2
3 | NULL
(2 rows)

update null_test set fld_1 = fld_1 + fld_2;

select * from null_test ;
fld_1 | fld_2
-------+-------
3 | 2
NULL | NULL

Failure to account for NULL is a generic issue. Given that this only the
second post I can find that deals with this, in going on 4 years, I am
guessing most users have dealt with it. If you really think this raises
to the level of a bug then I would suggest filing a report here:

https://www.postgresql.org/account/login/?next=/account/submitbug/

>
> Ariadne
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2019-10-19 19:06:42 Re: Postgres Point in time Recovery (PITR),
Previous Message Daulat Ram 2019-10-19 17:46:15 RE: Postgres Point in time Recovery (PITR),

Browse pgsql-hackers by date

  From Date Subject
Next Message Isaac Morland 2019-10-19 18:35:42 Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?
Previous Message Stephen Frost 2019-10-19 17:36:25 Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?