Re: jsonb_set() strictness considered harmful to data

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ariadne Conill <ariadne(at)dereferenced(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Christoph Moench-Tegeder <cmt(at)burggraben(dot)net>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, feld(at)freebsd(dot)org
Subject: Re: jsonb_set() strictness considered harmful to data
Date: 2019-10-19 11:08:31
Message-ID: 20191019110831.7y7d2yys5xlujqw6@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Fri, Oct 18, 2019 at 09:14:09PM -0500, Ariadne Conill wrote:
>Hello,
>
>On Fri, Oct 18, 2019 at 6:52 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>
>> Greetings,
>>
>> * Ariadne Conill (ariadne(at)dereferenced(dot)org) wrote:
>> > On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>> > > https://www.postgresql.org/docs/11/functions-json.html
>> > > " The field/element/path extraction operators return NULL, rather than
>> > > failing, if the JSON input does not have the right structure to match
>> > > the request; for example if no such element exists"
>> >
>> > It is known that the extraction operators return NULL. The problem
>> > here is jsonb_set() returning NULL when it encounters SQL NULL.
>> >
>> > > Just trying to figure why one is worse then the other.
>> >
>> > Any time a user loses data, it is worse. The preference for not
>> > having data loss is why Pleroma uses PostgreSQL as it's database of
>> > choice, as PostgreSQL has traditionally valued durability. If we
>> > should not use PostgreSQL, just say so.
>>
>> Your contention that the documented, clear, and easily addressed
>> behavior of a particular strict function equates to "the database system
>> loses data and isn't durable" is really hurting your arguments here, not
>> helping it.
>>
>> The argument about how it's unintuitive and can cause application
>> developers to misuse the function (which is clearly an application bug,
>> but perhaps an understandable one if the function interface isn't
>> intuitive or is confusing) is a reasonable one and might be convincing
>> enough to result in a change here.
>>
>> I'd suggest sticking to the latter argument when making this case.
>>
>> > > > I believe that anything that can be catastrophically broken by users
>> > > > not following upgrade instructions precisely is a serious problem, and
>> > > > can lead to serious problems. I am sure that this is not the only
>> > > > project using JSONB which have had users destroy their own data in
>> > > > such a completely preventable fashion.
>>
>> Let's be clear here that the issue with the upgrade instructions was
>> that the user didn't follow your *application's* upgrade instructions,
>> and your later code wasn't written to use the function, as documented,
>> properly- this isn't a case of PG destroying your data. It's fine to
>> contend that the interface sucks and that we should change it, but the
>> argument that PG is eating data because the application sent a query to
>> the database telling it, based on our documentation, to eat the data,
>> isn't appropriate. Again, let's have a reasonable discussion here about
>> if it makes sense to make a change here because the interface isn't
>> intuitive and doesn't match what other systems do (I'm guessing it isn't
>> in the SQL standard either, so we unfortunately can't look to that for
>> help; though I'd hardly be surprised if they supported what PG does
>> today anyway).
>
>Okay, I will admit that saying PG is eating data is perhaps
>hyperbolic,

My experience is that using such hyperbole is pretty detrimental, even
when one is trying to make a pretty sensible case. The problem is that
people often respond in a similarly hyperbolic claims, particularly when
you hit a nerve. And that's exactly what happened here, becase we're
*extremely* sensitive about data corruption issues, so when you claim
PostgreSQL is "eating data" people are likely to jump on you, beating
you with the documentation stick. It's unfortunate, but it's also
entirely predictable.

>but I will also say that the behaviour of jsonb_set()
>under this type of edge case is unintuitive and frequently results in
>unintended data loss. So, while PostgreSQL is not actually eating the
>data, it is putting the user in a position where they may suffer data
>loss if they are not extremely careful.
>
>Here is how other implementations handle this case:
>
>MySQL/MariaDB:
>
>select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> {"a":null,"b":2,"c":3}
>
>Microsoft SQL Server:
>
>select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> {"b":2,"c":3}
>
>Both of these outcomes make sense, given the nature of JSON objects.
>I am actually more in favor of what MSSQL does however, I think that
>makes the most sense of all.
>

I do mostly agree with this. The json[b]_set behavior seems rather
surprising, and I think I've seen a couple of cases running into exactly
this issue. I've solved that with a simple CASE, but maybe changing the
behavior would be better. That's unlikely to be back-patchable, though,
so maybe a better option is to create a non-strict wrappers. But that
does not work when the user is unaware of the behavior :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter J. Holzer 2019-10-19 11:32:54 Re: Securing records using linux grou permissions
Previous Message Ariadne Conill 2019-10-19 06:52:11 Re: jsonb_set() strictness considered harmful to data

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-10-19 12:23:23 Re: Compressed pluggable storage experiments
Previous Message Tomas Vondra 2019-10-19 10:52:10 Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?