Re: jsonb_set() strictness considered harmful to data

From: Ariadne Conill <ariadne(at)dereferenced(dot)org>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Christoph Moench-Tegeder <cmt(at)burggraben(dot)net>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, feld(at)freebsd(dot)org
Subject: Re: jsonb_set() strictness considered harmful to data
Date: 2019-10-19 02:14:09
Message-ID: CAAOiGNzsi75ycyYH-V2KtN5gcy+RT8C5dxvEEbUQGvOyZJ-2LA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hello,

On Fri, Oct 18, 2019 at 6:52 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> Greetings,
>
> * Ariadne Conill (ariadne(at)dereferenced(dot)org) wrote:
> > On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> > > https://www.postgresql.org/docs/11/functions-json.html
> > > " The field/element/path extraction operators return NULL, rather than
> > > failing, if the JSON input does not have the right structure to match
> > > the request; for example if no such element exists"
> >
> > It is known that the extraction operators return NULL. The problem
> > here is jsonb_set() returning NULL when it encounters SQL NULL.
> >
> > > Just trying to figure why one is worse then the other.
> >
> > Any time a user loses data, it is worse. The preference for not
> > having data loss is why Pleroma uses PostgreSQL as it's database of
> > choice, as PostgreSQL has traditionally valued durability. If we
> > should not use PostgreSQL, just say so.
>
> Your contention that the documented, clear, and easily addressed
> behavior of a particular strict function equates to "the database system
> loses data and isn't durable" is really hurting your arguments here, not
> helping it.
>
> The argument about how it's unintuitive and can cause application
> developers to misuse the function (which is clearly an application bug,
> but perhaps an understandable one if the function interface isn't
> intuitive or is confusing) is a reasonable one and might be convincing
> enough to result in a change here.
>
> I'd suggest sticking to the latter argument when making this case.
>
> > > > I believe that anything that can be catastrophically broken by users
> > > > not following upgrade instructions precisely is a serious problem, and
> > > > can lead to serious problems. I am sure that this is not the only
> > > > project using JSONB which have had users destroy their own data in
> > > > such a completely preventable fashion.
>
> Let's be clear here that the issue with the upgrade instructions was
> that the user didn't follow your *application's* upgrade instructions,
> and your later code wasn't written to use the function, as documented,
> properly- this isn't a case of PG destroying your data. It's fine to
> contend that the interface sucks and that we should change it, but the
> argument that PG is eating data because the application sent a query to
> the database telling it, based on our documentation, to eat the data,
> isn't appropriate. Again, let's have a reasonable discussion here about
> if it makes sense to make a change here because the interface isn't
> intuitive and doesn't match what other systems do (I'm guessing it isn't
> in the SQL standard either, so we unfortunately can't look to that for
> help; though I'd hardly be surprised if they supported what PG does
> today anyway).

Okay, I will admit that saying PG is eating data is perhaps
hyperbolic, but I will also say that the behaviour of jsonb_set()
under this type of edge case is unintuitive and frequently results in
unintended data loss. So, while PostgreSQL is not actually eating the
data, it is putting the user in a position where they may suffer data
loss if they are not extremely careful.

Here is how other implementations handle this case:

MySQL/MariaDB:

select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
{"a":null,"b":2,"c":3}

Microsoft SQL Server:

select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
{"b":2,"c":3}

Both of these outcomes make sense, given the nature of JSON objects.
I am actually more in favor of what MSSQL does however, I think that
makes the most sense of all.

I did not compare to other database systems, because using them I
found that there is a JSON_TABLE type function and then you do stuff
with that to rewrite the object and dump it back out as JSON, and it's
quite a mess. But MySQL and MSSQL have an equivalent jsonb inline
modification function, as seen above.

> As a practical response to the issue you've raised- have you considered
> using a trigger to check the validity of the new jsonb? Or, maybe, just
> made the jsonb column not nullable? With a trigger you could disallow
> non-null->null transistions, for example, or if it just shouldn't ever
> be null then making the column 'not null' would suffice.

We have already mitigated the issue in a way we find appropriate to
do. The suggestion of having a non-null constraint does seem useful
as well and I will look into that.

> I'll echo Christoph's comments up thread too, though in my own language-
> these are risks you've explicitly accepted by using JSONB and writing
> your own validation and checks (or, not, apparently) rather than using
> what the database system provides. That doesn't mean I'm against
> making the change you suggest, but it certainly should become a lesson
> to anyone who is considering using primairly jsonb for their storage
> that it's risky to do so, because you're removing the database system's
> knowledge and understanding of the data, and further you tend to end up
> not having the necessary constraints in place to ensure that the data
> doesn't end up being garbage- thus letting your application destroy all
> the data easily due to an application bug.

Ariadne

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ariadne Conill 2019-10-19 02:18:41 Re: jsonb_set() strictness considered harmful to data
Previous Message Adrian Klaver 2019-10-19 00:04:21 Re: jsonb_set() strictness considered harmful to data

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-10-19 02:14:21 Re: v12.0: segfault in reindex CONCURRENTLY
Previous Message Ashwin Agrawal 2019-10-19 00:55:32 Missing error_context_stack = NULL in AutoVacWorkerMain()