jsonb_set() strictness considered harmful to data

From: Ariadne Conill <ariadne(at)dereferenced(dot)org>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Cc: feld(at)freebsd(dot)org
Subject: jsonb_set() strictness considered harmful to data
Date: 2019-10-18 17:37:24
Message-ID: CAAOiGNwUsgM-UBqsJH_mriCFr-JCPxD+pQoXNvyOEQPc+YHQRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hello,

I am one of the primary maintainers of Pleroma, a federated social
networking application written in Elixir, which uses PostgreSQL in
ways that may be considered outside the typical usage scenarios for
PostgreSQL.

Namely, we leverage JSONB heavily as a backing store for JSON-LD
documents[1]. We also use JSONB in combination with Ecto's "embedded
structs" to store things like user preferences.

The fact that we can use JSONB to achieve our design goals is a
testament to the flexibility PostgreSQL has.

However, in the process of doing so, we have discovered a serious flaw
in the way jsonb_set() functions, but upon reading through this
mailing list, we have discovered that this flaw appears to be an
intentional design.[2]

A few times now, we have written migrations that do things like copy
keys in a JSONB object to a new key, to rename them. These migrations
look like so:

update users set info=jsonb_set(info, '{bar}', info->'foo');

Typically, this works nicely, except for cases where evaluating
info->'foo' results in an SQL null being returned. When that happens,
jsonb_set() returns an SQL null, which then results in data loss.[3]

This is not acceptable. PostgreSQL is a database that is renowned for
data integrity, but here it is wiping out data when it encounters a
failure case. The way jsonb_set() should fail in this case is to
simply return the original input: it should NEVER return SQL null.

But hey, we've been burned by this so many times now that we'd like to
donate a useful function to the commons, consider it a mollyguard for
the real jsonb_set() function.

create or replace function safe_jsonb_set(target jsonb, path
text[], new_value jsonb, create_missing boolean default true) returns
jsonb as $$
declare
result jsonb;
begin
result := jsonb_set(target, path, coalesce(new_value,
'null'::jsonb), create_missing);
if result is NULL then
return target;
else
return result;
end if;
end;
$$ language plpgsql;

This safe_jsonb_set() wrapper should not be necessary. PostgreSQL's
own jsonb_set() should have this safety feature built in. Without it,
using jsonb_set() is like playing russian roulette with your data,
which is not a reasonable expectation for a database renowned for its
commitment to data integrity.

Please fix this bug so that we do not have to hack around this bug.
It has probably ruined countless people's days so far. I don't want
to hear about how the function is strict, I'm aware it is strict, and
that strictness is harmful. Please fix the function so that it is
actually safe to use.

[1]: JSON-LD stands for JSON Linked Data. Pleroma has an "internal
representation" that shares similar qualities to JSON-LD, so I use
JSON-LD here as a simplification.

[2]: https://www.postgresql.org/message-id/flat/qfkua9$2q0e$1(at)blaine(dot)gmane(dot)org

[3]: https://git.pleroma.social/pleroma/pleroma/issues/1324 is an
example of data loss induced by this issue.

Ariadne

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mark Felder 2019-10-18 19:10:51 Re: jsonb_set() strictness considered harmful to data
Previous Message Justin Pryzby 2019-10-18 16:15:02 Re: Can you please tell us how set this prefetch attribute in following lines.

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-10-18 18:26:27 Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held
Previous Message Konstantin Knizhnik 2019-10-18 16:53:23 Re: Columns correlation and adaptive query optimization