Re: jsonb concatenate operator's semantics seem questionable

From: "Ilya I(dot) Ashchepkov" <koctep(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: jsonb concatenate operator's semantics seem questionable
Date: 2015-05-18 21:21:18
Message-ID: CALgmQD8zn=Hn1zfqAXAQjB1Sxf2+SBuxrsGo8792jAjL0B_o5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello!

[3] First of all few words about concatenation of jsonb values in my mind.
Jsonb values concatenation result may follow this rules:
1) array of both values if both are scalars
2) concatenated array if both are arrays
3) prepended or appended array if only one is array
4) recursive concatenated jsonb-object if both is objects

[4] When we are merging two jsonb-objects and we have a key in both we can:
1) use value from last jsonb-object
2) concatenate this values using rules from [3]

[5] Also I want to be able to do the following:
# select JSONB_MERGE_FUN(SMTHNG, '{"a":{"b":1,"c":1}}'::jsonb,
'{"a":{"b":2,"c":2}}'::jsonb);
?column?
-------------------------
{"a":{"b":[1,2],"c":2}}

I'm thinking about SMTHNG as a function, that receive:
1) array of keys - where merging "cursor" is in jsonb objects
2) types of jsonb objects at this cursor (null/scalar/array/object)
Returned value determinates behaviour that will be used to apply rules from [4].

Current function implements the following algorithm (for objects):
1) use value from last object if keys array length is not 0

My implementation:
1) use value from last object if type is scalar or array
2) concatenate values if both are objects

At this moment I know how implement something like function from [5] on C,
but I think it will be not very useful for developers.
I think it will be useful if it may be implemented like aggregate.

Any thoughts?

On Sun, May 17, 2015 at 2:56 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Another thing that I noticed about the new jsonb stuff is that the
> concatenate operator is based on the hstore one. This works as
> expected:
>
> postgres=# select '{"a":1}'::jsonb || '{"a":2}';
> ?column?
> ----------
> {"a": 2}
> (1 row)
>
> However, the nesting doesn't "match up" -- containers are not merged
> beyond the least-nested level:
>
> postgres=# select '{"a":{"nested":1}}'::jsonb || '{"a":{"also nested":2}}';
> ?column?
> ---------------------------
> {"a": {"also nested": 2}}
> (1 row)
>
> This feels wrong to me. When jsonb was initially introduced, we took
> inspiration for the *containment* ("operator @> jsonb") semantics from
> hstore, but since jsonb is nested it worked in a nested fashion. At
> the top level and with no nested containers there was no real
> difference, but we had to consider the behavior of more nested levels
> carefully (the containment operator is clearly the most important
> jsonb operator). I had envisaged that with the concatenation of jsonb,
> concatenation would similarly behave in a nested fashion. Under this
> scheme, the above query would perform nested concatenation as follows:
>
> postgres=# select '{"a":{"nested":1}}'::jsonb || '{"a":{"also
> nested":2}}'; -- does not match actual current behavior
> ?column?
> ---------------------------
> {"a": {"nested":1, "also nested": 2}}
> (1 row)
>
> Now, I think it's good that the minus operator ("operator - text" and
> friends) discussed on the nearby thread accepts a text (or int)
> argument and remove string elements/pairs at the top level only. This
> works exactly the same as existence (I happen to think that removing
> elements/pairs at a nested level is likely to be more trouble than
> it's worth, and so I don't really like the new "jsonb - text[]"
> operator much, because it accepts a Postgres (not JSON) array of texts
> that constitute a path, which feels odd). So I have no issue with at
> least the plain minus operators' semantics. But I think that the
> concatenate operator's current semantics are significantly less useful
> than they could be, and are not consistent with the overall design of
> jsonb.
>
> I'm particularly concerned about a table containing many homogeneously
> structured, deeply nested jsonb datums (think of the delicious URLs
> dataset that jsonb was originally tested using for a good example of
> that -- this is quite representative of how people use jsonb in the
> real world). It would be almost impossible to perform insert-or-update
> type operations to these deeply nested elements using hstore style
> concatenation. You'd almost invariably end up removing a bunch of
> irrelevant nested values of the documents, when you only intended to
> update one deeply nested value.
>
> Looking back at the discussion of the new jsonb stuff, a concern was
> raised along these lines by Ilya Ashchepkov [1], but this was
> dismissed. I feel pretty strongly that this should be revisited. I'm
> willing to concede that we might not want to always merge containers
> that are found in the same position during concatenation, but I think
> it's more likely that we do. As with containment, my sense is that
> there should be nothing special about the nesting level -- it should
> not influence whether we merge rather than overwrite the operator's
> lhs container (with or into the rhs container). Not everyone will
> agree with this [2].
>
> I'm sorry that I didn't get to this sooner, but I was rather busy when
> it was being discussed.
>
> [1] http://www.postgresql.org/message-id/55006879.2050601@dunslane.net
> [2] http://www.postgresql.org/message-id/54EF61DD.7040208@agliodbs.com
> --
> Peter Geoghegan

--
С уважением,
Ащепков Илья koctep(at)gmail(dot)com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2015-05-18 21:28:40 Re: 9.5 open items
Previous Message Peter Geoghegan 2015-05-18 21:09:45 Re: Minor ON CONFLICT related fixes