
Re: PATCH: recursive json_populate_record()

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: recursive json_populate_record()
Date: 2017-01-24 22:25:01
Message-ID:
Lists: pgsql-hackers
On 22.01.2017 21:58, Tom Lane wrote:
> In the meantime, we are *not* going to have attndims be semantically
> significant in just this one function.  Therefore, please remove this patch's
> dependence on attndims.

OK, I've removed the patch's dependence on attndims.  But I still believe
that PostgreSQL's type system will be fixed someday.

> I looked through the patch briefly and have some other comments:

Thank you very much for your review.

> 1. It's pretty bizarre that you need dummy forward struct declarations
> for ColumnIOData and RecordIOData.  The reason evidently is that somebody
> ignored project style and put a bunch of typedefs after the local function
> declarations in jsonfuncs.c.  Project style of course is to put the
> typedefs first, precisely because they may be needed in the local function
> declarations.  I will go move those declarations right now, as a single-
> purpose patch, so that you have something a bit cleaner to modify.

These forward struct declarations were moved to the type declaration section.

> 2. The business with isstring, saved_scalar_is_string, etc makes me itch.
> I'm having a hard time explaining exactly why, but it just seems like a
> single-purpose kluge.  Maybe it would seem less so if you saved the
> original JsonTokenType instead.

The original JsonTokenType is saved now.  The "isnull" fields of several
structures were also replaced by it.

> But also I'm wondering why the ultimate consumers are concerned with
> string-ness as such.

saved_scalar_is_string was necessary for the case where a JSON string is
converted to the json/jsonb type.  The JSON lexer returns strings with their
quotes stripped, and we must restore the quotes before passing the value to
json_in() or jsonb_in().  There were examples of this case in my first
message:

[master]=# select * from json_to_record('{"js": "a"}') as rec(js json);
ERROR:  invalid input syntax for type json
DETAIL:  Token "a" is invalid.
CONTEXT:  JSON data, line 1: a

[master]=# select * from json_to_record('{"js": "true"}') as rec(js json);

[patched]=# select * from json_to_record('{"js": "a"}') as rec(js json);

[patched]=# select * from json_to_record('{"js": "true"}') as rec(js json);

> It seems like mainly what they're trying to do is forbid converting a string
> into a non-scalar Postgres type, but (a) why, and (b) if you are going to
> restrict it, wouldn't it be more sensible to frame it as you can't convert any
> JSON scalar to a non-scalar type?  As to (a), it seems like you're just
> introducing unnecessary compatibility breakage if you insist on that.
> As to (b), really it seems like an appropriate restriction of that sort
> would be like "if target type is composite, source must be a JSON object,
> and if target type is array, source must be a JSON array".  Personally
> I think what you ought to do here is not change the semantics of cases
> that are allowed today, but only change the semantics in the cases of
> JSON object being assigned to PG composite and JSON array being assigned
> to PG array; both of those fail today, so there's nobody depending on the
> existing behavior.  But if you reject string-to-composite cases then you
> will be breaking existing apps, and I doubt people will thank you for it.

I've removed the compatibility-breaking restrictions; string-to-non-scalar
conversions through the type's input function are allowed again.  I've also
added corresponding regression test cases.

> 3. I think you'd be better off declaring ColumnIOData.type_category as an
> actual enum type, not "char" with some specific values documented only
> by a comment.  Also, could you fold the domain case in as a fourth
> type_category?

I've introduced the enum type TypeCat for type categories, with the domain
category as its fourth member.  (The names TypeCategory and JsonTypeCategory
conflict with existing names; perhaps you can suggest a better one.)

> Also, why does ColumnIOData have both an ndims field and another one buried
> in io.array.ndims?

Now there are no ndims fields at all, but earlier their values could differ:
ColumnIOData.ndims was typically copied from attndims, while ArrayIOData.ndims
could be copied from typndims for domain types.

> 4. populate_array_report_expected_array() violates our error message
> guidelines, first by using elog rather than ereport for a user-facing
> error, and second by assembling the message from pieces, which would
> make it untranslatable even if it were being reported through ereport.
> I'm not sure if this needs to be fixed though; will the function even
> still be needed once you remove the attndims dependency?  Even if there
> are still callers, you wouldn't necessarily need such a function if
> you scale back your ambitions a bit as to the amount of detail required.
> I'm not sure you really need to report what you think the dimensions
> are when complaining that an array is nonrectangular.

My mistake; I was not familiar with the error message guidelines.  ereport()
is used now and the message is no longer assembled from pieces, but I'm also
not sure that we need to report these details.

> 5. The business with having some arguments that are only for json and
> others that are only for jsonb, eg in populate_scalar(), also makes me
> itch.

I've refactored the json/jsonb argument passing using a new struct, JsValue,
which contains a union holding either a json or a jsonb value.

> I wonder whether this wouldn't all get a lot simpler and more
> readable if we gave up on trying to share code between the two cases.
> In populate_scalar(), for instance, the amount of code actually shared
> between the two cases amounts to a whole two lines, which are dwarfed
> by the amount of crud needed to deal with trying to serve both cases
> in the one function.  There are places where there's more sharable
> code than that, but it seems like a better design might be to refactor
> the sharable code out into subroutines called by separate functions for
> json and jsonb.

Maybe two separate families of functions like

a_json(common_args, json_args)

a_jsonb(common_args, jsonb_args)

could slightly improve readability, but such code duplication (and I believe
it really is duplicated code) would never be acceptable to me.

I can only offer to extract two subroutines from populate_scalar():
populate_scalar_json() and populate_scalar_jsonb().  I think
InputFunctionCall() should have exactly one call site here, because
semantically there is only one such call per scalar, regardless of its type.

> 6. I'm not too excited by your having invented JsonContainerSize,
> JsonContainerIsObject, etc, and then using them in just this new
> code.  That is not really improving readability, it's just creating
> stylistic inconsistency between these functions and the rest of the
> jsonb code.

These new macros were introduced because the existing JB_XXX() macros work
with the Jsonb struct, and there were no analogous macros for the
JsonbContainer struct.  They were not invented specifically for this patch:
they are a result of the deep refactoring of the json/jsonb code that I did
while working on jsonb compression (I'm going to present that work here soon).
That refactoring allows us to use a single generic interface to the json,
jsonb and jsonbc (new compressed format) types through the ExpandedObject
representation, but direct access to JsonbContainer fields becomes illegal,
so I'm trying not to create new direct references to JsonbContainer fields.
I could also offer to rename these macros to JBC_XXX() or JB_CONTAINER_XXX(),
but it would lead to unnecessary code churn.

> If you want such macros I think it would be better to submit a separate
> cosmetic patch that tries to hide such bit-tests behind macros throughout
> the jsonb code.

I've attached that patch, but not all of the bit-tests are hidden: some of
those in jsonb_util.c will still be valid after the upcoming refactoring
because they don't belong to the generic code (it might be better to use
JBC_XXX() macros for them).

> Another problem is that these macros are coding hazards because they look like
> they yield bool values but they don't really; assigning the results to bool
> variables, for example, would fail on most platforms.  Safer coding for a
> general-purpose macro would be like
> #define JsonContainerIsObject(jc)	(((jc)->header & JB_FOBJECT) != 0)

Sorry for this obvious mistake.  Note that the JB_ROOT_IS_XXX() macros
contain the same hazard.

> 7. More generally, the patch is hard to review because it whacks the
> existing code around so thoroughly that it's tough even to match up
> old and new code to get a handle on what you changed functionally.
> Maybe it would be good if you could separate it into a refactoring
> patch that just moves existing code around without functional changes,
> and then a patch on top of that that adds code and makes only localized
> changes in what's already there.

I've split this patch into two patches as you asked.

Nikita Glukhov
Postgres Professional:
The Russian Postgres Company


Attachment: 0001-introduce-JsonContainerXxx-macros-v04.patch
Description: text/x-patch (2.5 KB)
Attachment: 0002-refactor-json_populate_record-v04.patch
Description: text/x-patch (35.3 KB)
Attachment: 0003-recursive-json_populate_record-v04.patch
Description: text/x-patch (73.8 KB)

