Re: psql JSON output format

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: psql JSON output format
Date: 2024-01-17 19:52:55
Message-ID: CA+TgmoaXaEZTQsommD+rBTztFpEST2kdnBkC=maPSvA3L0-sZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 17, 2024 at 4:30 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> As mentioned in my other mail, I was talking about the psql output
> format "csv" rather than about COPY.

Oh. Well, I think it's sad that the psql format csv has that property.
Why doesn't it adopt COPY's handling?

> I agree that it is desirable to lose as little information as possible.
> But if we want to format query output as JSON, we have a couple of
> requirements that cannot all be satisfied:
>
> 1. lose no information ("round-trip safe")
>
> 2. don't double quote numbers, booleans and other JSON values
>
> 3. don't skip any table column in the output
>
> Christoph's original patch didn't satisfy #2, and his current version
> doesn't satisfy #1. Do you think that skipping NULL columns would be
> the best solution? We don't do that in the to_json() function, which
> also renders SQL NULL as JSON null.

Let me start by clarifying that I'm OK with sacrificing
round-trippability here as long as we do it thoughtfully.
"Round-trippability is important but X is more important and we cannot
have both for Y reasons" seems like a potentially fine argument to me;
I'm only objecting to an argument of the form "round-trippability
doesn't even matter." My previous comment was a bit of a drive-by
remark on that specifically rather than a strong opinion about what
exactly we ought to do here.

I guess the specifically issue here is around a json(b) column that is
null at the SQL level vs one that contains a JSON null. How do we
distinguish those cases? I think entirely omitting null columns could
be a way forward, but I don't know if that would cause other problems
for users.

I'm not quite sure that addresses all the issues, though. For
instance, consider that 1.00::numeric and 1.0::numeric are equal but
distinguishable. If those get rendered into the JSON unquoted as 1.00
and 1.0, respectively, is that going to round-trip properly? What
about float8 values where extra_float_digits=3 is needed to properly
round trip? If we take PostgreSQL's array data types and turn them
into JSON arrays, what happens with non-default bounds? I know how
we're going to turn '{1,2}'::int[] into a JSON array, or at least I
assume I do, but what in the world are we going to do about
'[-3:-2]={1,2}'?

As much as I think round-trippability is good, getting it to 100% here
is probably a good bit of work. And maybe that work isn't worth doing
or involves too much collateral damage. But I do think it has positive
value. If we produce output that could be ingested back into PG later
with the right tool, that leaves the door open for someone to build
the tool later even if we don't have it today. If we produce output
that loses information, no tool built later can make up for the loss.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-01-17 19:57:35 Re: initdb's -c option behaves wrong way?
Previous Message Peter Eisentraut 2024-01-17 19:33:14 Re: More new SQL/JSON item methods