Re: Emitting JSON to file using COPY TO

From: Dominique Devienne <ddevienne(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Davin Shearer <scholarsmate(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2023-11-27 15:26:43
Message-ID: CAFCRh-_GdiUvjd5z5FfvTfhruOnYqBu163XU47zZE8RNATCJGQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Mon, Nov 27, 2023 at 3:56 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> > I agree there should be a copy option for “not formatted” so if you dump a
> > single column result in that format you get the raw unescaped contents of
> > the column.
>
> I'm not sure I even buy that. JSON data in particular is typically
> multi-line, so how will you know where the row boundaries are?
> That is, is a newline a row separator or part of the data?
>
> You can debate the intelligence of any particular quoting/escaping
> scheme, but imagining that you can get away without having one at
> all will just create its own problems.
>

What I was suggesting is not a "not formatted" option.
Rather, in a JSON-formatted COPY, JSON values (i.e. typed `json` or `jsonb`)
should not be serialized to text that COPY then simply outputs as a JSON
string value, but "inlined" as "real" JSON values within the JSON document
output by COPY.

This is a special case, where the inner and outer "values" (for lack of
better terminology) are *both* JSON documents. Given that JSON is
hierarchical, the inner JSON value can either be 1) serialized to text
first, which must then be escaped using the JSON escaping rules, or
2) NOT serialized, but "inlined" or "spliced" into the outer COPY JSON
document.
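
To make the two options concrete, here is a minimal Python sketch. It only
illustrates the difference in the resulting output; it is not what COPY does
or would do, and the row contents and key names (`id`, `doc`) are made up for
the example.

```python
import json

# Hypothetical row: an integer id plus a json/jsonb column value.
row_id = 1
inner = {"name": "widget", "tags": ["a", "b"]}

# Option 1: serialize the inner value to text first, then emit that text
# as a JSON string. The inner quotes must be escaped as \" in the output.
escaped = json.dumps({"id": row_id, "doc": json.dumps(inner)})

# Option 2: splice the inner value into the outer document as a real
# JSON value, with no extra layer of escaping.
inlined = json.dumps({"id": row_id, "doc": inner})

print(escaped)
print(inlined)
```

With option 1 a consumer has to parse twice (the outer document, then the
string held in `doc`); with option 2 a single parse yields the nested value.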

I guess COPY in JSON mode supports only #1 now? #2 makes more sense to me,
but both options are valid. Is that clearer?

BTW, JSON is not inherently multi-line; any newlines are insignificant
whitespace. So I guess even COPY in JSON mode is not supposed to be
line-based? Unless COPY in JSON mode is more like NDJSON (https://ndjson.org/)? --DD
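
A small Python sketch of the line-based question, under the assumption that
each row is emitted as one compact JSON document per line (NDJSON-style)
versus all rows in a single JSON array. The sample rows are invented, and
this says nothing about which form COPY actually produces.

```python
import json

# Hypothetical rows; the first one has a newline inside a string value.
rows = [{"id": 1, "note": "line one\nline two"}, {"id": 2, "note": "plain"}]

# NDJSON-style: one compact JSON document per line. The newline inside the
# string is escaped as \n, so a raw newline can only be a row separator.
ndjson = "\n".join(json.dumps(r) for r in rows)

# Single-document style: one JSON array holding every row. Line breaks here
# are insignificant whitespace and say nothing about row boundaries.
array_doc = json.dumps(rows, indent=2)

# Both forms round-trip to the same rows.
parsed_lines = [json.loads(line) for line in ndjson.splitlines()]
parsed_array = json.loads(array_doc)
print(len(ndjson.splitlines()), parsed_lines == parsed_array == rows)
```

In the line-based form the reader can split on newlines before parsing; in
the array form it has to parse the whole document to find the rows.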
