Re: Emitting JSON to file using COPY TO

From: Davin Shearer <davin(at)apache(dot)org>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2023-12-02 03:00:29
Message-ID: CALvfUkByf6eou6J36-2Hw7Kdro80JD_Ch1AeUDAjsX8RpcA==A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I'm really glad to see this taken up as a possible new feature and will
definitely use it if it gets released. I'm impressed with how clean,
understandable, and approachable the postgres codebase is in general and
how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have a
concern with adding the '[' at the beginning and ']' at the end of the json
output. Those are already added by `json_agg` (
https://www.postgresql.org/docs/current/functions-aggregate.html) as you
can see in my initial email. Adding them in the COPY TO may be redundant
(e.g., [[{"key":"value"...}....]]).

I think COPY TO makes good sense to support, though COPY FROM maybe not so
much as JSON isn't necessarily flat and rectangular like CSV.

For my use-case, I'm emitting JSON files to Apache NiFi for processing, and
NiFi has superior handling of JSON (via JOLT parsers) versus CSV where
parsing is generally done with regex. I want to be able to emit JSON using
a postgres function and thus COPY TO.

Definitely +1 for COPY TO.

I don't think COPY FROM will work out well unless the JSON is required to
be flat and rectangular. I would vote -1 to leave it out due to the
necessary restrictions making it not generally useful.

Hope it helps,
Davin

On Fri, Dec 1, 2023 at 6:10 PM Nathan Bossart <nathandbossart(at)gmail(dot)com>
wrote:

> On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
> > I did a quick PoC patch (attached) -- if there interest and no hard
> > objections I would like to get it up to speed for the January commitfest.
>
> Cool. I would expect there to be interest, given all the other JSON
> support that has been added thus far.
>
> I noticed that, with the PoC patch, "json" is the only format that must be
> quoted. Without quotes, I see a syntax error. I'm assuming there's a
> conflict with another json-related rule somewhere in gram.y, but I haven't
> tracked down exactly which one is causing it.
>
> > 1. Is supporting JSON array format sufficient, or does it need to support
> > some other options? How flexible does the support scheme need to be?
>
> I don't presently have a strong opinion on this one. My instinct would be
> start with something simple, though. I don't think we offer any special
> options for log_destination...
>
> > 2. This only supports COPY TO and we would undoubtedly want to support
> COPY
> > FROM for JSON as well, but is that required from the start?
>
> I would vote for including COPY FROM support from the start.
>
> > ! if (!cstate->opts.json_mode)
>
> I think it's unfortunate that this further complicates the branching in
> CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
> worth refactoring a bunch of stuff to make this nicer.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joe Conway 2023-12-02 03:10:54 Re: Emitting JSON to file using COPY TO
Previous Message Nathan Bossart 2023-12-01 23:09:58 Re: Emitting JSON to file using COPY TO

Browse pgsql-hackers by date

  From Date Subject
Next Message Lev Kokotov 2023-12-02 03:06:40 Re: Bug in pgbench prepared statements
Previous Message Alexander Lakhin 2023-12-02 03:00:01 Re: Refactoring backend fork+exec code