Re: Emitting JSON to file using COPY TO

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2026-02-07 13:27:29
Message-ID: CAEG8a3LB7q1eQ0AzFTEBDFDxs3kJ=5iJA+HTmiYGya6Wb5jsRA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi jian,

Thanks for keeping the patch set up to date with the master.

On Fri, Feb 6, 2026 at 11:26 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> On Wed, Feb 4, 2026 at 12:41 AM Florents Tselai
> <florents(dot)tselai(at)gmail(dot)com> wrote:
> >
> > I (and others I assume) would really like to see this in 19;
> > glancing at the thread above and in the test cases I see this is in good shape for comitter review.
> > No?
> >
> > If I were to add something it would be an example in copy.sgml
> > <para>
> > When the <literal>FORCE_ARRAY</literal> option is enabled,
> > the entire output is wrapped in a JSON array and individual rows are separated by commas:
> > <programlisting>
> > COPY (SELECT id, name FROM users) TO STDOUT (FORMAT JSON, FORCE_ARRAY);
> > </programlisting>
> > <programlisting>
> > [
> > {"id": 1, "name": "Alice"}
> > ,{"id": 2, "name": "Bob"}
> > ,{"id": 3, "name": "Charlie"}
> > ]
> > </programlisting>
> > </para>
> >
>
> v23-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch
> I've added:
>
> +<para>
> + When the <literal>FORCE_ARRAY</literal> option is enabled,
> + the entire output is wrapped in a single JSON array with rows
> separated by commas:
> +<programlisting>
> +COPY (SELECT * FROM (VALUES(1),(2)) val(id)) TO STDOUT (FORMAT JSON,
> FORCE_ARRAY);
> +</programlisting>
> +The output is as follows:
> +<screen>
> +[
> + {"id":1}
> +,{"id":2}
> +]
> +</screen>
> +</para>
> +
> +
>
> > Also, apologies if that has been discussed already,
> > is there a good reason why didn't we just go with a simple "WRAP_ARRAY" ?
> >
>
> I don’t have a particular preference.
> If the consensus is that WRAP_ARRAY is better than FORCE_ARRAY, we can
> change it accordingly.
>
>
> --
> jian
> https://www.enterprisedb.com/

Here are some comments on v23:

0001: The refactor looks straightforward to me. Introducing a format
field should make future extensions easier. One suggestion is that we
could add some helper macros around format, for example:

#define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV)
#define IS_FORMAT_TEXT_LIKE(format) \
(format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV)

I think this would improve readability.

0002: Since you have moved the `CopyFormat enum` into 0001, the
following commit msg should be rephrased.

The CopyFormat enum was originally contributed by Joel Jacobson
joel(at)compiler(dot)org, later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).

- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+ }

Can we add a function that converts CopyFormat to a string? Treating
CopyFormat as %s in error messages would make the code shorter.
However, I'm not sure whether this aligns with translation
conventions, correct me if I'm wrong.

- * CSV and text formats share the same TextLike routines except for the
+ * CSV and text, json formats share the same TextLike routines except for the

I'd suggest rewording to `CSV, text and json ...`. The same applied to
other parts in this patch.

0003: The commit message includes some changes(adapt the newly
introduced CopyToRoutine) that actually belong to 0002; it would be
better to remove them from this commit.

+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }

can we do this:

if (cstate->opts.force_array)
{
if (cstate->json_row_delim_needed)
CopySendChar(cstate, ',');
else
{
/* first row needs no delimiter */
CopySendChar(cstate, ' ');
cstate->json_row_delim_needed = true;
}
}

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Durgamahesh Manne 2026-02-08 04:19:05 Table bloat threshold limit to trigger repack
Previous Message Greg Sabino Mullane 2026-02-06 19:34:56 Re: UNLOGGED table CREATEd on one connection not immediately visible to another connection

Browse pgsql-hackers by date

  From Date Subject
Next Message Chengpeng Yan 2026-02-07 13:39:08 Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types
Previous Message Robert Haas 2026-02-07 13:04:32 Re: pg_waldump: support decoding of WAL inside tarfile