| From: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
|---|---|
| To: | jian he <jian(dot)universality(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Copy from JSON FORMAT. |
| Date: | 2026-06-07 15:13:07 |
| Message-ID: | CAEG8a3LP5O6Vc6THHA6KU0=RugZ3VekNqAqhqG+fZsSO4rkpcA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sun, Mar 22, 2026 at 10:16 PM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
>
> Hi Jian,
>
> On Sun, Mar 22, 2026 at 8:59 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> >
> > On Sun, Mar 22, 2026 at 4:59 PM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
> > >
> > > Hi hackers,
> > >
> > > 7dadd38cda introduced support for COPY TO in JSON format, but did
> > > not include the corresponding COPY FROM functionality. The original
> > > discussion [1] mentioned the possibility of supporting COPY FROM as
> > > well, but that part of the patch never happened (If I don't miss).
> > >
> > > For a data format like JSON, basic round-trip capability seems essential:
> > > if data can be exported via COPY TO, it should also be possible to import
> > > it back using COPY FROM. So I try to close that gap.
> > >
> > Hi.
> >
> > repalloc can be replaced by repalloc_array.
> > maybe palloc, palloc0 can aslo be replaced by palloc_object/palloc0_object.
>
> I changed some of them, some are not changed because other similar patterns
> use this pattern, I'd like to keep the consistency.
>
> >
> > ``foreach(cur, cstate->attnumlist)``
> > can be replaced by
> > ``foreach_int(cur, cstate->attnumlist)``
>
> Changed, since this can save a line of define, I also applied the same
> change to other instances in this file.
>
> >
> > + ereport(ERROR,
> > + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> > + errmsg("invalid input format for COPY JSON"),
> > + errdetail("Document must begin with \"[\" or \"{\".")));
> > + pg_unreachable();
> > The parentheses in `` (errcode`` is not necessary, (and other palces)
> > See https://www.postgresql.org/message-id/202510100916.s2e6n3xiwvyc%40alvherre.pgsql
>
> Changed. There are quite some `` (errcode`` usages in this file, I
> don't touch them to introduce more code churn.
>
> >
> > Should the JSON object match the table column?
> > create table copytest_from_json(style int, test text);
> > copy copytest_from_json (style, test) from stdin (format json);
> >
> > The below 3 valid json should all fail?
> > {"style": "1", "test": 1, "x": 1}
> > {"style": "1"}
> > {"x": "1"}
>
> I think json format should be more flexible, as long as it is a valid json
> string representation. Extra fields and missing fields should be accepted.
>
> But I'd like to know what others' thought on this.
>
> >
> > I am not sure the handling of empty strings is correct.
> > src9=# copy copytest_from_json (style, test) from stdin (format json);
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF signal.
> > >>
> > >> \.
> > COPY 0
> >
> > src9=# copy copytest_from_json (style, test) from stdin (format text);
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF signal.
> > >>
> > >> \.
> > ERROR: invalid input syntax for type integer: ""
> > CONTEXT: COPY copytest_from_json, line 1, column style: ""
>
> I'm not entirely sure about this either, it’s a corner case. I also noticed
> that the current implementation accepts [] as valid input, and the JSON
> output for an empty table is also []. From a round-trip perspective,
> the current behavior seems reasonable.
>
> Attach is the v2 resolved part of your comments, thanks for looking.
>
> >
> >
> > --
> > jian
> > https://www.enterprisedb.com/
>
>
>
> --
> Regards
> Junwang Zhao
Here is v3, with some improvement:
1. Stricter JSON array scanning, adds an explicit array scan state
machine, so commas, array endings, missing separators, trailing
commas, and data after the top-level array are all treated as errors
now.
2. Better EOF handling, it now detects EOF while inside an unfinished
object or string and reports unexpected end of input in COPY JSON,
instead of potentially waiting indefinitely or producing unclear
behavior.
3. Correct json/jsonb scalar round-trip, for target columns of type
json or jsonb, scalar JSON values are preserved as valid JSON text,
e.g. "foo" stays "foo" and true stays true, instead of being converted
to SQL-style text such as foo or t.
4. More accurate error line numbers, which was pointed out by Jian off-list.
--
Regards
Junwang Zhao
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Support-COPY-FROM-with-FORMAT-JSON.patch | application/octet-stream | 51.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Enrique Sánchez | 2026-06-07 15:22:41 | Re: Extended statistics improvement: multi-column MCV missing values |
| Previous Message | Imran Zaheer | 2026-06-07 14:35:05 | Fix comment in report_sequence_errors() |