Re: Emitting JSON to file using COPY TO

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Florents Tselai <florents(dot)tselai(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2026-03-17 17:50:35
Message-ID: CAD21AoCFGwU3pvvFeFW-NYnJX_QKZ2xKgNEH__ho_a-5aN__iA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Mon, Mar 16, 2026 at 1:59 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 2026-03-16 Mo 2:24 PM, Masahiko Sawada wrote:
>
> On Sun, Mar 8, 2026 at 8:49 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> On Mon, Mar 9, 2026 at 3:44 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> Hmm. But should we be scribbling on slot->tts_tupleDescriptor like that?
> How about something like this?:
>
> - * Full table or query without column list. Ensure the slot uses
> - * cstate->tupDesc so that the datum is stamped with the right type;
> - * for queries output type is RECORDOID this must be the blessed
> - * descriptor so that composite_to_json can look it up via
> - * lookup_rowtype_tupdesc.
> + * Full table or query without column list. For queries, the slot's
> + * TupleDesc may carry RECORDOID, which is not registered in the
> type
> + * cache and would cause composite_to_json's lookup_rowtype_tupdesc
> + * call to fail. Build a HeapTuple stamped with the blessed
> + * descriptor so the type can be looked up correctly.
> */
> if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid ==
> RECORDOID)
> - slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
> + {
> + HeapTuple tup;
>
> - rowdata = ExecFetchSlotHeapTupleDatum(slot);
> + tup = heap_form_tuple(cstate->tupDesc,
> + slot->tts_values,
> + slot->tts_isnull);
> + rowdata = HeapTupleGetDatum(tup);
> + }
> + else
> + {
> + rowdata = ExecFetchSlotHeapTupleDatum(slot);
> + }
>
> This is better. I've tried to get rid of json_projvalues and json_projnulls.
> Just using heap_form_tuple, but it won't work.
>
> I incorporated the v28-0004 COPY column list into v9-0002.
> With this patch set, we added four fields to the struct CopyToStateData.
>
> + StringInfo json_buf; /* reusable buffer for JSON output,
> + * initialized in BeginCopyTo */
> + TupleDesc tupDesc; /* Descriptor for JSON output; for a column
> + * list this is a projected descriptor */
> + Datum *json_projvalues; /* pre-allocated projection values, or
> + * NULL */
> + bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
>
> Using the script in
> https://www.postgresql.org/message-id/CACJufxFFZqxC3p4WjpTEi4riaJm%3DpADX%2Bpy0yQ0%3DRWTn5cqK3Q%40mail.gmail.com
> I tested it again on macOS and Linux, and there are no regressions for
> COPY TO with the TEXT and CSV formats.
>
> I've reviewed the patch and have some comments:
>
> ---
> I got a SEGV in the following scenario:
>
> postgres(1:1197708)=# create table test (a int, b text, c jsonb);
> CREATE TABLE
> postgres(1:1197708)=# copy test(a, b) to stdout with (format 'json' );
> TRAP: failed Assert("tupdesc->firstNonCachedOffsetAttr >= 0"), File:
> "execTuples.c", Line: 2328, PID: 1197708
> postgres: masahiko postgres [local] COPY(ExceptionalCondition+0x9e) [0xbebe48]
> postgres: masahiko postgres [local] COPY(BlessTupleDesc+0x2b) [0x729b50]
> postgres: masahiko postgres [local] COPY(BeginCopyTo+0xc94) [0x637bdf]
> postgres: masahiko postgres [local] COPY(DoCopy+0xb68) [0x62afbc]
> postgres: masahiko postgres [local]
> COPY(standard_ProcessUtility+0xa22) [0xa0ba48]
> postgres: masahiko postgres [local] COPY(ProcessUtility+0x10e) [0xa0b01f]
> postgres: masahiko postgres [local] COPY() [0xa09872]
> postgres: masahiko postgres [local] COPY() [0xa09acf]
> postgres: masahiko postgres [local] COPY(PortalRun+0x2c8) [0xa0901d]
> postgres: masahiko postgres [local] COPY() [0xa02055]
> postgres: masahiko postgres [local] COPY(PostgresMain+0xaf1) [0xa0724e]
> postgres: masahiko postgres [local] COPY() [0x9fdab9]
> postgres: masahiko postgres [local]
> COPY(postmaster_child_launch+0x165) [0x905378]
> postgres: masahiko postgres [local] COPY() [0x90b600]
> postgres: masahiko postgres [local] COPY() [0x908e6a]
> postgres: masahiko postgres [local] COPY(PostmasterMain+0x14fe) [0x90880c]
> postgres: masahiko postgres [local] COPY(main+0x340) [0x7a1f9c]
>
> It seems to forget to call TupleDescFinalize(). And I think we need
> some regression tests for this case.
>
> ---
> + if (cstate->opts.format == COPY_FORMAT_JSON)
> + {
> + /*
> + * If FORCE_ARRAY has been specified, send the opening bracket.
> + */
> + if (cstate->opts.force_array)
> + {
> + CopySendChar(cstate, '[');
> + CopySendTextLikeEndOfRow(cstate);
> + }
> + }
>
> We can conjunct the two if statement conditions.
>
>
> Here's a v30 set that I hope fixes these issues.

Thank you for updating the patch! The patches look good to me.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2026-03-17 19:31:43 Trying to understand pg_get_expr()
Previous Message Adrian Klaver 2026-03-17 17:43:37 Re: Can I use pg_dump to save a sequence for a table that is not also being saved?

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2026-03-17 17:51:15 Re: EXPLAIN: showing ReadStream / prefetch stats
Previous Message Robert Haas 2026-03-17 17:44:06 Re: pg_plan_advice