Re: Emitting JSON to file using COPY TO

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Florents Tselai <florents(dot)tselai(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2026-03-16 20:59:56
Message-ID: 964a3377-d957-4a06-ad02-8228aefa65fc@dunslane.net
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


On 2026-03-16 Mo 2:24 PM, Masahiko Sawada wrote:
> On Sun, Mar 8, 2026 at 8:49 PM jian he<jian(dot)universality(at)gmail(dot)com> wrote:
>> On Mon, Mar 9, 2026 at 3:44 AM Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>> Hmm. But should we be scribbling on slot->tts_tupleDescriptor like that?
>>> How about something like this?:
>>>
>>> - * Full table or query without column list. Ensure the slot uses
>>> - * cstate->tupDesc so that the datum is stamped with the right type;
>>> - * for queries output type is RECORDOID this must be the blessed
>>> - * descriptor so that composite_to_json can look it up via
>>> - * lookup_rowtype_tupdesc.
>>> + * Full table or query without column list. For queries, the slot's
>>> + * TupleDesc may carry RECORDOID, which is not registered in the
>>> type
>>> + * cache and would cause composite_to_json's lookup_rowtype_tupdesc
>>> + * call to fail. Build a HeapTuple stamped with the blessed
>>> + * descriptor so the type can be looked up correctly.
>>> */
>>> if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid ==
>>> RECORDOID)
>>> - slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
>>> + {
>>> + HeapTuple tup;
>>>
>>> - rowdata = ExecFetchSlotHeapTupleDatum(slot);
>>> + tup = heap_form_tuple(cstate->tupDesc,
>>> + slot->tts_values,
>>> + slot->tts_isnull);
>>> + rowdata = HeapTupleGetDatum(tup);
>>> + }
>>> + else
>>> + {
>>> + rowdata = ExecFetchSlotHeapTupleDatum(slot);
>>> + }
>>>
>> This is better. I've tried to get rid of json_projvalues and json_projnulls.
>> Just using heap_form_tuple, but it won't work.
>>
>> I incorporated the v28-0004 COPY column list into v9-0002.
>> With this patch set, we added four fields to the struct CopyToStateData.
>>
>> + StringInfo json_buf; /* reusable buffer for JSON output,
>> + * initialized in BeginCopyTo */
>> + TupleDesc tupDesc; /* Descriptor for JSON output; for a column
>> + * list this is a projected descriptor */
>> + Datum *json_projvalues; /* pre-allocated projection values, or
>> + * NULL */
>> + bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
>>
>> Using the script in
>> https://www.postgresql.org/message-id/CACJufxFFZqxC3p4WjpTEi4riaJm%3DpADX%2Bpy0yQ0%3DRWTn5cqK3Q%40mail.gmail.com
>> I tested it again on macOS and Linux, and there are no regressions for
>> COPY TO with the TEXT and CSV formats.
> I've reviewed the patch and have some comments:
>
> ---
> I got a SEGV in the following scenario:
>
> postgres(1:1197708)=# create table test (a int, b text, c jsonb);
> CREATE TABLE
> postgres(1:1197708)=# copy test(a, b) to stdout with (format 'json' );
> TRAP: failed Assert("tupdesc->firstNonCachedOffsetAttr >= 0"), File:
> "execTuples.c", Line: 2328, PID: 1197708
> postgres: masahiko postgres [local] COPY(ExceptionalCondition+0x9e) [0xbebe48]
> postgres: masahiko postgres [local] COPY(BlessTupleDesc+0x2b) [0x729b50]
> postgres: masahiko postgres [local] COPY(BeginCopyTo+0xc94) [0x637bdf]
> postgres: masahiko postgres [local] COPY(DoCopy+0xb68) [0x62afbc]
> postgres: masahiko postgres [local]
> COPY(standard_ProcessUtility+0xa22) [0xa0ba48]
> postgres: masahiko postgres [local] COPY(ProcessUtility+0x10e) [0xa0b01f]
> postgres: masahiko postgres [local] COPY() [0xa09872]
> postgres: masahiko postgres [local] COPY() [0xa09acf]
> postgres: masahiko postgres [local] COPY(PortalRun+0x2c8) [0xa0901d]
> postgres: masahiko postgres [local] COPY() [0xa02055]
> postgres: masahiko postgres [local] COPY(PostgresMain+0xaf1) [0xa0724e]
> postgres: masahiko postgres [local] COPY() [0x9fdab9]
> postgres: masahiko postgres [local]
> COPY(postmaster_child_launch+0x165) [0x905378]
> postgres: masahiko postgres [local] COPY() [0x90b600]
> postgres: masahiko postgres [local] COPY() [0x908e6a]
> postgres: masahiko postgres [local] COPY(PostmasterMain+0x14fe) [0x90880c]
> postgres: masahiko postgres [local] COPY(main+0x340) [0x7a1f9c]
>
> It seems to forget to call TupleDescFinalize(). And I think we need
> some regression tests for this case.
>
> ---
> + if (cstate->opts.format == COPY_FORMAT_JSON)
> + {
> + /*
> + * If FORCE_ARRAY has been specified, send the opening bracket.
> + */
> + if (cstate->opts.force_array)
> + {
> + CopySendChar(cstate, '[');
> + CopySendTextLikeEndOfRow(cstate);
> + }
> + }
>
> We can conjunct the two if statement conditions.
>

Here's a v30 set that I hope fixes these issues.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

Attachment Content-Type Size
v30-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch text/x-patch 13.1 KB
v30-0002-json-format-for-COPY-TO.patch text/x-patch 29.5 KB
v30-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch text/x-patch 12.5 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2026-03-16 21:27:05 Re: Does included columns part of the PK
Previous Message Igor Korot 2026-03-16 18:43:13 Re: Does included columns part of the PK

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2026-03-16 21:01:22 Re: Read-only connection mode for AI workflows.
Previous Message Daniil Davydov 2026-03-16 20:54:43 Re: POC: Parallel processing of indexes in autovacuum