Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-21 09:35:04
Message-ID: 20231221.183504.1240642084042888377.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 23:31:29 +0900,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> I've sketched the above idea including a test module in
> src/test/module/test_copy_format, based on v2 patch. It's not splitted
> and is dirty so just for discussion.

I implemented a sample COPY TO handler for Apache Arrow that
supports only integer and text.

I needed to extend the patch:

1. Add an opaque space for custom COPY TO handler
* Add CopyToState{Get,Set}Opaque()
https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944

2. Export CopyToState::attnumlist
* Add CopyToStateGetAttNumList()
https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688

3. Export CopySend*()
* Rename CopySend*() to CopyToStateSend*() and export them
* Exception: CopySendEndOfRow() to CopyToStateFlush() because
it just flushes the internal buffer now.
https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e

The attached patch is based on the Sawada-san's patch and
includes the above changes. Note that this patch is also
dirty so just for discussion.

My suggestions from this experience:

1. Split COPY handler to COPY TO handler and COPY FROM handler

* CopyFormatRoutine is a bit tricky. An extension needs
to create a CopyFormatRoutine node and
a CopyToFormatRoutine node.

* If we just require "copy_to_${FORMAT}(internal)"
function and "copy_from_${FORMAT}(internal)" function,
we can remove the tricky approach. And it also avoid
name collisions with other handler such as tablesample
handler.
See also:
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9

2. Need an opaque space like IndexScanDesc::opaque does

* A custom COPY TO handler needs to keep its data

3. Export CopySend*()

* If we like minimum API, we just need to export
CopySendData() and CopySendEndOfRow(). But
CopySend{String,Char,Int32,Int16}() will be convenient
custom COPY TO handlers. (A custom COPY TO handler for
Apache Arrow doesn't need them.)

Questions:

1. What value should be used for "format" in
PgMsg_CopyOutResponse message?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144

It's 1 for binary format and 0 for text/csv format.

Should we make it customizable by custom COPY TO handler?
If so, what value should be used for this?

2. Do we need more tries for design discussion for the first
implementation? If we need, what should we try?

Thanks,
--
kou

Attachment Content-Type Size
v2-custom_copy_format_draft.diff text/x-patch 36.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2023-12-21 09:35:47 Re: Autonomous transactions 2023, WIP
Previous Message shveta malik 2023-12-21 09:31:59 Re: Synchronizing slots from primary to standby