Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-24 05:49:36
Message-ID: 20240124.144936.67229716500876806.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've implemented custom COPY format feature based on the
current design discussion. See the attached patches for
details.

I also implemented a PoC COPY format handler for Apache
Arrow with this implementation and it worked.
https://github.com/kou/pg-copy-arrow

The patches implement not only custom COPY TO format feature
but also custom COPY FROM format feature.

0001-0004 is for COPY TO and 0005-0008 is for COPY FROM.

For COPY TO:

0001: This adds CopyToRoutine and use it for text/csv/binary
formats. No implementation change. This just move codes.

0002: This adds support for adding custom COPY TO format by
"CREATE FUNCTION ${FORMAT_NAME}". This uses the same
approach provided by Sawada-san[1] but this doesn't
introduce a wrapper CopyRoutine struct for
Copy{To,From}Routine. Because I noticed that a wrapper
CopyRoutine struct is needless. Copy handler can just return
CopyToRoutine or CopyFromRtouine because both of them have
NodeTag. We can distinct a returned struct by the NodeTag.

[1] https://www.postgresql.org/message-id/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA@mail.gmail.com

0003: This exports CopyToStateData. No implementation change
except CopyDest enum values. I changed COPY_ prefix to
COPY_DEST_ to avoid name conflict with CopySource enum
values. This just moves codes.

0004: This adds CopyToState::opaque and exports
CopySendEndOfRow(). CopySendEndOfRow() is renamed to
CopyToStateFlush().

For COPY FROM:

0005: Same as 0001 but for COPY FROM. This adds
CopyFromRoutine and use it for text/csv/binary formats. No
implementation change. This just move codes.

0006: Same as 0002 but for COPY FROM. This adds support for
adding custom COPY FROM format by "CREATE FUNCTION
${FORMAT_NAME}".

0007: Same as 0003 but for COPY FROM. This exports
CopyFromStateData. No implementation change except
CopySource enum values. I changed COPY_ prefix to
COPY_SOURCE_ to align CopyDest changes in 0003. This just
moves codes.

0008: Same as 0004 but for COPY FROM. This adds
CopyFromState::opaque and exports
CopyReadBinaryData(). CopyReadBinaryData() is renamed to
CopyFromStateRead().

Thanks,
--
kou

In <20240115(dot)152702(dot)2011620917962812379(dot)kou(at)clear-code(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 15 Jan 2024 15:27:02 +0900 (JST),
Sutou Kouhei <kou(at)clear-code(dot)com> wrote:

> Hi,
>
> If there are no more comments for the current design, I'll
> start implementing this feature with the following
> approaches for "Discussing" items:
>
>> 3.1 Should we use one function(internal) for COPY TO/FROM
>> handlers or two function(internal)s (one is for COPY TO
>> handler and another is for COPY FROM handler)?
>> [4]
>
> I'll choose "one function(internal) for COPY TO/FROM handlers".
>
>> 3.4 Should we export Copy{To,From}State? Or should we just
>> provide getters/setters to access Copy{To,From}State
>> internal?
>> [10]
>
> I'll export Copy{To,From}State.
>
>
> Thanks,
> --
> kou
>
> In <20240112(dot)144615(dot)157925223373344229(dot)kou(at)clear-code(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 12 Jan 2024 14:46:15 +0900 (JST),
> Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
>> Hi,
>>
>> Here is the current summary for a this discussion to make
>> COPY format extendable. It's for reaching consensus and
>> starting implementing the feature. (I'll start implementing
>> the feature once we reach consensus.) If you have any
>> opinion, please share it.
>>
>> Confirmed:
>>
>> 1.1 Making COPY format extendable will not reduce performance.
>> [1]
>>
>> Decisions:
>>
>> 2.1 Use separated handler for COPY TO and COPY FROM because
>> our COPY TO implementation (copyto.c) and COPY FROM
>> implementation (coypfrom.c) are separated.
>> [2]
>>
>> 2.2 Don't use system catalog for COPY TO/FROM handlers. We can
>> just use a function(internal) that returns a handler instead.
>> [3]
>>
>> 2.3 The implementation must include documentation.
>> [5]
>>
>> 2.4 The implementation must include test.
>> [6]
>>
>> 2.5 The implementation should be consist of small patches
>> for easy to review.
>> [6]
>>
>> 2.7 Copy{To,From}State must have a opaque space for
>> handlers.
>> [8]
>>
>> 2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
>> handlers.
>> [8]
>>
>> 2.9 Make "format" in PgMsg_CopyOutResponse message
>> extendable.
>> [9]
>>
>> 2.10 Make makeNode() call avoidable in function(internal)
>> that returns COPY TO/FROM handler.
>> [9]
>>
>> 2.11 Custom COPY TO/FROM handlers must be able to parse
>> their options.
>> [11]
>>
>> Discussing:
>>
>> 3.1 Should we use one function(internal) for COPY TO/FROM
>> handlers or two function(internal)s (one is for COPY TO
>> handler and another is for COPY FROM handler)?
>> [4]
>>
>> 3.2 If we use separated function(internal) for COPY TO/FROM
>> handlers, we need to define naming rule. For example,
>> <method_name>_to(internal) for COPY TO handler and
>> <method_name>_from(internal) for COPY FROM handler.
>> [4]
>>
>> 3.3 Should we use prefix or suffix for function(internal)
>> name to avoid name conflict with other handlers such as
>> tablesample handlers?
>> [7]
>>
>> 3.4 Should we export Copy{To,From}State? Or should we just
>> provide getters/setters to access Copy{To,From}State
>> internal?
>> [10]
>>
>>
>> [1] https://www.postgresql.org/message-id/flat/20231204.153548.2126325458835528809.kou%40clear-code.com
>> [2] https://www.postgresql.org/message-id/flat/ZXEUIy6wl4jHy6Nm%40paquier.xyz
>> [3] https://www.postgresql.org/message-id/flat/CAD21AoAhcZkAp_WDJ4sSv_%2Bg2iCGjfyMFgeu7MxjnjX_FutZAg%40mail.gmail.com
>> [4] https://www.postgresql.org/message-id/flat/CAD21AoDkoGL6yJ_HjNOg9cU%3DaAdW8uQ3rSQOeRS0SX85LPPNwQ%40mail.gmail.com
>> [5] https://www.postgresql.org/message-id/flat/TY3PR01MB9889C9234CD220A3A7075F0DF589A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>> [6] https://www.postgresql.org/message-id/flat/ZXbiPNriHHyUrcTF%40paquier.xyz
>> [7] https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com
>> [8] https://www.postgresql.org/message-id/flat/20231221.183504.1240642084042888377.kou%40clear-code.com
>> [9] https://www.postgresql.org/message-id/flat/ZYTfqGppMc9e_w2k%40paquier.xyz
>> [10] https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
>> [11] https://www.postgresql.org/message-id/flat/20240110.152023.1920937326588672387.kou%40clear-code.com
>>
>>
>> Thanks,
>> --
>> kou
>>
>>
>
>

Attachment Content-Type Size
v6-0001-Extract-COPY-TO-format-implementations.patch text/x-patch 23.8 KB
v6-0002-Add-support-for-adding-custom-COPY-TO-format.patch text/x-patch 17.7 KB
v6-0003-Export-CopyToStateData.patch text/x-patch 13.8 KB
v6-0004-Add-support-for-implementing-custom-COPY-TO-forma.patch text/x-patch 3.3 KB
v6-0005-Extract-COPY-FROM-format-implementations.patch text/x-patch 23.7 KB
v6-0006-Add-support-for-adding-custom-COPY-FROM-format.patch text/x-patch 7.0 KB
v6-0007-Export-CopyFromStateData.patch text/x-patch 17.0 KB
v6-0008-Add-support-for-implementing-custom-COPY-FROM-for.patch text/x-patch 4.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-01-24 05:53:44 Re: Synchronizing slots from primary to standby
Previous Message Amit Kapila 2024-01-24 05:43:31 Re: Synchronizing slots from primary to standby