Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-05-26 01:27:20
Message-ID: 20250526.102720.102802336158980899.kou@clear-code.com
Lists: pgsql-hackers

Hi,

In <CAD21AoAY_h-9nuhs14e3cyO_A2rH7==zuq+NPHkn9ggwyaXnPQ(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 May 2025 21:29:23 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>> > So the idea is that the backend process sets the format ID somewhere
>> > in st_progress_param, and then the progress view calls a SQL function,
>> > say pg_stat_get_copy_format_name(), with the format ID that returns
>> > the corresponding format name.
>>
>> Does it work when we use session_preload_libraries or the
>> LOAD command? If we have 2 sessions and both of them load
>> "jsonlines" COPY FORMAT extensions, what will happen?
>>
>> For example:
>>
>> 1. Session 1: Register "jsonlines"
>> 2. Session 2: Register "jsonlines"
>> (Should global format ID <-> format name mapping
>> be updated?)
>> 3. Session 2: Close this session.
>> Unregister "jsonlines".
>> (Can we unregister COPY FORMAT extension?)
>> (Should global format ID <-> format name mapping
>> be updated?)
>> 4. Session 1: Close this session.
>> Unregister "jsonlines".
>> (Can we unregister COPY FORMAT extension?)
>> (Should global format ID <-> format name mapping
>> be updated?)
>
> I imagine that only for progress reporting purposes, I think session 1
> and 2 can have different format IDs for the same 'jsonlines' if they
> load it by LOAD command. They can advertise the format IDs on the
> shmem and we can also provide a SQL function for the progress view
> that can get the format name by the format ID.
>
> Considering the possibility that we might want to use the format ID
> also in the cumulative statistics, we might want to strictly provide
> the unique format ID for each custom format as the format IDs are
> serialized to the pgstat file. One possible way to implement it is
> that we manage the custom format IDs in a wiki page like we do for
> custom cumulative statistics and custom RMGR[1][2]. That is, a custom
> format extension registers the format name along with the format ID
> that is pre-registered in the wiki page or the format ID (e.g. 128)
> indicating under development. If either the format name or format ID
> conflict with an already registered custom format extension, the
> registration function raises an error. And we preallocate enough
> format IDs for built-in formats.
>
> As for unregistration, I think that even if we provide an
> unregistration API, it ultimately depends on whether or not custom
> format extensions call it in _PG_fini().

Thanks for sharing your idea.

With the former approach (issuing IDs at registration time),
it seems that we need a global format ID <-> name mapping
and a per-session list of registered format names. The
custom COPY FORMAT register function rejects a format name
that is already registered, right? If we support both
shared_preload_libraries and session_preload_libraries/LOAD,
we will have custom formats with different lifetimes. That
may add complexity to the ID issuing approach.
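
To make the two pieces concrete, here is a minimal sketch of
how I read the ID issuing approach. Everything in it is
hypothetical (SessionFormats, issue_format_id(),
session_register_format()); it is not an existing PostgreSQL
API. Plain static arrays stand in for shared memory, and a
real implementation would need locking and would have to
copy names into shared memory instead of storing pointers.

```c
#include <assert.h>
#include <string.h>

#define MAX_FORMATS 16

/* Stand-in for a shared-memory table: the index is the issued format ID. */
static const char *global_format_names[MAX_FORMATS];
static int global_format_count = 0;

/* Stand-in for one backend's private list of registered format IDs. */
typedef struct SessionFormats
{
    int ids[MAX_FORMATS];
    int count;
} SessionFormats;

/* Find the ID already issued for a name, or issue a new one; -1 if full. */
static int
issue_format_id(const char *name)
{
    for (int i = 0; i < global_format_count; i++)
    {
        if (strcmp(global_format_names[i], name) == 0)
            return i;
    }
    if (global_format_count >= MAX_FORMATS)
        return -1;
    global_format_names[global_format_count] = name;
    return global_format_count++;
}

/*
 * Register a format in one session. Returns the format ID, or -1 when
 * this session already registered the same name or the table is full.
 */
static int
session_register_format(SessionFormats *s, const char *name)
{
    int id = issue_format_id(name);

    if (id < 0)
        return -1;
    for (int i = 0; i < s->count; i++)
    {
        if (s->ids[i] == id)
            return -1;          /* same name twice in one session */
    }
    /* IDs are unique per session, so the list cannot overflow. */
    assert(s->count < MAX_FORMATS);
    s->ids[s->count++] = id;
    return id;
}
```

In this sketch two sessions that both load "jsonlines" share
one ID, and the global entry stays even after every session
that registered it has exited. Deciding when (or whether) to
remove such an entry is the lifetime question above.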

With the latter approach (static IDs), how would we
implement a function that converts a format ID to a format
name? PostgreSQL itself doesn't know the ID <-> name mapping
in the Wiki page. It seems that each custom COPY FORMAT
implementation needs to register its name with PostgreSQL
by itself.
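
For the static ID approach, a minimal sketch of what such
self-registration could look like follows. The names
(register_copy_format(), copy_format_name_from_id(),
COPY_FORMAT_ID_MAX) are hypothetical, not an existing
PostgreSQL API, and returning -1 stands in for raising an
error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COPY_FORMAT_ID_MAX 255

/* Per-process table indexed by format ID; NULL means "not registered". */
static const char *copy_format_names[COPY_FORMAT_ID_MAX + 1];

/*
 * Register a custom format under its pre-assigned ID.
 * Returns 0 on success, -1 if the ID is out of range or if either the
 * ID or the name is already taken.
 */
static int
register_copy_format(int id, const char *name)
{
    assert(name != NULL);       /* caller bug, not a runtime condition */

    if (id < 0 || id > COPY_FORMAT_ID_MAX)
        return -1;
    if (copy_format_names[id] != NULL)
        return -1;              /* ID conflict */
    for (int i = 0; i <= COPY_FORMAT_ID_MAX; i++)
    {
        if (copy_format_names[i] != NULL &&
            strcmp(copy_format_names[i], name) == 0)
            return -1;          /* name conflict */
    }
    copy_format_names[id] = name;
    return 0;
}

/* Map an ID back to a name; NULL when this process never loaded it. */
static const char *
copy_format_name_from_id(int id)
{
    if (id < 0 || id > COPY_FORMAT_ID_MAX)
        return NULL;
    return copy_format_names[id];
}
```

Note that the table here is per process. A backend that
evaluates the progress view can resolve an ID only if it has
loaded the same extension itself, unless the names are also
advertised somewhere shared.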

Thanks,
--
kou
