From: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
---|---|
To: | Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [POC] Fast COPY FROM command for the table with foreign partitions |
Date: | 2020-06-19 14:58:02 |
Message-ID: | CAPmGK16_dL+JiyqFYseLPWkfox5CTQ0+QRyxf7BvokB0HJx6xw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jun 2, 2020 at 2:51 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> 02.06.2020 05:02, Etsuro Fujita пишет:
> > I think I also thought something similar to this before [1]. Will take a look.
I'm still reviewing the patch, but let me comment on it.
> > [1] https://www.postgresql.org/message-id/23990375-45a6-5823-b0aa-a6a7a6a957f0%40lab.ntt.co.jp
> I have looked into the thread.
> My first version of the patch was like your idea. But when developing
> the “COPY FROM” code, the following features were discovered:
> 1. Two or more partitions can be placed at the same node. We need to
> finish COPY into one partition before start COPY into another partition
> at the same node.
> 2. On any error we need to send EOF to all started "COPY .. FROM STDIN"
> operations. Otherwise FDW can't cancel operation.
>
> Hiding the COPY code under the buffers management machinery allows us to
> generalize buffers machinery, execute one COPY operation on each buffer
> and simplify error handling.
I'm not sure that it's really a good idea that the bulk-insert API is
designed the way it's tightly coupled with the bulk-insert machinery
in the core, because 1) some FDWs might want to send tuples provided
by the core to the remote, one by one, without storing them in a
buffer, or 2) some other FDWs might want to store the tuples in the
buffer and send them in a lump as postgres_fdw in the proposed patch
but might want to do so independently of MAX_BUFFERED_TUPLES and/or
MAX_BUFFERED_BYTES defined in the bulk-insert machinery.
I agree that we would need special handling for cases you mentioned
above if we design this API based on something like the idea I
proposed in that thread.
> As i understand, main idea of the thread, mentioned by you, is to add
> "COPY FROM" support without changes in FDW API.
I don't think so; I think we should introduce new API for this feature
to keep the ExecForeignInsert() API simple.
> All that I can offer in this place now is to introduce one new
> ExecForeignBulkInsert(buf) routine that will execute single "COPY FROM
> STDIN" operation, send tuples and close the operation. We can use the
> ExecForeignInsert() routine for each buffer tuple if
> ExecForeignBulkInsert() is not supported.
Agreed.
> One of main questions here is to use COPY TO machinery for serializing a
> tuple. It is needed (if you will take a look into the patch) to
> transform the CopyTo() routine to an iterative representation:
> start/next/finish. May it be acceptable?
+1 for the general idea.
> In the attachment there is a patch with the correction of a stupid error.
Thanks for the patch!
Sorry for the delay.
Best regards,
Etsuro Fujita
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-06-19 15:41:27 | Re: More tzdb fun: POSIXRULES is being deprecated upstream |
Previous Message | Alexey Kondratov | 2020-06-19 14:20:11 | Re: [PATCH] Allow to specify restart_lsn in pg_create_physical_replication_slot() |