Re: [POC] Fast COPY FROM command for the table with foreign partitions

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] Fast COPY FROM command for the table with foreign partitions
Date: 2020-07-16 09:14:04
Message-ID: CA+HiwqH2ehGurL9bYUydXBxVUH6aHN5yXXxESAOs9O6-f_+Qeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andrey,

Thanks for this work. I have been reading through your patch and
here's a what I understand it does and how:

The patch aims to fix the restriction that COPYing into a foreign
table can't use multi-insert buffer mechanism effectively. That's
because copy.c currently uses the ExecForeignInsert() FDW API which
can be passed only 1 row at a time. postgres_fdw's implementation
issues an `INSERT INTO remote_table VALUES (...)` statement to the
remote side for each row, which is pretty inefficient for bulk loads.
The patch introduces a new FDW API ExecForeignCopyIn() that can
receive multiple rows and copy.c now calls it every time it flushes
the multi-insert buffer so that all the flushed rows can be sent to
the remote side in one go. postgres_fdw's now issues a `COPY
remote_table FROM STDIN` to the remote server and
postgresExecForeignCopyIn() funnels the tuples flushed by the local
copy to the server side waiting for tuples on the COPY protocol.

Here are some comments on the patch.

* Why the "In" in these API names?

+ /* COPY a bulk of tuples into a foreign relation */
+ BeginForeignCopyIn_function BeginForeignCopyIn;
+ EndForeignCopyIn_function EndForeignCopyIn;
+ ExecForeignCopyIn_function ExecForeignCopyIn;

* fdwhandler.sgml should be updated with the description of these new APIs.

* As far as I can tell, the following copy.h additions are for an FDW
to use copy.c to obtain an external representation (char string) to
send to the remote side of the individual rows that are passed to
ExecForeignCopyIn():

+typedef void (*copy_data_dest_cb) (void *outbuf, int len);
+extern CopyState BeginForeignCopyTo(Relation rel);
+extern char *NextForeignCopyRow(CopyState cstate, TupleTableSlot *slot);
+extern void EndForeignCopyTo(CopyState cstate);

So, an FDW's ExecForeignCopyIn() calls copy.c: NextForeignCopyRow()
which in turn calls copy.c: CopyOneRowTo() which fills
CopyState.fe_msgbuf. The data_dest_cb() callback that runs after
fe_msgbuf contains the full row simply copies it into a palloc'd char
buffer whose pointer is returned back to ExecForeignCopyIn(). I
wonder why not let FDWs implement the callback and pass it to copy.c
through BeginForeignCopyTo()? For example, you could implement a
pgfdw_copy_data_dest_cb() in postgres_fdw.c which gets a direct
pointer of fe_msgbuf to send it to the remote server.

Do you think all FDWs would want to use copy,c like above? If not,
maybe the above APIs are really postgres_fdw-specific? Anyway, adding
comments above the definitions of these functions would be helpful.

* I see that the remote copy is performed from scratch on every call
of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
send the `COPY remote_table FROM STDIN` in
postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
when there are no errors during the copy?

I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
and sending `COPY remote_table FROM STDIN` only once instead of on
every flush -- and I see significant speedup. Please check the
attached patch that applies on top of yours. One problem I spotted
when trying my patch but didn't spend much time debugging is that
local COPY cannot be interrupted by Ctrl+C anymore, but that should be
fixable by adjusting PG_TRY() blocks.

* ResultRelInfo.UseBulkModifying should be ri_usesBulkModify for consistency.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
pgfdw-copy-buffering-amit-suggests.patch application/x-patch 5.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-07-16 09:16:25 Re: Resetting spilled txn statistics in pg_stat_replication
Previous Message Peter Eisentraut 2020-07-16 08:58:58 Re: OpenSSL 3.0.0 compatibility