Re: [POC] Fast COPY FROM command for the table with foreign partitions

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] Fast COPY FROM command for the table with foreign partitions
Date: 2020-07-29 08:03:02
Message-ID: CA+HiwqHRqG8X9W6-v-E6d0nw1k6aC7QSMk-SOvv4puOjWsQMmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andrey,

Thanks for updating the patch. I will try to take a look later.

On Wed, Jul 22, 2020 at 6:09 PM Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 7/16/20 2:14 PM, Amit Langote wrote:
> > * Why the "In" in these API names?
> >
> > + /* COPY a bulk of tuples into a foreign relation */
> > + BeginForeignCopyIn_function BeginForeignCopyIn;
> > + EndForeignCopyIn_function EndForeignCopyIn;
> > + ExecForeignCopyIn_function ExecForeignCopyIn;
>
> I used an analogy from copy.c.

Hmm, if we were going to also need *ForeignCopyOut APIs, maybe it
makes sense to have "In" here, but maybe we don't, so how about
leaving out the "In" for clarity?

> > * I see that the remote copy is performed from scratch on every call
> > of postgresExecForeignCopyIn(), but wouldn't it be more efficient to
> > send the `COPY remote_table FROM STDIN` in
> > postgresBeginForeignCopyIn() and end it in postgresEndForeignCopyIn()
> > when there are no errors during the copy?
>
> It is not possible. FDW share one connection between all foreign
> relations from a server. If two or more partitions will be placed at one
> foreign server you will have problems with concurrent COPY command.

Ah, you're right. I didn't consider multiple foreign partitions
pointing to the same server. Indeed, we would need separate
connections to a given server to COPY to multiple remote relations on
that server in parallel.

> May be we can create new connection for each partition?

Yeah, perhaps, although it sounds like something that might be more
generally useful and so we should work on that separately if at all.

> > I tried implementing these two changes -- pgfdw_copy_data_dest_cb()
> > and sending `COPY remote_table FROM STDIN` only once instead of on
> > every flush -- and I see significant speedup. Please check the
> > attached patch that applies on top of yours.
>
> I integrated first change and rejected the second by the reason as above.

Thanks.

Will send more comments after reading the v5 patch.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-07-29 08:32:53 Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
Previous Message Konstantin Knizhnik 2020-07-29 07:54:45 Re: [Patch] Optimize dropping of relation buffers using dlist