Re: POC: postgres_fdw insert batching

From: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: Re: POC: postgres_fdw insert batching
Date: 2020-06-30 04:22:44
Message-ID: CAG-ACPW2d-PUTvkHh3=qBdYmJLfj+oYAp6UyV_rNA4LPycHLfw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:

> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> > > 3) What about the other DML operations (DELETE/UPDATE)?
> > >
> > > The other DML operations could probably benefit from the batching too.
> > > INSERT was good enough for a PoC, but having batching only for INSERT
> > > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> > > of quals, but likely doable.
> >
> > Bulk INSERTs are more common in a sharded environment because of data
> > load in say OLAP systems. Bulk update/delete are rare, although not
> > that rare. So if an approach just supports bulk insert and not bulk
> > UPDATE/DELETE that will address a large number of use cases IMO. But if
> > we can make everything work together that would be good as well.
>
> In most cases, I think the entire UPDATE/DELETE operations would be
> pushed down to the remote side by DirectModify. So, I'm not sure we
> really need the bulk UPDATE/DELETE.
>

That may not be true for a partitioned table whose partitions are foreign
tables, especially given the work that Amit Langote is doing [1]. It really
depends on whether postgres_fdw can detect that the DML modifying each of
the partitions can be pushed down. That may not come easily.

>
> > > 3) Should we do batching for COPY instead?
> > >
> > > While looking at multi_insert, I've realized it's mostly exactly what
> > > the new "batching insert" API function would need to be. But it's only
> > > really used in COPY, so I wonder if we should just abandon the idea of
> > > batching INSERTs and do batching COPY for FDW tables.
>
> > I think we have to find out which performs
> > better, COPY or batch INSERT.
>
> Maybe I'm missing something, but I think the COPY patch [1] seems more
> promising to me, because 1) it would not get the remote side's planner
> and executor involved, and 2) the data would be loaded more
> efficiently by multi-insert on the remote side. (Yeah, COPY doesn't
> support RETURNING, but it's rare that RETURNING is needed in a bulk
> load, as you mentioned.)
>
> [1]
> https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
>
> Best regards,
> Etsuro Fujita
>
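
To make the batch INSERT vs. COPY comparison a bit more concrete, here is a
rough sketch of the two payload shapes that would go to the remote side for
the same batch of rows. It is not taken from either patch: the helper names
build_batched_insert and build_copy_payload are hypothetical, identifiers
are not quoted, and the COPY side only covers the default text format.

```python
from typing import Any, List, Sequence, Tuple


def build_batched_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one multi-row INSERT with $n placeholders (hypothetical helper).

    The remote server still has to parse and plan this statement, but a
    single round trip carries the whole batch instead of one row.
    """
    if not rows:
        raise ValueError("need at least one row")
    ncols = len(columns)
    groups: List[str] = []
    params: List[Any] = []
    for i, row in enumerate(rows):
        if len(row) != ncols:
            raise ValueError("row length does not match column list")
        # Placeholders are numbered continuously across the whole batch.
        placeholders = ", ".join(f"${i * ncols + j + 1}" for j in range(ncols))
        groups.append(f"({placeholders})")
        params.extend(row)
    # Sketch only: table and column names are interpolated without quoting.
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(groups)}"
    return sql, params


def _copy_escape(value: Any) -> str:
    """Render one field in COPY text format (hypothetical helper)."""
    if value is None:
        return "\\N"  # COPY text format spells NULL as \N
    text = str(value)
    # Backslash first, then the delimiter and line-ending characters.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_copy_payload(rows: Sequence[Sequence[Any]]) -> str:
    """Build the data stream for COPY ... FROM STDIN in text format.

    No per-row statement is involved; the rows are just tab-separated lines.
    """
    return "".join("\t".join(_copy_escape(v) for v in row) + "\n" for row in rows)


if __name__ == "__main__":
    batch = [(1, "x"), (2, None)]
    print(build_batched_insert("t", ["a", "b"], batch))
    print(repr(build_copy_payload(batch)))
```

Which of the two is actually faster is exactly what would need to be
measured; the sketch only shows what differs on the wire.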

[1]
https://www.postgresql.org/message-id/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
--
Best Wishes,
Ashutosh
