Re: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, 'Craig Ringer' <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: POC: postgres_fdw insert batching
Date: 2020-11-26 19:34:04
Message-ID: 3a3c6a5c-2440-9ea0-b0f7-5b4d282716e7@enterprisedb.com
Lists: pgsql-hackers

On 11/26/20 2:48 AM, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
> From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
>> Well, good that we all agree this is a useful feature to have (in
>> general). The question is whether postgres_fdw should be doing
>> batching on its own (per this thread) or rely on some other
>> feature (libpq pipelining). I haven't followed the other thread,
>> so I don't have an opinion on that.
>
> Well, as someone said in this thread, I think bulk insert is much
> more common than updates/deletes. Thus, major DBMSs have INSERT
> VALUES(record1), (record2)... and INSERT SELECT. Oracle has direct
> path INSERT in addition. As for the comparison of INSERT with
> multiple records and libpq batching (= multiple INSERTs), I think
> the former is more efficient because the amount of data transfer is
> less and the parsing-planning of INSERT for each record is
> eliminated.
>
> I never deny the usefulness of libpq batch/pipelining, but I'm not
> sure if app developers would really use it. If they want to reduce
> the client-server round-trips, won't they use traditional stored
> procedures? Yes, the stored procedure language is very
> DBMS-specific. Then, I'd like to know what kind of well-known
> applications are using standard batching API like JDBC's batch
> updates. (Sorry, I think that should be discussed in libpq
> batch/pipelining thread and this thread should not be polluted.)
>

I'm not sure how this relates to app developers. I think the idea was
that the libpq features might be useful between the two PostgreSQL
instances, i.e. postgres_fdw would use the libpq batching to send
chunks of data to the other side.
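
To make the two options concrete, here is a rough sketch of the
statement shapes being compared: one multi-row INSERT versus the same
single-row INSERT repeated once per row (which is how the quoted text
characterizes libpq batching). It is purely illustrative. The function
names and the table "t" are made up, and this is neither the patch's
code nor anything libpq does internally.

```python
def multirow_insert_sql(table, columns, nrows):
    """Build one parameterized INSERT covering nrows rows.

    Placeholders are numbered $1, $2, ... across all rows, so the
    statement is parsed and planned once for the whole batch.
    """
    ncols = len(columns)
    rows = []
    for r in range(nrows):
        params = ", ".join(f"${r * ncols + c + 1}" for c in range(ncols))
        rows.append(f"({params})")
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(rows)}"


def single_row_inserts_sql(table, columns, nrows):
    """Build nrows copies of a single-row INSERT (one statement per row)."""
    one = multirow_insert_sql(table, columns, 1)
    return [one] * nrows


if __name__ == "__main__":
    # Hypothetical table "t" with columns a and b, batch of 3 rows.
    print(multirow_insert_sql("t", ["a", "b"], 3))
    for stmt in single_row_inserts_sql("t", ["a", "b"], 3):
        print(stmt)
```

Either way the FDW hands over a chunk of rows at a time; the
difference is only in how that chunk travels to the remote side.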

>
>> Note however we're doing two things here, actually - we're
>> implementing custom batching for postgres_fdw, but we're also
>> extending the FDW API to allow other implementations to do the same
>> thing. And most of them won't be able to rely on the connection
>> library providing that, I believe.
>
> I'm afraid so, too. Then, postgres_fdw would be an example that
> other FDW developers would look at when they use INSERT with
> multiple records.
>

Well, my point was that we could keep the API, but maybe it should be
implemented using the proposed libpq batching. Other FDW developers
could still use postgres_fdw as an example of how to use the API, but
the internals would need to be different, of course.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
