Re: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: Re: POC: postgres_fdw insert batching
Date: 2020-06-30 16:53:37
Message-ID: 20200630165337.zzoup4v7mgsegn26@development
Lists: pgsql-hackers

On Mon, Jun 29, 2020 at 04:22:15PM +0530, Ashutosh Bapat wrote:
>On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>>
>> FDW batching: 4584 ms
>>
>> So, rather nice improvement, I'd say ...
>
>Very nice.
>
>>
>> Before I spend more time hacking on this, I have a couple open questions
>> about the design, restrictions etc.
>>
>>
>> 1) Extend the FDW API?
>>
>> In the patch, the batching is simply "injected" into the existing insert
>> API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
>> extend the API with a "batched" version of the method, so that we can
>> easily determine whether the FDW supports batching or not - it would
>> require changes in the callers, though. OTOH it might be useful for
>> COPY, where we could do something similar to multi_insert (COPY already
>> benefits from this patch, but it does not use the batching built into
>> COPY).
>
>Amit Langote has pointed out a related patch being discussed on hackers at [1].
>
>That patch introduces a new API. But if we can do it without
>introducing a new API, that will be good. FDWs that can support
>batching can just modify their code and don't have to implement and
>manage a new API. We already have a handful of those APIs.
>

I don't think extending the API is a big issue - the FDW code will need
changing anyway, so this seems minor.
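
To make this concrete, a batched variant might look something like the
sketch below (purely illustrative - the callback name and the exact
signature are my assumptions, not a committed API):

    /*
     * Hypothetical batched counterpart to ExecForeignInsert. The executor
     * hands the FDW an array of slots, and the FDW reports back via
     * *numSlots how many rows the remote server actually inserted.
     */
    typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
                                                 ResultRelInfo *rinfo,
                                                 TupleTableSlot **slots,
                                                 TupleTableSlot **planSlots,
                                                 int *numSlots);

    /*
     * Callers could then detect batching support simply by checking
     * whether the FDW filled in the new FdwRoutine field:
     *
     *     if (resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
     *         ... use the batched path ...
     */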

I'll take a look at the COPY patch - I agree it seems like a good idea,
although it can be less convenient in various cases (e.g. I've seen a lot
of INSERT ... SELECT queries in sharded systems, etc.).

>>
>> 2) What about the insert results?
>>
>> I'm not sure what to do about "result" status for the inserted rows. We
>> only really "stash" the rows into a buffer, so we don't know if it will
>> succeed or not. The patch simply assumes it will succeed, but that's
>> clearly wrong, and it may result in reporting a wrong number of rows.
>
>I didn't get this. We are executing an INSERT on the foreign server,
>so we get the number of rows INSERTed from that server. We should just
>add those up across batches. If there's a failure, it would abort the
>transaction, local as well as remote.
>

True, but it's not the FDW code doing the counting - it's the caller,
depending on whether ExecForeignInsert returns a valid slot or NULL.
So it's not quite possible to just return the number of inserted tuples
reported by the remote server.
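
Roughly this pattern (simplified from the executor code; details elided):

    /* caller side - one call per tuple, counting based on the result */
    slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
                                                           resultRelInfo,
                                                           slot, planSlot);

    if (slot == NULL)       /* "do nothing" */
        return NULL;

    /*
     * ... and the caller bumps es_processed once per returned slot, so a
     * batching FDW that merely stashes the row can't make the count
     * reflect what the remote server eventually reports.
     */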

>>
>> The patch also disables the batching when the insert has a RETURNING
>> clause, because there's just a single slot (for the currently inserted
>> row). I suppose a "batching" method would take an array of slots.
>>
>
>It will be a rare case when a bulk load also has a RETURNING clause.
>So, we can live with this restriction. We should try to choose a
>design which allows that restriction to be lifted in the future. But I
>doubt that restriction will be a serious one.
>
>>
>> 3) What about the other DML operations (DELETE/UPDATE)?
>>
>> The other DML operations could probably benefit from the batching too.
>> INSERT was good enough for a PoC, but having batching only for INSERT
>> seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
>> of quals, but likely doable.
>
>Bulk INSERTs are more common in a sharded environment because of data
>load in, say, OLAP systems. Bulk update/delete are rare, although not
>that rare. So if an approach just supports bulk insert and not bulk
>UPDATE/DELETE, that will address a large number of use cases, IMO. But if
>we can make everything work together that would be good as well.
>
>In your patch, I see that an INSERT statement with batch is
>constructed as INSERT INTO ... VALUES (...), (...) as many values as
>the batch size. That won't work as is for UPDATE/DELETE since we can't
>pass multiple pairs of ctids and columns to be updated for each ctid
>in one statement. Maybe we could build as many UPDATE/DELETE
>statements as the size of a batch, but that would be ugly. What we
>need is a feature like a batch prepared statement in libpq similar to
>what JDBC supports
>(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
>This will allow a single prepared statement to be executed with a
>batch of parameters, each batch corresponding to one foreign DML
>statement.
>

I'm pretty sure we could make it work with some array/unnest tricks to
build a relation, and use that as a source of data.
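
Something like this, say, for UPDATE (just a sketch - the parameter
layout is made up): pass the batch as parallel arrays, unnest them into
a relation, and join on ctid:

    UPDATE tab t
       SET val = batch.val
      FROM unnest($1::tid[], $2::text[]) AS batch(rid, val)
     WHERE t.ctid = batch.rid;

That keeps it a single prepared statement per batch, without requiring
libpq-level batching.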

>>
>>
>> 4) Should we do batching for COPY instead?
>>
>> While looking at multi_insert, I've realized it's mostly exactly what
>> the new "batching insert" API function would need to be. But it's only
>> really used in COPY, so I wonder if we should just abandon the idea of
>> batching INSERTs and do batching COPY for FDW tables.
>
>I think this won't support RETURNING as well. But if we could somehow
>use the COPY protocol to send the data to the foreign server and yet
>treat it as INSERT, that might work. I think we have to find out which
>performs better: COPY or batch INSERT.
>

I don't see why we couldn't support both; the use cases are somewhat
different, I think.

>>
>> For cases that can replace INSERT with COPY this would be enough, but
>> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
>> this :-(
>
>Agreed, if we want to support bulk UPDATE/DELETE as well.
>
>>
>>
>> 5) Expected consistency?
>>
>> I'm not entirely sure what the consistency expectations for FDWs are.
>> Currently the FDW nodes pointing to the same server share a connection,
>> so the inserted rows might be visible to other nodes. But if we only
>> stash the rows in a local buffer for a while, that's no longer true. So
>> maybe this breaks the consistency expectations?
>>
>> But maybe that's OK - I'm not sure how the prepared statements/cursors
>> affect this. I can imagine restricting the batching only to plans where
>> this is not an issue (single FDW node or something), but it seems rather
>> fragile and undesirable.
>
>I think that area is grey. Depending upon where the cursor is
>positioned when a DML node executes a query, the data fetched from the
>cursor may or may not see the effect of the DML. The cursor position is
>based on the batch size, so we already have problems in this area, I
>think. Assuming that the DML and SELECT are independent, this will
>work. So the consistency problem already exists; it will just be
>modulated by batching the DML. I doubt that's exclusive to this
>feature, and it should be solved independently of it.
>

OK, thanks for the feedback.

>>
>> I was thinking about adding a GUC to enable/disable the batching at some
>> level (global, server, table, ...) but it seems like a bad match because
>> the consistency expectations likely depend on a query. There should be a
>> GUC to set the batch size, though (it's hardcoded to 100 for now).
>>
>
>Similar to fetch_size, it should be a foreign server and foreign table
>level setting, IMO.
>
>[1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
>

Yeah, I agree we should have a GUC to define the batch size. What I had
in mind was something that would allow us to enable/disable batching to
increase the consistency guarantees, or something like that. I think
simple GUCs are a poor solution for that.
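
For the batch size itself, per-object options along the lines of
fetch_size would be something like this (the batch_size option name and
object names are hypothetical here):

    -- hypothetical batch_size option, configured like fetch_size is today
    ALTER SERVER pgserver OPTIONS (ADD batch_size '100');
    ALTER FOREIGN TABLE ftable OPTIONS (ADD batch_size '10');

That would mirror how fetch_size works, with the table-level value
overriding the server-level one.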

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
