From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Date: 2025-10-16 20:39:47
Message-ID: af99c828-ef24-4355-92d5-b5f631cda9c9@vondra.me
Lists: pgsql-hackers
Hi Matheus,
Thanks for the patch. Please add it to the next commitfest (PG19-3) at
https://commitfest.postgresql.org/
so that we don't lose track of the patch.
On 10/15/25 17:02, Matheus Alcantara wrote:
> Hi all,
>
> Currently on postgres_fdw we use prepared statements to insert batches
> into foreign tables. Although this works fine for most use cases, the
> COPY command can also be used in some scenarios to speed up large batch
> inserts.
>
Makes sense.
> The attached patch implements this idea of using the COPY command for
> batch inserts on postgres_fdw foreign tables. I've performed some
> benchmarks using pgbench and the results seem good to consider this.
>
Thanks. The code looks sensible in general, I think. I have a couple of
minor comments below. It'd be good to also update the documentation, and
add some tests to postgres_fdw.sql, to exercise this new code.
The sgml docs (doc/src/sgml/postgres-fdw.sgml) mention batch_size, and
explain how it's related to the number of parameters in the INSERT
statement. That limit does not apply when using COPY under the hood, so
this should be amended/clarified in some way. It doesn't need to be
super-detailed, though.
A couple minor comments about the code:
1) buildCopySql
- Does this really need the "(FORMAT TEXT, DELIMITER ',')" part? AFAIK
no, if we use the default copy format in convert_slot_to_copy_text.
- Shouldn't we cache the COPY SQL, similarly to how we keep the insert
SQL, instead of rebuilding it over and over for every batch?
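
Something like this, perhaps (untested sketch - "copy_sql" would be a new
field in PgFdwModifyState, and I'm only guessing at the buildCopySql
signature):

    /* build the COPY statement once, keep it around for later batches */
    if (fmstate->copy_sql == NULL)
    {
        StringInfoData sql;

        initStringInfo(&sql);
        buildCopySql(&sql, fmstate->rel, fmstate->target_attrs);
        fmstate->copy_sql = sql.data;
    }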
2) convert_slot_to_copy_text
- It's probably better to not hard-code the delimiters etc.
- I wonder if the formatting needs to do something more like what
copyto.c does (through CopyToTextOneRow and CopyAttributeOutText). Maybe
not, not sure.
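
What I have in mind is roughly the escaping CopyAttributeOutText() does
for the default text format - just a rough sketch with a made-up helper
name, the real copyto.c code also handles the remaining control
characters and encoding considerations:

    static void
    append_copy_text_value(StringInfo buf, const char *value, bool isnull)
    {
        if (isnull)
        {
            /* NULLs are sent as \N in the text format */
            appendStringInfoString(buf, "\\N");
            return;
        }

        for (const char *p = value; *p; p++)
        {
            switch (*p)
            {
                case '\t':      /* the default delimiter */
                    appendStringInfoString(buf, "\\t");
                    break;
                case '\n':
                    appendStringInfoString(buf, "\\n");
                    break;
                case '\r':
                    appendStringInfoString(buf, "\\r");
                    break;
                case '\\':
                    appendStringInfoString(buf, "\\\\");
                    break;
                default:
                    appendStringInfoChar(buf, *p);
                    break;
            }
        }
    }

Without something like this, a value containing a tab or a newline would
break the row/column structure of the COPY stream.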
3) execute_foreign_modify
- I think the new block of code is a bit confusing. It essentially does
something similar to the original code, but not entirely. I suggest we
move it to a new function, and call that from execute_foreign_modify.
- I agree it's probably better to use COPYBUFSIZ, or something like that,
to send the data in smaller (but not tiny) chunks.
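
I was thinking about something along these lines (untested sketch with
simplified error handling; the send_copy_data name is just for
illustration, and COPYBUFSIZ is borrowed from psql's copy.c):

    #define COPYBUFSIZ  8192

    static void
    send_copy_data(PGconn *conn, const char *data, size_t len)
    {
        size_t  offset = 0;

        /* push the buffered rows to the remote server in chunks */
        while (offset < len)
        {
            size_t  chunk = Min(len - offset, COPYBUFSIZ);

            if (PQputCopyData(conn, data + offset, (int) chunk) != 1)
                ereport(ERROR,
                        (errmsg("could not send COPY data to remote server: %s",
                                PQerrorMessage(conn))));
            offset += chunk;
        }

        /* terminate the COPY; the caller still needs to consume the result */
        if (PQputCopyEnd(conn, NULL) != 1)
            ereport(ERROR,
                    (errmsg("could not end remote COPY: %s",
                            PQerrorMessage(conn))));
    }

The caller would still have to call PQgetResult() to get the final status
of the COPY, of course.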
> I've performed the benchmark using different batch_size values to see
> when this optimization could be useful. The following results are the
> best tps of 3 runs.
>
> Command: pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
>
> batch_size: 10
> master tps: 76.360406
> patch tps: 68.917109
>
> batch_size: 100
> master tps: 123.427731
> patch tps: 243.737055
>
> batch_size: 1000
> master tps: 132.500506
> patch tps: 239.295132
>
> It seems that using a batch_size greater than 100 we can have a
> considerable speed up for batch inserts.
>
I did a bunch of benchmarks too, and I see similar speedups (depending
on the batch and data size). Attached is the script I used to run this.
It runs COPY into a foreign table, pointing to a second instance on the
same machine. It does that for a range of data sizes, batch sizes,
client counts, and logged/unlogged tables.
The attached PDF summarizes the results, comparing the "copy" build (with
this patch) to "master". There are also two "resourceowner" builds, with
the patch from [1] - that helps COPY in general a lot, and it improves
the case with batch_size=1000.
I think the results are good, at least with larger batch sizes. The
regressions with batch_size=10 on UNLOGGED table are not great, though.
I have results from another machine, and there it affects even the LOGGED
table. I'm not sure if this is inherent, or something the patch can fix.
In a way, it's not surprising - batching with tiny batches adds the
overhead without much benefit. I don't think that kills the patch.
> The attached patch uses the COPY command whenever we have a *numSlots >
> 1 but the tests show that maybe we should have a GUC to enable this?
>
I can imagine having a GUC for testing, but it's not strictly necessary.
> I also think that we can have a better patch by removing the duplicated
> code introduced in this first version, especially in the cleanup phase,
> but I tried to keep things simpler in this initial version to make the
> review easier and also just to test the idea.
>
> Lastly, I don't know if we should change the EXPLAIN(ANALYZE, VERBOSE)
> output for batch inserts that use COPY to mention that we are
> sending the COPY command to the remote server. I guess so?
>
Good point. We definitely should not show the INSERT SQL when we're
actually running a COPY.
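Presumably the VERBOSE output would then show the COPY statement instead,
something like "Remote SQL: COPY public.tab (a, b) FROM STDIN" - that's
just an illustration, the exact wording doesn't matter much.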
regards
[1]
https://www.postgresql.org/message-id/84f20db7-7d57-4dc0-8144-7e38e0bbe75d%40vondra.me
--
Tomas Vondra
Attachments:
fdw-copy-results.pdf (application/pdf, 66.5 KB)
scripts.tgz (application/x-compressed-tar, 992 bytes)