| From: | Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> |
|---|---|
| To: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
| Cc: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2026-02-04 05:38:27 |
| Message-ID: | CAKWEB6pdser1_ewZHyb+bK7QimFmAPiq8V+1kOsXSqRj0KSeHQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Feb 3, 2026 at 6:06 AM KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
> Hello,
>
> On Tue, Feb 3, 2026, 12:02 PM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> wrote:
>
>> Hi,
>>
>> * There are four patches in this thread, three of them COPY FROM
>> related: v3, v4 and v5. One of them is COPY TO related. So, I guess it
>> would be better to move COPY TO discussion to another thread.
>>
> If there's possible discussions concerning COPY TO, then yes because i
> don't find many concerns about COPY TO.
>
>> * I am planning to create a new version (v6) of the patch, which will
>> be basically v3 with one less branching (moving !in_quote check under
>> is_csv and else). So that it would be easy to review and follow the
>> thread for new reviewers.
>>
> Oh cool, looking forward to this.
>
> Regards,
> Ayoub
>
Hello!
I have run some more tests.
x86 tower PC
commit 78bf28e3bf504db0eea5e3bcb3c43e9908108480 (HEAD -> master,
origin/master, origin/HEAD)
Text, no special characters 30468.4045ms
CSV, no special characters 35351.084ms
Text, with 1/3 escapes 32722.48625ms
CSV, with 1/3 quotes 44044.87575ms
x86 tower PC
commit 78bf28e3bf504db0eea5e3bcb3c43e9908108480 (HEAD -> master,
origin/master, origin/HEAD)
plus
0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch
Text, no special characters 23538.6895ms 22.7439379% improvement
CSV, no special characters 23475.94ms 33.59202224% improvement
Text, with 1/3 escapes 33530.8395ms -2.470329558% regression
CSV, with 1/3 quotes 45268.28275ms -2.777637533% regression
x86 tower PC
commit 78bf28e3bf504db0eea5e3bcb3c43e9908108480 (HEAD -> master,
origin/master, origin/HEAD)
plus
v5.1-0001-Simple-heuristic-for-SIMD-COPY-FROM.patch.patch
Text, no special characters 22728.42475ms 25.40329852% improvement
CSV, no special characters 22777.7805ms 35.56695319% improvement
Text, with 1/3 escapes 34542.75625ms -5.562749683% regression
CSV, with 1/3 quotes 45793.0095ms -3.96898327% regression
arm raspberry pi 5
commit 78bf28e3bf504db0eea5e3bcb3c43e9908108480 (HEAD -> master,
postgres/master, postgres/HEAD)
Text, no special characters 9476.60875ms
CSV, no special characters 11132.6405ms
Text, with 1/3 escapes 10765.8125ms
CSV, with 1/3 quotes 14055.28925ms
arm raspberry pi 5
commit 78bf28e3bf504db0eea5e3bcb3c43e9908108480 (HEAD -> master,
postgres/master, postgres/HEAD)
with
0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch
Text, no special characters 7380.328ms 22.12057926% improvement
CSV, no special characters 7349.53475ms 33.98210649% improvement
Text, with 1/3 escapes 10350.6385ms 3.856411209% improvement
CSV, with 1/3 quotes 12407.22725ms 11.72556445% improvement
arm raspberry pi 5
commit 78bf28e3bf504db0eea5e3bcb3c43e9908108480 (HEAD -> master,
postgres/master, postgres/HEAD)
with
v5.1-0001-Simple-heuristic-for-SIMD-COPY-FROM.patch.patch
Text, no special characters 7379.0375ms 22.134197% improvement
CSV, no special characters 7411.73225ms 33.42341154% improvement
Text, with 1/3 escapes 11288.465ms -4.854742733% regression
CSV, with 1/3 quotes 15281.3355ms -8.723023968% regression
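For anyone checking the arithmetic, the percentages above appear to be the relative change in run time against unpatched master, where a positive value means the patched build was faster. The following is only a minimal sketch of that calculation. The function name `percent_improvement` and the dictionary layout are hypothetical, not part of the benchmark scripts used for this thread; the timings are copied from the x86 master and v3 results above.

```python
def percent_improvement(baseline_ms: float, patched_ms: float) -> float:
    """Relative change in run time versus the baseline, in percent.

    Positive: the patched build was faster (improvement).
    Negative: the patched build was slower (regression).
    """
    return (baseline_ms - patched_ms) / baseline_ms * 100.0


# (master, master + v3 patch) timings in milliseconds, x86 tower PC,
# copied from the results listed above.
x86_v3 = {
    "Text, no special characters": (30468.4045, 23538.6895),
    "CSV, no special characters": (35351.084, 23475.94),
    "Text, with 1/3 escapes": (32722.48625, 33530.8395),
    "CSV, with 1/3 quotes": (44044.87575, 45268.28275),
}

for label, (baseline_ms, patched_ms) in x86_v3.items():
    change = percent_improvement(baseline_ms, patched_ms)
    kind = "improvement" if change >= 0 else "regression"
    print(f"{label}: {change:.2f}% {kind}")
```

The same function can be applied to the Raspberry Pi timings by swapping in those pairs.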
The 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch seems nice!
On my x86 PC, it showed the usual performance improvement of earlier patches,
and the regression was more similar between text and CSV inputs.
Unfortunately, the worst-case regression is about 2.5% to 2.8%, but maybe that
is an acceptable worst case for an improvement of 22% for text inputs and 33%
for CSV inputs?
The 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch looks even
better on my Raspberry Pi's arm processor: not only do we see a 22%
improvement for text and an almost 34% improvement for CSV, even the
worst-case scenarios show an almost 4% improvement for text and an 11.7%
improvement for CSV.
By comparison,
the v5.1-0001-Simple-heuristic-for-SIMD-COPY-FROM.patch.patch's worst-case
performance is poorer on both architectures.
I'd be curious to know if anyone else can reproduce these
numbers. 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch seems
like a real winner.
Best,
-Manni
--
-- Manni Wood EDB: https://www.enterprisedb.com