Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-02-24 13:57:21
Message-ID: CAN55FZ2O2Ls==sdpROHqxWRx-PMBZ0riJ6eVKoHj8=vssTavxw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, 24 Feb 2026 at 07:44, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> wrote:
>
> Hello!
>
> I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit surprised at the regression for x86 with wide rows for the 1/3rd special characters scenarios. I'm hoping it's something I did wrong. If anyone else has numbers to share, that would be excellent.

Thank you for doing this!

I see a similar regression in the wide CSV 1/3 case when using your
benchmark script. I didn't see this regression with my own benchmark
when I shared v9 [1].

Timings in ms; percentages are relative to master.

+-------------+---------------------------+---------------------------+
|             | Text                      | CSV                       |
+-------------+-------------+-------------+-------------+-------------+
| WIDE TEST   | None        | 1/3         | None        | 1/3         |
+-------------+-------------+-------------+-------------+-------------+
| Master      | 9996        | 10769       | 11548       | 13960       |
+-------------+-------------+-------------+-------------+-------------+
| v10         | 8912 -10.8% | 10902 +1.2% | 8952 -22.4% | 15123 +8.3% |
+-------------+-------------+-------------+-------------+-------------+

+-------------+---------------------------+---------------------------+
|             | Text                      | CSV                       |
+-------------+-------------+-------------+-------------+-------------+
| NARROW TEST | None        | 1/3         | None        | 1/3         |
+-------------+-------------+-------------+-------------+-------------+
| Master      | 9441        | 9561        | 9734        | 9830        |
+-------------+-------------+-------------+-------------+-------------+
| v10         | 9291 -1.5%  | 9504 -0.5%  | 9644 -0.9%  | 10078 -2.4% |
+-------------+-------------+-------------+-------------+-------------+

I will investigate this. However, please note that the current master
includes the inlining commit (dc592a4155), which makes COPY FROM
faster. In my case, for the wide CSV 1/3 test:

1: current master without dc592a4155: 14400ms
2: current master: 13960ms (3% improvement over #1)
3: current master + SIMD: 15123ms (5% regression against #1 and 8%
regression against #2)
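
For anyone re-checking these percentages, here is a minimal sketch of the
arithmetic. The function and variable names are made up for illustration;
only the three timings come from the numbers above.

```python
def pct_change(baseline_ms: float, candidate_ms: float) -> float:
    """Relative change of candidate vs. baseline in percent (positive = slower)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Timings in ms quoted above (wide rows, CSV, 1/3 special characters).
master_without_inlining = 14400  # 1: current master without dc592a4155
master = 13960                   # 2: current master
master_plus_simd = 15123         # 3: current master + SIMD

# 2 vs. 1: effect of the inlining commit alone
print(f"{pct_change(master_without_inlining, master):+.1f}%")            # -3.1%
# 3 vs. 1: SIMD patch compared against master without inlining
print(f"{pct_change(master_without_inlining, master_plus_simd):+.1f}%")  # +5.0%
# 3 vs. 2: SIMD patch compared against current master
print(f"{pct_change(master, master_plus_simd):+.1f}%")                   # +8.3%
```

The baseline matters here: the same 15123ms result reads as roughly a 5%
or an 8% regression depending on whether the inlining commit is part of
the baseline.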

Could you run a similar test? That is, drop dc592a4155 from the
current master and re-run the benchmark. That would be helpful.

[1] https://postgr.es/m/CAN55FZ0MiFCgK26gRgE05a%3D_ggenkxDM8H%3DA2uTHpywczqt%3D-Q%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft
