Re: Speed up COPY FROM text/CSV parsing using SIMD

From: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
To: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-02-04 14:28:56
Message-ID: CA+K2RunttnPQShNKcz5xH_4UaTM=Lomxv1EJ8RJyZsKsxmvaWA@mail.gmail.com
Lists: pgsql-hackers

Hello,

On Wed, Feb 4, 2026, 6:38 AM Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> wrote:

> The 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch seems nice!
> On my x86 PC, it had the usual performance improvement of earlier patches,
> but the regression seemed more similar for both text and csv inputs.
> Unfortunately, the regression is about 2.5%, but maybe that is an
> acceptable worst-case for an improvement of 22% for text inputs and 33% for
> CSV inputs?
>
> The 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch looks even
> better on my Raspberry Pi's arm processor: not only do we see a 22%
> improvement for text and an almost 34% improvement for CSV, even the
> worst-case scenarios show an almost 4% improvement for text and an 11.7%
> improvement for CSV.
>
> By comparison,
> the v5.1-0001-Simple-heuristic-for-SIMD-COPY-FROM.patch.patch's worst-case
> performance is poorer on both architectures.
>
> I'd be curious to know if anyone else can reproduce these
> numbers. 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch seems
> like a real winner.
>
Thanks for the benchmark, Manni. I suppose this uses the same threshold as
the patch (4096 bytes). Have you tried any larger values for the threshold?
I still expect fewer L1d cache misses and lower execution times as the
threshold increases (relative to the per-core L1d cache size).
In my previous, not-so-stable numbers, 28KB wasn't too bad.
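
To put the two thresholds in perspective, here is a rough sketch of how a
threshold could be expressed as a fraction of the L1d size. It assumes a
32 KB per-core L1d cache and 64-byte cache lines; both are assumptions for
illustration, not figures from this thread. threshold_for_l1d is a
hypothetical helper and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; not taken from the patch. */
#define CACHE_LINE_BYTES 64

/*
 * Hypothetical helper: return num/den of the per-core L1d size,
 * rounded down to a whole number of cache lines.
 */
static size_t
threshold_for_l1d(size_t l1d_bytes, size_t num, size_t den)
{
	size_t		raw;

	assert(den != 0 && num <= den);

	/* Divide first so the multiplication cannot overflow. */
	raw = l1d_bytes / den * num;

	/* Round down to a multiple of the cache line size. */
	return raw - (raw % CACHE_LINE_BYTES);
}
```

Under those assumptions, threshold_for_l1d(32768, 1, 8) gives the current
4096 bytes and threshold_for_l1d(32768, 7, 8) gives 28672 bytes (28KB), so
the two values sit at roughly 1/8 and 7/8 of the cache.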

Regards,
Ayoub
