Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-02-25 14:24:27
Message-ID: CAN55FZ3+NYF1TkKyNtpRQuLiaauSYk9G5tA+fpruOA4-14Y_ZA@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, 24 Feb 2026 at 20:48, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Tue, Feb 24, 2026 at 04:57:21PM +0300, Nazir Bilal Yavuz wrote:
> > I will investigate this. However, please note that the current master
> > includes the inlining commit (dc592a4155), which makes the COPY FROM
> > faster. In my case,
> >
> > 1: current master without dc592a4155: 14400ms
> > 2: current master: 13960ms (%3 improvement against #1)
> > 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> > regression against #2)
> >
> > Is it possible for you to do a similar test? I mean dropping
> > dc592a4155 from the current master and re-running the benchmark, that
> > would be helpful.
>
> IMHO as long as the difference from v18 looks reasonable, commit-by-commit
> regressions and improvements that even out in the end are okay. That's
> perhaps a bit of mental gymnastics (e.g., what if we had committed the
> inlining patch for v18?), but I believe that's how we've dealt with similar
> problems in the past. But maybe there are ways to avoid even these
> in-development regressions, too...

I agree with you. Unfortunately, however, I see a regression on
master + v10 compared to REL_18_3 (62d6c7d3df6).

Thank you, Kazar and Manni, for the benchmarks in [1] and [2]!

I can still reproduce the regression for the 'wide & CSV 1/3' case
[3] using Manni's benchmark script. I consistently see a ~5%
regression, and I am curious whether I am doing something wrong. I am
a bit surprised because I didn't see this regression before, and
Kazar and Manni don't see any regression in their benchmarks [1] and
[2]. I am still investigating and hope to come back with more
information soon.

If anyone has any suggestions/ideas, please let me know!

[1] https://postgr.es/m/CA%2BK2RukFH57QPAfTEzvy7PEyrLzav3HkyCiu-2yqR%2BuW_Niorw%40mail.gmail.com
[2] https://postgr.es/m/CAKWEB6oT5KbyF%2BuRRhjjJi7p2PmRdOzxp3T6vFcN04BCR-%3DB2w%40mail.gmail.com
[3]
1: current master without dc592a4155: 14400ms
2: current master: 13960ms (3% improvement against #1)
3: current master + v10: 15123ms (5% regression against #1 and 8%
regression against #2)
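
For reference, the percentages in [3] can be re-derived from the raw
timings. This is only a minimal sketch in Python; the helper name
pct_change and the dictionary keys are illustrative and are not part of
any benchmark script in this thread.

```python
def pct_change(baseline_ms, new_ms):
    """Relative runtime change as a percentage of the baseline.

    Positive values mean slower (regression), negative values mean
    faster (improvement).
    """
    return (new_ms - baseline_ms) / baseline_ms * 100.0


# Timings copied from footnote [3]; the keys are illustrative labels.
timings_ms = {
    "master_without_inlining": 14400,  # 1: master without dc592a4155
    "master": 13960,                   # 2: current master
    "master_plus_v10": 15123,          # 3: current master + v10
}

# Each pair is (baseline, candidate).
comparisons = [
    ("master_without_inlining", "master"),
    ("master_without_inlining", "master_plus_v10"),
    ("master", "master_plus_v10"),
]

for base, cand in comparisons:
    delta = pct_change(timings_ms[base], timings_ms[cand])
    print(f"{cand} vs {base}: {delta:+.1f}%")
```

A negative value corresponds to the improvement in item 2, and the two
positive values correspond to the regressions in item 3.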

--
Regards,
Nazir Bilal Yavuz
Microsoft
