Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Mark Wong <markwkm(at)gmail(dot)com>
To: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-01-30 22:05:33
Message-ID: aX0rLfLf3aFO4cl-@o
Lists: pgsql-hackers

On Tue, Jan 13, 2026 at 06:20:27PM -0600, Manni Wood wrote:
> On Tue, Jan 13, 2026 at 1:12 PM Mark Wong <markwkm(at)gmail(dot)com> wrote:
>
> > On Fri, Jan 09, 2026 at 05:21:45PM +0300, Nazir Bilal Yavuz wrote:
> > > Were you able to understand why Mark's benchmark results are different
> > > from ours?
> >
> > Not yet... I had some guesses, which is why I suggested the processor
> > pinning and using a ramdisk. But we haven't tried applying all of those
> > to my laptop, which has 3 core types, or the POWER system, which may be
> > interesting to use a ram disk on.
> >
> > I'm curious though, and admittedly haven't tried looking myself yet,
> > about how the SIMD calls might look across different processor
> > architectures. We'll try to get that on the POWER system soon...
> >
> > Regards,
> > Mark
>
> Hello!
>
> Nazir, I'm glad you are finding the benchmarks useful. I have more! :-)
>
> All of these benchmarks are all-in-RAM, because I do think that is the best
> way of getting closest to the theoretical best and worst case scenarios.
>
> My laptop:
>
> master: (852558b9)
>
> text, no special: 14996
> text, 1/3 special: 17270
> csv, no special: 18274
> csv, 1/3 special: 23852
>
> v3
>
> text, no special: 11282 (24.7% speedup)
> text, 1/3 special: 15748 (8.8% speedup) <-- I don't believe this but it's
> what I got
> csv, no special: 11571 (36.6% speedup)
> csv, 1/3 special: 19934 (16.4% speedup) <-- I don't believe this but it's
> what I got
>
> v4.2
>
> text, no special: 11139 (25.7% speedup)
> text, 1/3 special: 18900 (9.4% regression)
> csv, no special: 11490 (37.1% speedup)
> csv, 1/3 special: 26134 (9.5% regression)
>
> An AWS EC2 t2.2xlarge instance
>
> master: (852558b9)
>
> text, no special: 20677
> text, 1/3 special: 22660
> csv, no special: 24534
> csv, 1/3 special: 30999
>
> v3
>
> text, no special: 17534 (15.2% speedup)
> text, 1/3 special: 22816 (0.6% regression)
> csv, no special: 17664 (28.0% speedup)
> csv, 1/3 special: 29338 (5.3% speedup) <-- I don't believe this but it's
> what I got
>
> v4.2
>
> text, no special: 17459 (15.5% speedup)
> text, 1/3 special: 25051 (10.5% regression)
> csv, no special: 17574 (28.3% speedup)
> csv, 1/3 special: 32092 (3.5% regression)
>
> An AWS EC2 t4g.2xlarge instance (using ARM processor; first test of ARM
> processor!)
>
> master: (852558b9)
>
> text, no special: 22081
> text, 1/3 special: 25100
> csv, no special: 27296
> csv, 1/3 special: 32344
>
> v3
>
> text, no special: 17724 (19.7% speedup)
> text, 1/3 special: 27606 (9.9% regression) <-- yikes! We would want to test
> this more
> csv, no special: 17597 (35.5% speedup)
> csv, 1/3 special: 32597 (0.8% regression)
>
> v4.2
>
> text, no special: 17674 (20% speedup)
> text, 1/3 special: 25773 (2.6% regression) <-- this regression is less than
> for the v3 patch? Atypical...
> csv, no special: 17651 (35.3% speedup)
> csv, 1/3 special: 34055 (5.3% regression)

I'm still lagging behind a little. I ran the v4.2 patches again, applied to
71c11369, on the POWER system that I have access to, using Manni's
copysimdperf scripts to set up a ramdisk and processor pinning.

text, no special: -2508 (30% speedup)
text, 1/3 special: -1753 (48% speedup)
csv, no special: 9264 (3% regression)
csv, 1/3 special: -4077 (0.3% speedup)
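
For anyone cross-checking the quoted tables: the percentages there look like
the relative difference from master, assuming the raw figures are elapsed
times where lower is better (the units are not stated in the thread). A
minimal sketch of that arithmetic in Python follows; percent_change is a
made-up helper name, not part of any script in this thread.

```python
def percent_change(master, patched):
    """Compare a patched timing against master.

    Assumes both values are elapsed times (lower is better).
    Returns a (label, percent) tuple, where label is "speedup"
    or "regression" and percent is always non-negative.
    """
    if master <= 0:
        raise ValueError("master timing must be positive")
    delta = (master - patched) / master * 100.0
    label = "speedup" if delta >= 0 else "regression"
    return label, abs(delta)


if __name__ == "__main__":
    # Figures taken from the quoted laptop results (master vs. v3, text, no special).
    label, pct = percent_change(14996, 11282)
    print(f"{pct:.1f}% {label}")
```

The last digit may differ from the figures quoted in the thread depending on
whether the result is rounded or truncated.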

Using Manni's script makes me feel more confident about executing the tests in
the same way, so I don't know how concerning the difference in results is
compared to the other architectures.
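
Processor pinning itself can be done in several ways. The sketch below is
only an illustration using Python's os.sched_setaffinity, which is
Linux-only. It is not taken from Manni's copysimdperf scripts, and
pin_to_cpu is a hypothetical helper.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only).

    Returns the affinity set in effect after the call. Child processes
    started afterwards inherit this affinity.
    """
    allowed = os.sched_getaffinity(0)  # 0 means the calling process
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pin to the lowest-numbered CPU this process is allowed to use.
    first_cpu = min(os.sched_getaffinity(0))
    print(sorted(pin_to_cpu(first_cpu)))
```

On a machine with mixed core types, which CPU number maps to which kind of
core is system-specific, so the CPU choice here is an assumption to verify
locally.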

> Yes, I think I agree with you that the everything-in-RAM benchmarks will
> make the regressions more pronounced, just like the everything-in-RAM
> benchmarks make the improvements more pronounced.
>
> I am not sure why the CSV regression, compared to the TXT regression (even
> for the v3 patch which has smaller regressions than the v4.2 patch) is
> usually worse. I probably should look over some flame graphs and see if I
> can find the place where the CSV-parsing code is so much slower. The CSV
> regression is actually a bit frustrating (at around 5%) because the TXT
> regression, at less than 1% (for the v3 patch) is so much easier to bear.
>
> Here are some copy-to benchmarks for the v4 patch that applies SIMD to the
> copy-to code.
>
> These were all-in-RAM tests.
>
> My laptop
>
> master: (852558b9)
>
> text, no special: 2948
> text, 1/3 special: 11258
> csv, no special: 6245
> csv, 1/3 special: 11258
>
> v4 (copy to)
>
> text, no special: 2126 (27.9% speedup)
> text, 1/3 special: 12080 (7.3% regression) <-- did not see such a big
> regression before
> csv, no special: 2432 (61.0% speedup)
> csv, 1/3 special: 12344 (4.0% regression) <-- did not see such a big
> regression before
>
> An AWS EC2 t2.2xlarge instance
>
> master: (852558b9)
>
> text, no special: 4647
> text, 1/3 special: 13865
> csv, no special: 5421
> csv, 1/3 special: 15284
>
> v4 (copy to)
>
> text, no special: 2460 (47.0% speedup)
> text, 1/3 special: 14023 (1.1% regression)
> csv, no special: 2667 (50.7% speedup)
> csv, 1/3 special: 15251 (0.2% speedup)
>
> An AWS EC2 t4g.2xlarge instance (using ARM processor; first test of ARM
> processor!)
>
> master: (852558b9)
>
> text, no special: 6951
> text, 1/3 special: 17857
> csv, no special: 7951
> csv, 1/3 special: 18504
>
> v4 (copy to)
>
> text, no special: 3372 (51.4% speedup)
> text, 1/3 special: 15713 (12.0% speedup)
> csv, no special: 3233 (59.3% speedup)
> csv, 1/3 special: 1622 (12.3% speedup)
>
> Once again, the v4 patch for copy-to seems like a clearer win, though, to
> be fair, there were regressions when running on my laptop. (I'm starting to
> think servers or desktops are better than laptops for testing these things,
> though maybe that's my bias: it just seems like the server results are
> always less surprising.)
>
> Hope you all continue to find these useful...

Regards,
Mark

--
Mark Wong <markwkm(at)gmail(dot)com>
EDB https://enterprisedb.com
