Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-03-06 16:59:52
Message-ID: CAKWEB6r0CrN-a2P=2ey3EK7p1MxsbQx2C8=hpNGfxLxnRaX66Q@mail.gmail.com
Lists: pgsql-hackers

Hello.

I ran Nazir's v11 patch on my x86 tower PC and my Arm Raspberry Pi using
the same build I've been using: meson with "debugoptimized", which
translates to the "-g -O2" gcc flags. The v10 numbers are included for
comparison; all percentages are relative to old master.

x86 NARROW old master (18bcdb75)
TXT : 25909.060500 ms
CSV : 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes: 34541.704750 ms

x86 NARROW v10
TXT : 26416.331500 ms -1.957890% regression
CSV : 25318.727500 ms 10.018142% improvement
TXT with 1/3 escapes: 28608.007500 ms -2.928061% regression
CSV with 1/3 quotes: 32805.627750 ms 5.026032% improvement

x86 NARROW v11
TXT : 27212.945750 ms -5.032545% regression
CSV : 26985.971250 ms 4.092817% improvement
TXT with 1/3 escapes: 27216.510000 ms 2.078374% improvement
CSV with 1/3 quotes: 32817.267500 ms 4.992334% improvement

x86 WIDE old master (18bcdb75)
TXT : 28778.426500 ms
CSV : 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes: 47024.416000 ms

x86 WIDE v10
TXT : 23067.046750 ms 19.846046% improvement
CSV : 23259.092250 ms 34.797174% improvement
TXT with 1/3 escapes: 31796.098250 ms 1.989583% improvement
CSV with 1/3 quotes: 42925.792250 ms 8.715948% improvement

x86 WIDE v11
TXT : 22571.305750 ms 21.568659% improvement
CSV : 22711.524750 ms 36.332184% improvement
TXT with 1/3 escapes: 29236.453000 ms 9.879604% improvement
CSV with 1/3 quotes: 40022.110750 ms 14.890786% improvement

arm NARROW old master (18bcdb75)
TXT : 10997.568250 ms
CSV : 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes: 12559.385750 ms

arm NARROW v10
TXT : 10467.816750 ms 4.816988% improvement
CSV : 9986.288000 ms 7.513381% improvement
TXT with 1/3 escapes: 10323.173750 ms -0.234262% regression
CSV with 1/3 quotes: 11843.611750 ms 5.699116% improvement

arm NARROW v11
TXT : 10340.966250 ms 5.970429% improvement
CSV : 10224.399500 ms 5.308144% improvement
TXT with 1/3 escapes: 10438.216750 ms -1.351288% regression
CSV with 1/3 quotes: 11865.934000 ms 5.521383% improvement

arm WIDE old master (18bcdb75)
TXT : 11825.771250 ms
CSV : 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes: 17557.954500 ms

arm WIDE v10
TXT : 9064.959000 ms 23.345727% improvement
CSV : 9019.553250 ms 35.144134% improvement
TXT with 1/3 escapes: 12344.497250 ms 8.087402% improvement
CSV with 1/3 quotes: 15495.863750 ms 11.744482% improvement

arm WIDE v11
TXT : 9001.442250 ms 23.882831% improvement
CSV : 8940.928750 ms 35.709490% improvement
TXT with 1/3 escapes: 12049.668500 ms 10.282589% improvement
CSV with 1/3 quotes: 15277.843250 ms 12.986201% improvement
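
The percentages above are consistent with (master - patched) / master * 100,
so a negative value means the patched build was slower. A minimal C sketch of
that arithmetic follows; the helper name pct_improvement() is hypothetical and
is not part of the benchmark scripts used in this thread.

```c
/*
 * Hypothetical helper: percentage change of a patched timing relative to
 * the old-master timing. Positive means faster than master, negative
 * means slower (a regression).
 */
static double
pct_improvement(double master_ms, double patched_ms)
{
	return (master_ms - patched_ms) / master_ms * 100.0;
}
```

For example, the x86 NARROW TXT timings for old master and v10
(25909.060500 ms and 26416.331500 ms) give roughly -1.96, which matches the
regression reported for that row.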

Best,

-Manni

On Thu, Mar 5, 2026 at 3:25 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

>
> On 2026-03-04 We 10:15 AM, Nazir Bilal Yavuz wrote:
> > Hi,
> >
> > On Mon, 2 Mar 2026 at 22:55, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
> >>> If anyone has any suggestions/ideas, please let me know!
> > I was able to fix the problem. My first assumption was that the
> > branching in the SIMD code caused it, so I moved the SIMD code into
> > a CopyReadLineTextSIMDHelper() function and called that helper at the
> > top of CopyReadLineText(), so that the non-SIMD (scalar) code path
> > has no extra branching. This didn't solve the problem. I then
> > realized that even with the SIMD code path disabled via 'if (false)',
> > the regression remained, but if I commented out the whole
> > 'if (cstate->simd_enabled)' branch, there was no regression at all.
> >
> > To find out more, I compared the assembly output of both versions and
> > found the likely reason. As far as I understand, the compiler can't
> > keep a variable in a register; instead the variable lives on the
> > stack, which is slower. Please see the two assembly outputs:
> >
> > Slow code:
> >
> >     c = copy_input_buf[input_buf_ptr++];
> >  db0:   48 8b 55 b8       mov    -0x48(%rbp),%rdx
> >  db4:   48 63 c6          movslq %esi,%rax
> >  db7:   44 8d 66 01       lea    0x1(%rsi),%r12d
> >  dbb:   44 89 65 cc       mov    %r12d,-0x34(%rbp)
> >  dbf:   0f be 14 02       movsbl (%rdx,%rax,1),%edx
> >
> > Fast code:
> >
> >     c = copy_input_buf[input_buf_ptr++];
> >  d80:   49 63 c4          movslq %r12d,%rax
> >  d83:   45 8d 5c 24 01    lea    0x1(%r12),%r11d
> >  d88:   41 0f be 04 06    movsbl (%r14,%rax,1),%eax
> >
> > The reason is that the address of input_buf_ptr is passed to
> > CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I change it to
> > this:
> >
> > int temp_input_buf_ptr = input_buf_ptr;
> > CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);
> >
> > Then there is no regression. However, I am still not completely sure
> > whether that is the same problem as in v10; I am planning to spend
> > more time debugging this.
> >
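
The address-taken effect described above can be sketched in isolation. This
is a minimal illustration only, not the patch: skip_plain_bytes() is a
hypothetical stand-in for CopyReadLineTextSIMDHelper(), and the copy-back
from the temporary is an assumption, since the quoted snippet does not show
it. Both variants compute the same result; the difference is only whether
the hot loop index itself has its address taken. Whether that changes the
generated code depends on the compiler and on whether the helper gets
inlined.

```c
/*
 * Hypothetical stand-in for CopyReadLineTextSIMDHelper(): advances the
 * offset past bytes that are not '\n' and writes it back through *ptr.
 */
static void
skip_plain_bytes(const char *buf, int len, int *ptr)
{
	int			i = *ptr;

	while (i < len && buf[i] != '\n')
		i++;
	*ptr = i;
}

/* Variant 1: the hot loop index itself has its address taken. */
static int
count_newlines_direct(const char *buf, int len)
{
	int			input_buf_ptr = 0;
	int			newlines = 0;

	while (input_buf_ptr < len)
	{
		skip_plain_bytes(buf, len, &input_buf_ptr);
		/* scalar step: consume the newline the helper stopped at */
		if (input_buf_ptr < len && buf[input_buf_ptr++] == '\n')
			newlines++;
	}
	return newlines;
}

/* Variant 2: only a short-lived temporary has its address taken. */
static int
count_newlines_temp(const char *buf, int len)
{
	int			input_buf_ptr = 0;
	int			newlines = 0;

	while (input_buf_ptr < len)
	{
		int			temp_input_buf_ptr = input_buf_ptr;

		skip_plain_bytes(buf, len, &temp_input_buf_ptr);
		input_buf_ptr = temp_input_buf_ptr; /* assumed copy-back */
		if (input_buf_ptr < len && buf[input_buf_ptr++] == '\n')
			newlines++;
	}
	return newlines;
}
```

Comparing the output of "objdump -d" for the two variants, built with the
helper prevented from inlining, is one way to check whether a given compiler
keeps the index in a register in each case.
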
> >> A couple of random ideas:
> >>
> >> * Additional inlining for callers. I looked around a little bit and
> >> didn't see any great candidates, so I don't have much faith in this,
> >> but maybe you'll see something I don't.
> > I agree with you. CopyReadLineText() is already quite a big function.
> >
> >> * Disable SIMD if we are consistently getting small rows. That won't
> >> help your "wide & CSV 1/3" case in all likelihood, but perhaps it'll
> >> help with the regression for narrow rows described elsewhere.
> > I implemented this; two consecutive small rows disable SIMD.
> >
> >> * Surround the variable initializations with "if (simd_enabled)".
> >> Presumably compilers are smart enough to remove those in the non-SIMD
> >> paths already, but it could be worth a try.
> > Done.
> >
> >> * Add simd_enabled function parameter to CopyReadLine(),
> >> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do
> >> the bool literal trick in CopyFrom{Text,CSV}OneRow(). That could
> >> encourage the compiler to do some additional optimizations to reduce
> >> branching.
> > I think we don't need this. At least the implementation with
> > CopyReadLineTextSIMDHelper() doesn't need this since branching will be
> > at the top and it will be once per line.
> >
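
The "two consecutive small rows disables SIMD" heuristic mentioned above
could look roughly like the sketch below. Everything here is an assumption
made for illustration: the struct, the function names, the 64-byte
threshold, and the choice never to re-enable SIMD once it is switched off.
The thread does not state what v11 actually uses for any of these.

```c
#include <stdbool.h>
#include <stddef.h>

#define SMALL_ROW_BYTES 64		/* hypothetical "small row" threshold */
#define SMALL_ROW_LIMIT 2		/* two consecutive small rows, per the thread */

/* Hypothetical per-COPY state; not the patch's actual CopyFromState fields. */
typedef struct SimdHeuristic
{
	bool		simd_enabled;
	int			consecutive_small_rows;
} SimdHeuristic;

static void
simd_heuristic_init(SimdHeuristic *h)
{
	h->simd_enabled = true;
	h->consecutive_small_rows = 0;
}

/* Call once per parsed row with that row's length in bytes. */
static void
simd_heuristic_observe(SimdHeuristic *h, size_t row_len)
{
	if (!h->simd_enabled)
		return;					/* this sketch never re-enables SIMD */

	if (row_len < SMALL_ROW_BYTES)
	{
		if (++h->consecutive_small_rows >= SMALL_ROW_LIMIT)
			h->simd_enabled = false;
	}
	else
		h->consecutive_small_rows = 0;	/* a wide row resets the streak */
}
```

With this sketch, a small row followed by a wide row leaves SIMD enabled,
while two small rows in a row switch it off for the rest of the input.
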
> > I think v11 looks better compared to v10. I liked the
> > CopyReadLineTextSIMDHelper() helper function. I also liked it being at
> > the top of CopyReadLineText(), not being in the scalar path. This
> > gives us more optimization options without affecting the scalar path.
> >
> > Here are the new benchmark results. I benchmarked the changes with
> > both -O2 and -O3, and both with and without the 'changing
> > default_toast_compression to lz4' commit (65def42b1d5). The results
> > show no regression, and the performance improvement is much bigger
> > with 65def42b1d5: close to 2x for text format and more than 2x for
> > CSV format.
>
>
> I spent some time exploring different ideas for improving this, but
> found none that didn't cause regression in some cases, so good to go
> from my POV.
>
>
> cheers
>
>
> andrew
>
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>

--
-- Manni Wood EDB: https://www.enterprisedb.com
