Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
To: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-02-20 00:09:27
Message-ID: CAKWEB6pq7C0Wv1wT9Y1_c_1fn-+cR8pb210Pj3w2FcEOmNGxbQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 19, 2026 at 4:37 PM KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:

> Hello,
>
> I ran some long benchmarks on this, and I got stable results across
> multiple runs (few milliseconds difference)
>
> This is on an Intel I7-1255U CPU with:
> sudo cpupower frequency-set --governor=performance
> sudo cpupower idle-set -D 0
> echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
>
> WIDE (500k rows)
>
> TXT | none
> Master avg: 22,183 ms
> New avg: 20,435 ms
> Improvement: -7.88%
>
> CSV | none
> Master avg: 26,737 ms
> New avg: 24,625 ms
> Improvement: -7.90%
>
> TXT | escape
> Master avg: 26,720 ms
> New avg: 23,658 ms
> Improvement: -11.46%
>
> CSV | quote
> Master avg: 35,961 ms
> New avg: 33,317 ms
> Improvement: -7.35%
>
> --------------------------------------
>
> NARROW (1.5M rows)
>
> TXT | none
> Master avg: 2,220 ms
> New avg: 2,125 ms
> Improvement: -4.28%
>
> CSV | none
> Master avg: 2,330 ms
> New avg: 2,145 ms
> Improvement: -7.92%
>
> TXT | escape
> Master avg: 2,425 ms
> New avg: 2,187 ms
> Improvement: -9.79%
>
> CSV | quote
> Master avg: 2,272 ms
> New avg: 2,253 ms
> Improvement: -0.85%
>
> No regressions as expected, overall this looks good.
>
> Regards,
>
> Ayoub
>
> On Thu, Feb 19, 2026 at 10:01 AM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> wrote:
>
>> Hi,
>>
>> On Thu, 19 Feb 2026 at 07:02, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
>> wrote:
>> >
>> > I took some time tonight to apply v8 to the latest master (759b03b2) on
>> > my x86 tower and ARM Raspberry Pi 5.
>> >
>> > Here are the results, using both narrow columns and the wider columns
>> > we've been using throughout:
>> >
>> > x86 master NARROW
>> > TXT : 2587.642000 ms
>> > CSV : 2621.759000 ms
>> > TXT with 1/3 escapes: 2707.933500 ms
>> > CSV with 1/3 quotes: 3254.896500 ms
>> >
>> > x86 v8 NARROW
>> > TXT : 2488.655250 ms 3.825365% improvement
>> > CSV : 2628.818000 ms -0.269247% regression
>> > TXT with 1/3 escapes: 2615.522000 ms 3.412621% improvement
>> > CSV with 1/3 quotes: 3446.368000 ms -5.882568% regression
>> >
>> > x86 master WIDE
>> > TXT : 30583.229500 ms
>> > CSV : 35054.533500 ms
>> > TXT with 1/3 escapes: 32767.421500 ms
>> > CSV with 1/3 quotes: 44214.163500 ms
>> >
>> > x86 v8 WIDE
>> > TXT : 26527.494250 ms 13.261305% improvement
>> > CSV : 33364.443750 ms 4.821316% improvement
>> > TXT with 1/3 escapes: 29320.648000 ms 10.518904% improvement
>> > CSV with 1/3 quotes: 42334.074750 ms 4.252232% improvement
>> >
>> >
>> >
>> > arm master NARROW
>> > TXT : 1999.401000 ms
>> > CSV : 2081.610750 ms
>> > TXT with 1/3 escapes: 2053.230250 ms
>> > CSV with 1/3 quotes: 2431.608750 ms
>> >
>> > arm v8 NARROW
>> > TXT : 1981.663750 ms 0.887128% improvement
>> > CSV : 2023.892500 ms 2.772769% improvement
>> > TXT with 1/3 escapes: 2004.215250 ms 2.387214% improvement
>> > CSV with 1/3 quotes: 2616.872750 ms -7.618989% regression
>> >
>> > arm master WIDE
>> > TXT : 9120.731750 ms
>> > CSV : 11114.478250 ms
>> > TXT with 1/3 escapes: 10338.124500 ms
>> > CSV with 1/3 quotes: 13404.430250 ms
>> >
>> > arm v8 WIDE
>> > TXT : 8430.090750 ms 7.572210% improvement
>> > CSV : 10115.135500 ms 8.991360% improvement
>> > TXT with 1/3 escapes: 9624.383500 ms 6.903970% improvement
>> > CSV with 1/3 quotes: 12331.714000 ms 8.002699% improvement
>>
>> Thank you for the results, they are interesting. I didn't expect to
>> see any regression for this benchmark. Also, I would expect the
>> non-special character cases and the 1/3 special character cases to
>> perform similarly, since we are not using SIMD for this benchmark.
>>
>> I noticed that the timings in your narrow benchmark (both x86 and ARM)
>> are quite short. Would it be possible to extend the test so that the
>> total runtime is closer to ~10,000 ms? That might give us more stable
>> results.
>>
>> Here is my benchmark with using your script:
>>
>> WIDE: Total 500000 lines and each line is 4096 bytes.
>> NARROW: Total 1500000 lines and each line is 2-4 bytes (`"A""A"` and
>> `A\\A`).
>>
>>
>> +---------+---------------+---------------+---------------+----------------+
>> | WIDE    | TXT None      | TXT 1/3       | CSV None      | CSV 1/3        |
>> +---------+---------------+---------------+---------------+----------------+
>> | master  | 10512         | 11133         | 12241         | 14321          |
>> +---------+---------------+---------------+---------------+----------------+
>> | patched | 10000 (-4.8%) | 10804 (-2.9%) | 11571 (-5.4%) | 14008 (-2.18%) |
>> +---------+---------------+---------------+---------------+----------------+
>> |         |               |               |               |                |
>> +---------+---------------+---------------+---------------+----------------+
>> | NARROW  |               |               |               |                |
>> +---------+---------------+---------------+---------------+----------------+
>> | master  | 9702          | 9745          | 9784          | 10149          |
>> +---------+---------------+---------------+---------------+----------------+
>> | patched | 9344 (-3.6%)  | 9477 (-2.7%)  | 9439 (-3.5%)  | 9751 (-3.9%)   |
>> +---------+---------------+---------------+---------------+----------------+
>>
>> The results look promising to me.
>>
>> --
>> Regards,
>> Nazir Bilal Yavuz
>> Microsoft
>>
>
Hello!

Thanks for running benchmarks, Ayoub.

Nazir, I ran my benchmarks with more rows this time --- as many rows as
would fit on my test computers without exhausting their RAM disks. That
seems to have brought things more into line with what Ayoub saw. I did get
some small regressions, but I suspect those are not a big deal. (For
instance, on both machines I also noticed the occasional "truncate table"
would take longer than the others, despite my scripts' best efforts to
steady a CPU core and pin postmaster and children to that core.)

x86 WIDE master 500,000 rows
TXT : 30602.244000 ms
CSV : 35062.451250 ms
TXT with 1/3 escapes: 32704.250250 ms
CSV with 1/3 quotes: 44128.072500 ms

x86 WIDE v8 500,000 rows
TXT : 26611.953250 ms 13.039210% improvement
CSV : 33366.184000 ms 4.837846% improvement
TXT with 1/3 escapes: 29251.310000 ms 10.558078% improvement
CSV with 1/3 quotes: 42368.421000 ms 3.987601% improvement

x86 NARROW master 50 million rows
TXT : 25898.004000 ms
CSV : 27212.684500 ms
TXT with 1/3 escapes: 29189.518250 ms
CSV with 1/3 quotes: 33222.510250 ms

x86 NARROW v8 50 million rows
TXT : 26368.765000 ms -1.817750% regression
CSV : 26711.122250 ms 1.843119% improvement
TXT with 1/3 escapes: 28081.150750 ms 3.797142% improvement
CSV with 1/3 quotes: 32851.963500 ms 1.115348% improvement

arm WIDE master 250,000 rows
TXT : 11392.462750 ms
CSV : 13887.576500 ms
TXT with 1/3 escapes: 12908.560750 ms
CSV with 1/3 quotes: 16699.337000 ms

arm WIDE v8 250,000 rows
TXT : 10524.567750 ms 7.618151% improvement
CSV : 12621.211250 ms 9.118691% improvement
TXT with 1/3 escapes: 12017.030250 ms 6.906506% improvement
CSV with 1/3 quotes: 15428.020500 ms 7.612976% improvement

arm NARROW master 25 million rows
TXT : 10030.274000 ms
CSV : 10245.238750 ms
TXT with 1/3 escapes: 10345.224500 ms
CSV with 1/3 quotes: 12186.313250 ms

arm NARROW v8 25 million rows
TXT : 10197.386500 ms -1.666081% regression
CSV : 10257.918750 ms -0.123765% regression
TXT with 1/3 escapes: 10084.978500 ms 2.515615% improvement
CSV with 1/3 quotes: 12064.215000 ms 1.001929% improvement
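
For anyone comparing these numbers against their own runs: the percentages
above are consistent with a relative difference against master,
(master - v8) / master * 100, where a positive value means v8 was faster.
The sketch below is only an illustration of that arithmetic, not the
benchmark script itself; the function names are made up for this example.

```python
def percent_improvement(master_ms: float, patched_ms: float) -> float:
    """Relative change versus master, in percent.

    Positive means the patched build was faster; negative means a regression.
    (Hypothetical helper for illustration; not part of the benchmark scripts.)
    """
    return (master_ms - patched_ms) / master_ms * 100.0


def describe(master_ms: float, patched_ms: float) -> str:
    """Format a result in the same style as the tables in this message."""
    pct = percent_improvement(master_ms, patched_ms)
    label = "improvement" if pct >= 0 else "regression"
    return f"{pct:.6f}% {label}"


# Timings taken from the tables above.
print("x86 WIDE TXT  :", describe(30602.244000, 26611.953250))
print("arm NARROW TXT:", describe(10030.274000, 10197.386500))
```

Note that Ayoub's and Nazir's tables report the change in runtime instead,
so their sign is flipped (a negative percentage there means faster).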

--
-- Manni Wood EDB: https://www.enterprisedb.com
