Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>
Cc: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Mark Wong <markwkm(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-01-09 14:20:54
Message-ID: CAN55FZ2-Er-+54OA6oJ49me7wA=GkAAvfR2XO5U_AJt-JxM8+A@mail.gmail.com
Lists: pgsql-hackers

Hi,

Firstly, thank you for all of the benchmarks!

On Tue, 6 Jan 2026 at 23:05, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com> wrote:
>
> Hello, Nazir, I tried your suggested cpupower commands as well as disabling turbo, and my results are indeed more uniform. (see attached screenshot of my spreadsheet).

I am glad that it helped!

> This time, I ran the tests on my Tower PC instead of on my laptop.
>
> I also followed Mark Wong's advice and used the taskset command to pin my postgres postmaster (and all of its children) to a single cpu core.

That is good advice; I should apply it too. Thank you for sharing it.

> So when I start postgres, I do this to pin it to core 27:
>
> ${PGHOME}/bin/pg_ctl -D ${PGHOME}/data -l ${PGHOME}/logfile.txt start
> PGPID=$(head -1 ${PGHOME}/data/postmaster.pid)
> taskset --cpu-list -p 27 ${PGPID}
>
>
> My results seem similar to yours:

It is nice to see that we get similar results.

> master: Nazir 85ddcc2f4c | Manni 877ae5db
>
> text, no special: 102294 | 302651
> text, 1/3 special: 108946 | 326208
> csv, no special: 121831 | 348930
> csv, 1/3 special: 140063 | 439786
>
> v3
>
> text, no special: 88890 (13.1% speedup) | 227874 (24.7% speedup)
> text, 1/3 special: 110463 (1.4% regression) | 322637 (1.1% speedup)
> csv, no special: 89781 (26.3% speedup) | 226525 (35.1% speedup)
> csv, 1/3 special: 147094 (5.0% regression) | 461501 (4.9% regression)
>
> v4.2
>
> text, no special: 87785 (14.2% speedup) | 225702 (25.4% speedup)
> text, 1/3 special: 127008 (16.6% regression) | 343480 (5.3% regression)
> csv, no special: 88093 (27.7% speedup) | 226633 (35.0% speedup)
> csv, 1/3 special: 164487 (17.4% regression) | 510954 (16.2% regression)
>
> It would seem that both your results and mine show a more serious worst-case regression for the v4.2 patches than for the v3 patches. It seems also that the speedups for v4.2 and v3 are similar.

Yes, you are right. Also, the regression for CSV is worse than for
TEXT. Do you have any idea why?
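
For anyone who wants to re-derive the percentages quoted above, here is a
minimal sketch. It assumes the figures are timings where lower is better
(the thread does not state the unit) and that each percentage is taken
relative to the master figure. The helper name pct_change is hypothetical
and is not part of any patch.

```shell
#!/bin/sh
# pct_change BASELINE PATCHED
# Prints the change of PATCHED relative to BASELINE, assuming both are
# timings where a lower value is better.
pct_change() {
    awk -v base="$1" -v patched="$2" 'BEGIN {
        delta = (base - patched) / base * 100
        if (delta >= 0)
            printf "%.1f%% speedup\n", delta
        else
            printf "%.1f%% regression\n", -delta
    }'
}

pct_change 102294 88890     # Nazir, text, no special, master vs v3
pct_change 140063 164487    # Nazir, csv, 1/3 special, master vs v4.2
```

With the numbers from the table, the two calls print "13.1% speedup" and
"17.4% regression", which matches the quoted results.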

--
Regards,
Nazir Bilal Yavuz
Microsoft
