Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
Cc: Neil Conway <neil(dot)conway(at)gmail(dot)com>, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2026-02-06 13:51:17
Message-ID: CAN55FZ0FRB2OD6-oEESLvgUT4bLZQVD72pAqUqzdw7Rx5cN0ig@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sat, 31 Jan 2026 at 19:21, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
>
> On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <neil(dot)conway(at)gmail(dot)com> wrote:
>>
>> * I'm curious if we'll see better performance on large inputs if we flush to `line_buf` periodically (e.g., at least every few thousand bytes or so). Otherwise we might see poor data cache behavior if large inputs with no control characters get evicted before we've copied them over. See the approach taken in escape_json_with_len() in utils/adt/json.c
>>
> So i gave this a try, attached is the small patch that has v3 + the suggestion added, here are the results with different threshold for line_buf refill:
>
> Execution time compared to master:
> Workloadv3v3.1 (2k)v3.1 (4k)v3.1 (8k)v3.1 (16k)v3.1 (20k)v3.1 (28k)
> text/none-16.5%-17.4%-14.3%-12.6%-13.6%-10.5%-16.3%
> text/esc+5.6%+11.1%+3.1%+7.6%+3.0%+4.9%+4.2%
> csv/none-31.0%-29.9%-26.7%-30.1%-27.9%-30.2%-29.6%
> csv/quote+0.2%-0.6%-0.4%-1.0%+0.1%+2.5%-1.0%
>
> L1d cache miss rates:
> WorkloadMasterv3v3.1 (2k)v3.1 (4k)v3.1 (8k)v3.1 (16k)v3.1 (20k)v3.1 (28k)
> text/none0.20%0.23%0.21%0.22%0.21%0.21%0.21%0.22%
> text/esc0.21%0.22%0.22%0.22%0.22%0.21%0.22%0.22%
> csv/none0.17%0.22%0.21%0.22%0.21%0.21%0.22%0.22%
> csv/quote0.18%0.22%0.19%0.20%0.20%0.19%0.20%0.20%
>
> On my laptop I have 32KB L1 cache per core.
> Results are super close, it is hard to see in the cache misses numbers but execution times are saying other things, doing the periodic filling of line_buf seems good to do.
> If Manni can rerun the benchmarks on these too, it would be nice to confirm this.

I looked at this change and had a couple of points.

We already have REFILL_LINEBUF at the start of the for loop in the
CopyReadLineText() function (let’s call this refill #1). This refills
when the input_buf_ptr >= copy_buf_len check is true. On my end,
copy_buf_len stays at 8191 until the end of the input, and then it
becomes the remaining amount. So when I set LINE_BUF_FLUSH_AFTER to
8192, the REFILL_LINEBUF you added shouldn’t be called; instead,
refill #1 should be triggered.

I verified this manually by adding some logging, and the results seem
to confirm this behavior. Based on that, there shouldn’t be a
performance difference when LINE_BUF_FLUSH_AFTER >= 8k.

Could you please take a look and confirm whether you see the same behavior?

Also, I noticed that json.c uses ESCAPE_JSON_FLUSH_AFTER set to 512,
so it might be worth trying smaller values here as well.

--
Regards,
Nazir Bilal Yavuz
Microsoft

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2026-02-06 13:59:32 Re: Fix pg_stat_get_backend_wait_event() for aux processes
Previous Message Daniel Gustafsson 2026-02-06 13:18:07 Re: IPC::Run::time[r|out] vs our TAP tests