| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
| Cc: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2026-02-11 22:39:43 |
| Message-ID: | aY0FL4rXUl6ykn-a@nathan |
| Lists: | pgsql-hackers |
On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> I am sharing a v6 which implements (1). My benchmark results show
> almost no difference for the special-character cases and a nice
> improvement for the no-special-character cases.
Thanks!
> + /* Initialize SIMD variables */
> + cstate->simd_enabled = false;
> + cstate->simd_initialized = false;
> + /* Initialize SIMD on the first read */
> + if (unlikely(!cstate->simd_initialized))
> + {
> + cstate->simd_initialized = true;
> + cstate->simd_enabled = true;
> + }
Why do we do this initialization in CopyReadLine() as opposed to setting
simd_enabled to true when initializing cstate in BeginCopyFrom()? If we
can initialize it in BeginCopyFrom, we could probably remove
simd_initialized.
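
A minimal sketch of that suggestion, assuming nothing else needs to run on the first read. The struct and both functions are hypothetical stand-ins for CopyFromStateData, BeginCopyFrom() and CopyReadLine(); only the simd_enabled field name comes from the quoted patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of CopyFromStateData. */
typedef struct CopyStateSketch
{
    bool        simd_enabled;   /* field name taken from the quoted patch */
} CopyStateSketch;

/* Hypothetical stand-in for the state setup done in BeginCopyFrom(). */
static void
begin_copy_from_sketch(CopyStateSketch *cstate)
{
    /* Enable SIMD once, up front; no separate "initialized" flag is kept. */
    cstate->simd_enabled = true;
}

/* Hypothetical stand-in for CopyReadLine(): no first-call check remains. */
static bool
copy_read_line_uses_simd(const CopyStateSketch *cstate)
{
    return cstate->simd_enabled;
}
```

With this shape the per-line path only reads the flag, so the unlikely() branch from the quoted hunk goes away.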
> + if (cstate->simd_enabled)
> + result = CopyReadLineText(cstate, is_csv, true);
> + else
> + result = CopyReadLineText(cstate, is_csv, false);
I know we discussed this upthread, but I'd like to take a closer look at
this to see whether/why it makes such a big difference. It's a bit awkward
that CopyReadLineText() needs to manage both its local simd_enabled and
cstate->simd_enabled.
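
For context, one common reason for a dispatch like the quoted one is that passing a literal true/false to an inlined function lets the compiler emit two specialized copies of the loop. The sketch below illustrates that pattern only; the function names are hypothetical, and the 8-byte "fast path" is a plain-C placeholder rather than the patch's SIMD code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical scanner: returns the index of the first '\n' in buf, or len.
 * When use_fast_path is a compile-time constant at the call site and the
 * function is inlined, the compiler can drop the unused branch entirely.
 */
static inline size_t
scan_to_newline(const char *buf, size_t len, bool use_fast_path)
{
    size_t      i = 0;

    if (use_fast_path)
    {
        /* Skip 8 bytes at a time while the block contains no newline. */
        while (i + 8 <= len)
        {
            bool        found = false;

            for (size_t j = 0; j < 8; j++)
                found |= (buf[i + j] == '\n');
            if (found)
                break;
            i += 8;
        }
    }

    /* Scalar loop: finishes the tail and locates the exact byte. */
    while (i < len && buf[i] != '\n')
        i++;
    return i;
}

/* Mirrors the shape of the quoted dispatch: each call passes a constant. */
static size_t
scan_dispatch(const char *buf, size_t len, bool simd_enabled)
{
    if (simd_enabled)
        return scan_to_newline(buf, len, true);
    else
        return scan_to_newline(buf, len, false);
}
```

Both branches must return the same answer; only the generated code differs, which is why the benefit is something to measure rather than assume.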
+ /* Load a chunk of data into a vector register */
+ vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
As mentioned upthread [0], I think it's worth testing whether processing
multiple vectors' worth of data in each loop iteration helps.
[0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan
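
A rough sketch of what "multiple vectors per iteration" could look like, under loud assumptions: it uses 64-bit words as stand-ins for vector registers (not PostgreSQL's vector8 API), looks only for '\n' and '\\', and all names are hypothetical. Two words are loaded per iteration and their match results are OR-ed so that a single branch covers 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero iff at least one byte of v equals c (classic zero-byte test). */
static inline uint64_t
word_has_byte(uint64_t v, unsigned char c)
{
    uint64_t    x = v ^ (ONES * c);     /* matching bytes become zero */

    return (x - ONES) & ~x & HIGHS;
}

/* Returns the index of the first '\n' or '\\' in buf, or len if none. */
static size_t
skip_plain_bytes(const char *buf, size_t len)
{
    size_t      i = 0;

    /* Two "vectors" (words) per iteration, one combined check. */
    while (i + 16 <= len)
    {
        uint64_t    a;
        uint64_t    b;
        uint64_t    hit;

        memcpy(&a, buf + i, 8);         /* unaligned-safe loads */
        memcpy(&b, buf + i + 8, 8);
        hit = word_has_byte(a, '\n') | word_has_byte(a, '\\') |
              word_has_byte(b, '\n') | word_has_byte(b, '\\');
        if (hit)
            break;
        i += 16;
    }

    /* Scalar loop handles the tail and pinpoints the special byte. */
    while (i < len && buf[i] != '\n' && buf[i] != '\\')
        i++;
    return i;
}
```

Whether the wider stride pays off depends on how often special characters appear, so this is a starting point for a benchmark, not a claim about the result.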
--
nathan