| From: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date: | 2026-03-11 18:49:22 |
| Message-ID: | CAN55FZ3gdK8dGrEo0M6KFW97OaF8TUbjO_dFoxQKi63davE-jA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Wed, 11 Mar 2026 at 21:09, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Mar 11, 2026 at 02:36:46PM +0300, Nazir Bilal Yavuz wrote:
> > 0002 has an attempt to remove some branches from SIMD code but since
> > it is kind of functional change, I wanted to attach that as another
> > patch. I think we can apply some parts of this, if not all.
>
> Could you describe what this is doing and what the performance impact is?
The SIMD code checks for these characters:
csv mode: nl, cr, quote and possibly escape.
text mode: nl, cr and bs.
v12 checks them like this:
    if (is_csv)
    {
        match = vector8_or(vector8_eq(chunk, nl),
                           vector8_eq(chunk, cr));
        match = vector8_or(match, vector8_eq(chunk, quote));
        if (unique_escapec)
            match = vector8_or(match, vector8_eq(chunk, escape));
    }
    else
    {
        match = vector8_or(vector8_eq(chunk, nl),
                           vector8_eq(chunk, cr));
        match = vector8_or(match, vector8_eq(chunk, bs));
    }
But we know that the code will always check nl, cr and exactly one of
quote (csv mode) or bs (text mode). So, we can introduce a new
variable named bs_or_quote, which is equal to bs in text mode and to
quote in csv mode. Then, we can remove the 'if (is_csv)' check and
keep only the check for escape ('if (unique_escapec)'). The code then
looks like this:
    match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
    match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
    if (unique_escapec)
        match = vector8_or(match, vector8_eq(chunk, escape));
That is what v13-0002 does. I saw 1%-2% speedups with this change and
there was no regression.
Even without introducing the bs_or_quote variable, we can move 'match
= vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));' outside
of the if checks, since both branches compute it.
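To make the equivalence between the v12 and v13-0002 shapes concrete, here is a minimal scalar sketch. It is not the patch itself: it uses plain byte comparisons in place of vector8_eq()/vector8_or() so that it compiles without PostgreSQL headers, and the names ScanChars, scan_chars_init, match_mask_v12, match_mask_v13, checked_match_mask and CHUNK_SIZE are all hypothetical. It also assumes that unique_escapec is only ever true in csv mode, which is what makes dropping the 'if (is_csv)' check safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 16   /* hypothetical stand-in for sizeof(Vector8) */

/* Hypothetical bundle of the per-COPY settings the scan needs. */
typedef struct ScanChars
{
    bool          is_csv;
    bool          unique_escapec;
    unsigned char quote;
    unsigned char escape;
    unsigned char bs_or_quote;  /* bs in text mode, quote in csv mode */
} ScanChars;

static ScanChars
scan_chars_init(bool is_csv, unsigned char quote, unsigned char escape)
{
    ScanChars sc;

    sc.is_csv = is_csv;
    sc.quote = quote;
    sc.escape = escape;
    /* Assumption: only true in csv mode, and only if escape != quote. */
    sc.unique_escapec = is_csv && escape != quote;
    /* Decided once, before the scan loop starts. */
    sc.bs_or_quote = is_csv ? quote : (unsigned char) '\\';
    return sc;
}

/* v12 shape: selects the character set with 'if (is_csv)' on every call. */
static uint32_t
match_mask_v12(const unsigned char *chunk, const ScanChars *sc)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < CHUNK_SIZE; i++)
    {
        unsigned char c = chunk[i];
        bool          match;

        if (sc->is_csv)
        {
            match = (c == '\n') || (c == '\r');
            match = match || (c == sc->quote);
            if (sc->unique_escapec)
                match = match || (c == sc->escape);
        }
        else
        {
            match = (c == '\n') || (c == '\r');
            match = match || (c == '\\');
        }
        if (match)
            mask |= (uint32_t) 1 << i;
    }
    return mask;
}

/* v13-0002 shape: no is_csv branch, only the optional escape check. */
static uint32_t
match_mask_v13(const unsigned char *chunk, const ScanChars *sc)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < CHUNK_SIZE; i++)
    {
        unsigned char c = chunk[i];
        bool          match;

        match = (c == '\n') || (c == '\r');
        match = match || (c == sc->bs_or_quote);
        if (sc->unique_escapec)
            match = match || (c == sc->escape);
        if (match)
            mask |= (uint32_t) 1 << i;
    }
    return mask;
}

/* Returns the v13-style mask after asserting it equals the v12-style one. */
static uint32_t
checked_match_mask(const unsigned char *chunk, const ScanChars *sc)
{
    uint32_t mask = match_mask_v13(chunk, sc);

    assert(mask == match_mask_v12(chunk, sc));
    return mask;
}
```

Bit i of the returned mask is set when byte i of the chunk is one of the special characters. Calling checked_match_mask() with settings built by scan_chars_init() for text mode, csv mode with a distinct escape, and csv mode with escape equal to quote should never trip the assertion under the stated assumption.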
--
Regards,
Nazir Bilal Yavuz
Microsoft