Re: Speed up COPY TO text/CSV parsing using SIMD

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Neil Conway <neil(dot)conway(at)gmail(dot)com>, Manni Wood <manni(dot)wood(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject:	Re: Speed up COPY TO text/CSV parsing using SIMD
Date:	2026-03-26 21:23:48
Message-ID:	acWj5FntidHJ9nVP@nathan
Lists:	pgsql-hackers

On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have a json(b) column like {"key1":"val1","key2":"val2"}, then for
> CSV format this would immediately exit the SIMD path because of the quote
> character; for json(b) this is always going to be the case.
> I measured the overhead of exiting the SIMD path many times (8 million
> times for one COPY TO command), and I only found a 3% regression for this
> case, sometimes 2%.

I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads. Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes. The extra branching for each attribute
might not be something we can just ignore.
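
To make the concern concrete, here is a rough, portable sketch of the
per-attribute shape being discussed. It is not the patch's code: CHUNK_SIZE
stands in for sizeof(Vector8), the inner loop over 16 bytes stands in for
the real vector comparisons, and is_special(), chunk_scan(), scalar_scan()
and clean_prefix_len() are hypothetical names chosen for illustration. The
set of "special" bytes is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for sizeof(Vector8); assumed to be 16 bytes here. */
#define CHUNK_SIZE 16

/* Hypothetical: bytes that force the slow path in CSV output. */
static inline bool
is_special(char c, char delim, char quote)
{
	return c == delim || c == quote || c == '\n' || c == '\r';
}

/*
 * Emulates the vector path: consume whole 16-byte chunks while none of
 * their bytes is special.  Returns the number of bytes consumed.
 */
static size_t
chunk_scan(const char *s, size_t len, char delim, char quote)
{
	size_t		i = 0;

	while (len - i >= CHUNK_SIZE)
	{
		bool		hit = false;

		for (int j = 0; j < CHUNK_SIZE; j++)
			hit |= is_special(s[i + j], delim, quote);
		if (hit)
			break;				/* leave the chunked path early */
		i += CHUNK_SIZE;
	}
	return i;
}

/* Byte-at-a-time scan up to the first special byte. */
static size_t
scalar_scan(const char *s, size_t len, char delim, char quote)
{
	size_t		i = 0;

	while (i < len && !is_special(s[i], delim, quote))
		i++;
	return i;
}

/*
 * Length of the leading run of an attribute that needs no escaping.
 * The "len >= CHUNK_SIZE" test is the extra branch paid per attribute.
 */
size_t
clean_prefix_len(const char *attr, size_t len, char delim, char quote,
				 bool *used_chunk_path)
{
	size_t		done = 0;

	assert(used_chunk_path != NULL);
	*used_chunk_path = false;

	if (len >= CHUNK_SIZE)
	{
		*used_chunk_path = true;
		done = chunk_scan(attr, len, delim, quote);
	}
	return done + scalar_scan(attr + done, len - done, delim, quote);
}
```

With the JSON value quoted above, the attribute is long enough to take the
chunked path, but the first chunk already contains a quote, so all of the
actual scanning ends up in scalar_scan(). Short attributes skip the chunked
path entirely and only pay for the length check.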

> For cases where we falsely commit to SIMD because we read a binary
> size >= sizeof(Vector8), which I found very niche too, the short circuit to
> scalar each time is even more negligible (the above CSV JSON case is the
> absolute worst case).

That's good to hear.

--
nathan

In response to

Re: Speed up COPY TO text/CSV parsing using SIMD at 2026-03-18 02:29:32 from KAZAR Ayoub

Responses

Re: Speed up COPY TO text/CSV parsing using SIMD at 2026-03-27 18:48:38 from KAZAR Ayoub
