Re: Speed up COPY FROM text/CSV parsing using SIMD

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc:	KAZAR Ayoub <ma_kazar(at)esi(dot)dz>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date:	2025-10-29 22:22:46
Message-ID:	5d81fbbb-7609-4445-9bc4-8af211fb7674@dunslane.net
Lists:	pgsql-hackers

On 2025-10-22 We 3:24 PM, Nathan Bossart wrote:
> On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
>> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>>> I wonder if we could mitigate the regression further by spacing out the
>>> checks a bit more. It could be worth comparing a variety of values to
>>> identify what works best with the test data.
>> Do you mean that instead of doubling the SIMD sleep, we should
>> multiply it by 3 (or another factor)? Or are you referring to
>> increasing the maximum sleep from 1024? Or possibly both?
> I'm not sure of the precise details, but the main thrust of my suggestion
> is to assume that whatever sampling you do to determine whether to use SIMD
> is good for a larger chunk of data. That is, if you are sampling 1K lines
> and then using the result to choose whether to use SIMD for the next 100K
> lines, we could instead bump the latter number to 1M lines (or something).
> That way we minimize the regression for relatively uniform data sets while
> retaining some ability to adapt in case things change halfway through a
> large table.
>

I'd be ok with numbers like this, although I suspect the number of
cases where we see shape shifts like this in the middle of a data set
would be vanishingly small.
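
For illustration, the sample-then-apply scheme discussed above could be sketched roughly as follows. This is not code from the patch: the struct, the function names, the window sizes, and the "special characters per line" heuristic are all hypothetical, and the cutoff would need benchmarking against real data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names and constants below are hypothetical, not from the patch. */
#define SAMPLE_LINES           1000     /* lines inspected per sampling window */
#define APPLY_LINES            1000000  /* lines that reuse the sampled decision */
#define SPECIAL_PER_LINE_LIMIT 4        /* assumed cutoff, needs benchmarking */

typedef struct SimdChooser
{
    bool    sampling;       /* currently collecting a sample? */
    bool    use_simd;       /* decision in effect for the next line */
    size_t  lines_left;     /* lines remaining in the current window */
    size_t  special_chars;  /* special characters seen in this sample */
} SimdChooser;

static void
chooser_init(SimdChooser *st)
{
    assert(st != NULL);
    st->sampling = true;
    st->use_simd = false;   /* assumption: start on the scalar path */
    st->lines_left = SAMPLE_LINES;
    st->special_chars = 0;
}

/*
 * Report one parsed line and how many special characters (delimiters,
 * quotes, escapes) it contained.  Returns whether the next line should
 * take the SIMD path.
 */
static bool
chooser_note_line(SimdChooser *st, size_t special_in_line)
{
    assert(st != NULL);

    if (st->sampling)
        st->special_chars += special_in_line;

    if (--st->lines_left > 0)
        return st->use_simd;

    if (st->sampling)
    {
        /* Sample complete: few special characters means long skippable runs. */
        st->use_simd = st->special_chars < (size_t) SAMPLE_LINES * SPECIAL_PER_LINE_LIMIT;
        st->sampling = false;
        st->lines_left = APPLY_LINES;
    }
    else
    {
        /* Apply window over: resample, keeping the old decision meanwhile. */
        st->sampling = true;
        st->special_chars = 0;
        st->lines_left = SAMPLE_LINES;
    }
    return st->use_simd;
}
```

With these numbers the sampling cost is paid once per 1,001,000 lines instead of once per 101,000, which is where the smaller regression on uniform data would come from. A change in data shape is noticed only at the next sampling window.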

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Re: Speed up COPY FROM text/CSV parsing using SIMD at 2025-10-22 19:24:59 from Nathan Bossart
