Re: Speed up COPY FROM text/CSV parsing using SIMD

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
Cc:	Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date:	2025-08-21 15:47:30
Message-ID:	8615c983-1662-43b4-b0c9-49d194ac33aa@dunslane.net
Lists:	pgsql-hackers

On 2025-08-19 Tu 10:14 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Tue, 19 Aug 2025 at 15:33, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
>> I am able to reproduce the regression you mentioned but both
>> regressions are 20% on my end. I found that (by experimenting) SIMD
>> causes a regression if it advances less than 5 characters.
>>
>> So, I implemented a small heuristic. It works like that:
>>
>> - If advance < 5 -> insert a sleep penalty (n cycles).
> 'sleep' might be a poor word choice here. I meant skipping SIMD for n
> number of times.
>

I was thinking a bit about that this morning. I wonder if, instead of
having a constantly applied heuristic like this, it might be better to
do a little extra accounting in the first, say, 1000 lines of an input
file, and switch to the SIMD code if less than some portion of the
input is found to be special characters. What that portion should be
would need to be determined by some experimentation with a variety of
typical workloads, but given your findings 20% seems like a good
starting point.
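
To make the sampling idea concrete, here is a rough sketch in C. It is
only an illustration under stated assumptions: SimdSampler,
sampler_account_line, is_special, SAMPLE_LINES and
SPECIAL_PCT_THRESHOLD are hypothetical names, not anything from the
patch or from copyfromparse.c, and both the 1000-line window and the
20% cutoff are placeholders that would need benchmarking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values taken from the discussion; both need experimentation. */
#define SAMPLE_LINES          1000
#define SPECIAL_PCT_THRESHOLD 20

/* Hypothetical per-COPY accounting state; zero-initialize before use. */
typedef struct SimdSampler
{
    uint64_t total_bytes;    /* bytes seen during the sampling window */
    uint64_t special_bytes;  /* of those, bytes the parser must inspect */
    uint32_t lines_seen;     /* lines accounted for so far */
    bool     decided;        /* true once the window is complete */
    bool     use_simd;       /* valid only when decided is true */
} SimdSampler;

/* Assumption: "special" means delimiter, quote, escape, or line ending. */
static bool
is_special(char c, char delim, char quote, char escape)
{
    return c == delim || c == quote || c == escape ||
           c == '\n' || c == '\r';
}

/*
 * Account for one input line. After SAMPLE_LINES lines the decision is
 * made once and never revisited: SIMD is enabled only if special
 * characters make up less than SPECIAL_PCT_THRESHOLD percent of input.
 */
static void
sampler_account_line(SimdSampler *s, const char *line, size_t len,
                     char delim, char quote, char escape)
{
    if (s->decided)
        return;

    for (size_t i = 0; i < len; i++)
    {
        if (is_special(line[i], delim, quote, escape))
            s->special_bytes++;
    }
    s->total_bytes += len;

    if (++s->lines_seen >= SAMPLE_LINES)
    {
        s->decided = true;
        /* Integer comparison avoids floating point in the hot path. */
        s->use_simd = s->special_bytes * 100 <
                      s->total_bytes * SPECIAL_PCT_THRESHOLD;
    }
}
```

The caller would stay on the scalar path until s->decided is set and
then consult s->use_simd. The one-time decision keeps the per-line cost
at zero after the window, at the price of not adapting if the data
changes character later in the file.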

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Re: Speed up COPY FROM text/CSV parsing using SIMD at 2025-08-19 14:14:54 from Nazir Bilal Yavuz