Re: Speed up COPY FROM text/CSV parsing using SIMD

From:	KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
To:	Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc:	Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Speed up COPY FROM text/CSV parsing using SIMD
Date:	2025-08-21 19:36:42
Message-ID:	CA+K2RukEWfNAp821Fy1LYWCoE_fOKMU8efsP2VLb5ZM8OEETWA@mail.gmail.com
Lists:	pgsql-hackers

> On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
> >> Thanks for running that benchmark! Would you mind sharing a reproducer
> >> for the regression you observed?
> >
> > Of course, I attached the sql to generate the text and csv test files.
> > If having 1/3 of the line length be special characters is an
> > exaggeration, something lower might still reproduce some regression
> > by the same idea.
>
> Thank you so much!
>
> I am able to reproduce the regression you mentioned, but both
> regressions are 20% on my end. I found by experimenting that SIMD
> causes a regression if it advances fewer than 5 characters.
>
> So, I implemented a small heuristic. It works like this:
>
> - If advance < 5 -> insert a sleep penalty (n cycles).
> - Each time advance < 5, n is doubled.
> - Each time advance ≥ 5, n is halved.
>
> I am sharing a POC patch to show the heuristic; it can be applied on
> top of v1-0001. The heuristic version has the same performance
> improvements as v1-0001, but the regression is 5% instead of 20%
> compared to master.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Yes, this is good. I'm also getting only about a 5% regression now.
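
For anyone following the thread, here is a rough sketch of how the quoted heuristic could be read. It is not taken from the POC patch. It assumes that the "sleep penalty" means skipping the SIMD path for n scalar steps, that n starts at 1 on the first short advance, and that n is capped. The names SimdBackoff, simd_backoff_should_try, simd_backoff_report and SIMD_MAX_PENALTY are hypothetical; only the threshold of 5 comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIMD_MIN_ADVANCE 5		/* threshold mentioned in the thread */
#define SIMD_MAX_PENALTY 1024	/* hypothetical cap, not from the thread */

/* Hypothetical per-scan state for the backoff heuristic. */
typedef struct SimdBackoff
{
	unsigned	penalty;		/* n: current penalty length */
	unsigned	remaining;		/* scalar steps left before retrying SIMD */
} SimdBackoff;

/*
 * Returns true if the caller should attempt the SIMD path on this step.
 * While a penalty is pending, one unit is consumed and false is returned,
 * so the caller falls back to the scalar path.
 */
static bool
simd_backoff_should_try(SimdBackoff *state)
{
	assert(state != NULL);

	if (state->remaining > 0)
	{
		state->remaining--;
		return false;
	}
	return true;
}

/*
 * Records how many characters the last SIMD attempt advanced.
 * A short advance doubles n (assumed to start at 1) and arms the penalty;
 * a long advance halves n.
 */
static void
simd_backoff_report(SimdBackoff *state, size_t advance)
{
	assert(state != NULL);

	if (advance < SIMD_MIN_ADVANCE)
	{
		if (state->penalty == 0)
			state->penalty = 1;
		else if (state->penalty < SIMD_MAX_PENALTY)
			state->penalty *= 2;
		state->remaining = state->penalty;
	}
	else
		state->penalty /= 2;
}
```

In a parsing loop, the caller would check simd_backoff_should_try() before each chunk and call simd_backoff_report() after each SIMD attempt. Input dense in special characters then spends most steps on the scalar path, while clean input drives n back toward zero.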

Regards,
Ayoub Kazar

In response to

Re: Speed up COPY FROM text/CSV parsing using SIMD at 2025-08-19 12:33:38 from Nazir Bilal Yavuz
