Re: Speed up COPY FROM text/CSV parsing using SIMD

From: KAZAR Ayoub <ma_kazar(at)esi(dot)dz>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: 2025-08-21 19:36:42
Message-ID: CA+K2RukEWfNAp821Fy1LYWCoE_fOKMU8efsP2VLb5ZM8OEETWA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
> >> Thanks for running that benchmark! Would you mind sharing a reproducer
> >> for the regression you observed?
> >
> > Of course, I attached the sql to generate the text and csv test files.
> > If having a 1/3 of line length of special characters can be an
> exaggeration, something lower might still reproduce some regressions of
> course for the same idea.
>
> Thank you so much!
>
> I am able to reproduce the regression you mentioned but both
> regressions are %20 on my end. I found that (by experimenting) SIMD
> causes a regression if it advances less than 5 characters.
>
> So, I implemented a small heuristic. It works like that:
>
> - If advance < 5 -> insert a sleep penalty (n cycles).
> - Each time advance < 5, n is doubled.
> - Each time advance ≥ 5, n is halved.
>
> I am sharing a POC patch to show heuristic, it can be applied on top
> of v1-0001. Heuristic version has the same performance improvements
> with the v1-0001 but the regression is %5 instead of %20 compared to
> the master.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Yes this is good, i'm also getting about 5% regression only now.

Regards,
Ayoub Kazar

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Treat 2025-08-21 22:06:13 Re: Adding REPACK [concurrently]
Previous Message Andres Freund 2025-08-21 18:16:28 Re: Adding REPACK [concurrently]