From: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
---|---|
To: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
Cc: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
Date: | 2025-08-21 19:36:42 |
Message-ID: | CA+K2RukEWfNAp821Fy1LYWCoE_fOKMU8efsP2VLb5ZM8OEETWA@mail.gmail.com |
Lists: | pgsql-hackers |
> On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
> >> Thanks for running that benchmark! Would you mind sharing a reproducer
> >> for the regression you observed?
> >
> > Of course, I attached the sql to generate the text and csv test files.
> > If making special characters 1/3 of the line length is an exaggeration,
> > a lower proportion might still reproduce some of the regressions, of
> > course, following the same idea.
>
> Thank you so much!
>
> I am able to reproduce the regression you mentioned, but both
> regressions are 20% on my end. I found (by experimenting) that SIMD
> causes a regression if it advances fewer than 5 characters.
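To make "advances fewer than 5 characters" concrete, here is a minimal sketch (not code from the patch) of a 16-byte SIMD scan that reports how far the parser can skip before it must handle a byte itself. It assumes SSE2 intrinsics and the GCC/Clang builtin `__builtin_ctz`, falls back to a scalar loop elsewhere, and assumes for illustration that the special bytes are backslash, newline and carriage return; the name `simd_advance` is hypothetical. When special characters are dense, each load and compare pays for 16 bytes of work but only moves the cursor a couple of bytes, which is one plausible reading of why short advances lose to the plain scalar loop.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not taken from the patch): return how many bytes
 * of the 16-byte chunk at p can be skipped before the first byte the
 * text-format line scanner must handle itself, assumed here to be
 * backslash, newline or carriage return. Returns 16 if there is none.
 * The caller must guarantee that at least 16 bytes are readable at p.
 */
static int
simd_advance(const char *p)
{
#if defined(__SSE2__)
    __m128i chunk = _mm_loadu_si128((const __m128i *) p);
    __m128i hits = _mm_or_si128(
        _mm_cmpeq_epi8(chunk, _mm_set1_epi8('\\')),
        _mm_or_si128(_mm_cmpeq_epi8(chunk, _mm_set1_epi8('\n')),
                     _mm_cmpeq_epi8(chunk, _mm_set1_epi8('\r'))));
    /* One bit per byte lane that matched any special character. */
    unsigned mask = (unsigned) _mm_movemask_epi8(hits);

    /* Lowest set bit = offset of the first special byte in the chunk. */
    return mask == 0 ? 16 : __builtin_ctz(mask);
#else
    /* Scalar fallback with the same result on non-SSE2 targets. */
    for (int i = 0; i < 16; i++)
        if (p[i] == '\\' || p[i] == '\n' || p[i] == '\r')
            return i;
    return 16;
#endif
}
```

For example, a chunk of plain letters yields an advance of 16, while a line such as `ab\cd\ef\...` (a backslash every third byte) yields an advance of only 2 per load.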
>
> So, I implemented a small heuristic. It works like this:
>
> - If advance < 5 -> insert a sleep penalty (n cycles).
> - Each time advance < 5, n is doubled.
> - Each time advance ≥ 5, n is halved.
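The heuristic in the list above can be sketched as an adaptive backoff. This is a minimal illustration, not the POC patch: it assumes the "sleep penalty (n cycles)" means the parser skips the SIMD scan and uses the scalar path for the next n loop iterations, and it assumes n starts at 1, that the sleep uses the current n before it is doubled, and that n is capped at a hypothetical `SIMD_MAX_PENALTY`. Only the threshold of 5 comes from the thread; all other names are hypothetical.

```c
#include <stdbool.h>

/* Threshold from the thread: SIMD loses if it advances fewer than 5 bytes. */
#define SIMD_MIN_ADVANCE 5
/* Hypothetical cap so the penalty cannot grow without bound. */
#define SIMD_MAX_PENALTY 64

/* Hypothetical state carried across iterations of the parse loop. */
typedef struct SimdBackoff
{
    int penalty;    /* current penalty length n, in loop iterations */
    int sleep;      /* iterations left before SIMD is tried again */
} SimdBackoff;

static void
simd_backoff_init(SimdBackoff *b)
{
    b->penalty = 1;
    b->sleep = 0;
}

/* Should the parser attempt the SIMD scan on this iteration? */
static bool
simd_backoff_should_try(SimdBackoff *b)
{
    if (b->sleep > 0)
    {
        b->sleep--;         /* still sleeping: take the scalar path */
        return false;
    }
    return true;
}

/* Record how many bytes the last SIMD scan advanced. */
static void
simd_backoff_record(SimdBackoff *b, int advance)
{
    if (advance < SIMD_MIN_ADVANCE)
    {
        b->sleep = b->penalty;          /* sleep for n iterations */
        if (b->penalty < SIMD_MAX_PENALTY)
            b->penalty *= 2;            /* n is doubled */
    }
    else if (b->penalty > 1)
        b->penalty /= 2;                /* n is halved */
}
```

In a parse loop, one would call `simd_backoff_should_try()` before each attempted 16-byte scan and pass the resulting advance to `simd_backoff_record()`. Doubling makes repeated short advances progressively cheaper to ignore, while halving lets SIMD come back quickly once the input becomes sparse in special characters again.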
>
> I am sharing a POC patch to show the heuristic; it can be applied on top
> of v1-0001. The heuristic version has the same performance improvements
> as v1-0001, but the regression is 5% instead of 20% compared to
> master.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
Yes, this is good; I'm also only getting about a 5% regression now.
Regards,
Ayoub Kazar