From: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
---|---|
To: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
Cc: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
Date: | 2025-08-19 12:33:38 |
Message-ID: | CAN55FZ1J+6eM=F5GreWEBMJcNV_gifYyYY1b6xpYzun=nWPhMQ@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
>> Thanks for running that benchmark! Would you mind sharing a reproducer
>> for the regression you observed?
>
> Of course, I attached the sql to generate the text and csv test files.
> If having special characters make up 1/3 of the line length is an exaggeration, something lower might still reproduce some regression, for the same reason.
Thank you so much!
I am able to reproduce the regression you mentioned, but both
regressions are 20% on my end. By experimenting, I found that SIMD
causes a regression if it advances fewer than 5 characters.
So, I implemented a small heuristic. It works like this:
- If advance < 5 -> insert a sleep penalty (n cycles).
- Each time advance < 5, n is doubled.
- Each time advance ≥ 5, n is halved.
I am sharing a POC patch to show the heuristic; it can be applied on top
of v1-0001. The heuristic version has the same performance improvements
as v1-0001, but the regression is 5% instead of 20% compared to
master.
--
Regards,
Nazir Bilal Yavuz
Microsoft
Attachment | Content-Type | Size |
---|---|---|
COPY-SIMD-add-heuristic-to-avoid-regression-on-sm.txt | text/plain | 2.8 KB |