From: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
---|---|
To: | KAZAR Ayoub <ma_kazar(at)esi(dot)dz> |
Cc: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Speed up COPY FROM text/CSV parsing using SIMD |
Date: | 2025-08-14 10:29:35 |
Message-ID: | CAN55FZ0houfWHn8_MEEefhprZvc33jr07GrBYo+Bp2yw=TVnKA@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub <ma_kazar(at)esi(dot)dz> wrote:
>
> Following Nazir's findings about 4096 bytes being the performant line length, I ran more benchmarks on my side for both TEXT and CSV formats, covering two cases: normal data (no special characters) and data with many special characters.
>
> Results are as good as expected and similar to previous benchmarks:
> ~30.9% faster copy in TEXT format
> ~32.4% faster copy in CSV format
> 20%-30% reduction in cycles per instruction
>
> When the lines contain a lot of special characters (e.g., tables with a large number of columns, perhaps), we obviously expect regressions because of the overhead of frequently falling back to scalar processing.
> Results when special characters make up 1/3 of the line length:
> ~43.9% slower copy in TEXT format
> ~16.7% slower copy in CSV format
> So even with fewer special characters, or with wider spacing between them, there might still be some regression. This may be an insignificant case, but it can be addressed in other patches if we decide not to use the SIMD path in some situations.
>
> I hope this helps and supports the patch.
Thanks for running that benchmark! Would you mind sharing a reproducer
for the regression you observed?
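
For readers following the thread, here is a minimal sketch of why dense special characters can cost more than they save. It is not taken from the patch: the 16-byte chunk size, the set of special bytes (newline, carriage return, backslash, tab), and the names is_special, chunk_has_special, ScanStats and scan_buffer are all assumptions made for illustration. It only counts how much work goes through the chunk-skipping path versus the byte-at-a-time fallback.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define CHUNK 16				/* assumed vector width in bytes */

/* Assumed set of bytes that need scalar handling in TEXT format. */
static int
is_special(char c)
{
	return c == '\n' || c == '\r' || c == '\\' || c == '\t';
}

/* Hypothetical helper: does this CHUNK-byte block contain a special byte? */
static int
chunk_has_special(const char *p)
{
#if defined(__SSE2__)
	__m128i		v = _mm_loadu_si128((const __m128i *) p);
	__m128i		nl = _mm_cmpeq_epi8(v, _mm_set1_epi8('\n'));
	__m128i		cr = _mm_cmpeq_epi8(v, _mm_set1_epi8('\r'));
	__m128i		bs = _mm_cmpeq_epi8(v, _mm_set1_epi8('\\'));
	__m128i		tb = _mm_cmpeq_epi8(v, _mm_set1_epi8('\t'));
	__m128i		any = _mm_or_si128(_mm_or_si128(nl, cr),
								   _mm_or_si128(bs, tb));

	return _mm_movemask_epi8(any) != 0;
#else
	/* Portable stand-in for the vector comparison. */
	for (int i = 0; i < CHUNK; i++)
		if (is_special(p[i]))
			return 1;
	return 0;
#endif
}

typedef struct ScanStats
{
	size_t		fast_chunks;	/* chunks skipped without per-byte work */
	size_t		scalar_bytes;	/* bytes examined one at a time */
	size_t		specials;		/* special bytes found */
} ScanStats;

static ScanStats
scan_buffer(const char *buf, size_t len)
{
	ScanStats	st = {0, 0, 0};
	size_t		i = 0;

	while (i + CHUNK <= len)
	{
		if (!chunk_has_special(buf + i))
			st.fast_chunks++;	/* whole chunk is plain data */
		else
		{
			/* Fallback: the vector check was paid for and then redone. */
			for (size_t j = i; j < i + CHUNK; j++)
			{
				st.scalar_bytes++;
				if (is_special(buf[j]))
					st.specials++;
			}
		}
		i += CHUNK;
	}

	/* Tail shorter than one chunk is always handled byte by byte. */
	for (; i < len; i++)
	{
		st.scalar_bytes++;
		if (is_special(buf[i]))
			st.specials++;
	}
	return st;
}
```

In this sketch, a buffer with no special bytes is covered almost entirely by fast_chunks, while a buffer with at least one special byte in every 16 bytes pays for the vector check and then scans every byte anyway. That is the shape of the regression described above, though the real code path in the patch may differ.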
--
Regards,
Nazir Bilal Yavuz
Microsoft